Hi Gerri,
This is more of a small feature request than an actual question. I plan to use PROOF’s dataset mechanism on our Tier3. This already works more or less with the PQ2 tools; there’s just one annoyance.
It seems that TProof::VerifyDataSet(…) called by pq2-verify picks the “main” TTree from the dataset files “randomly”. Well, it’s definitely not picking them at random, but it’s always picking the “wrong” TTree for our files. We usually have a number of TTrees in our ntuples, as we save some metadata in TTree format as well. Of course these metadata trees usually only have 1-2 entries per file.
So what I end up with is a dataset database that lists, let’s say, 600 files in a dataset and claims that these files hold ~700 events, when in reality they hold more like 7M.
It’s not a big issue, as PROOF still happily runs on these datasets as long as I tell TProof::Process(…) which TTree I want to use. It’s just a data management issue: it would be easier to get an overview of the locally available datasets if there were a simple way of storing the actual number of events in them.
From a user perspective I’d suggest using the same formalism as with TProof::Process(…). So if a user wants to use a specific TTree for the event count, (s)he should ask for the verification of the dataset with a name like this: “/default/me/myDataSet#MyTree”.
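To make the naming idea a bit more concrete, here is a minimal sketch of how such a name could be split into the dataset part and the tree part. It is plain C++ with no ROOT dependency, and DataSetSpec and ParseDataSetSpec are names I made up for illustration; I’m not claiming this is how PROOF handles the “#” internally.

```cpp
#include <string>

// Hypothetical helper, not part of ROOT/PROOF: holds the two parts of a
// name such as "/default/me/myDataSet#MyTree".
struct DataSetSpec {
    std::string dataset;  // e.g. "/default/me/myDataSet"
    std::string tree;     // e.g. "MyTree"; empty if no tree was requested
};

// Split the name at the first '#'. Without a '#', the whole string is the
// dataset name and the tree name stays empty, so the caller can fall back
// to whatever default tree selection it already uses.
DataSetSpec ParseDataSetSpec(const std::string& spec) {
    const std::string::size_type pos = spec.find('#');
    if (pos == std::string::npos) {
        return {spec, ""};
    }
    return {spec.substr(0, pos), spec.substr(pos + 1)};
}
```

With this, a name without “#” keeps today’s behaviour, while “/default/me/myDataSet#MyTree” would tell the verification to count the entries of MyTree only.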
Cheers,
Attila