Is there a way to read only a fraction x of the events in a TTree to decrease run time?
I would like to avoid reading only the first x*N events, because the tree might have been written according to some order. It would be nice if I could read a tree in a way such that each event is read with probability x, maybe starting out from some seed in order to ensure reproducibility.
Ideally, this would be applicable to Draw(), too.
Thanks a lot for your comments!
See the 2 last parameters of TTree::Draw():
root.cern.ch/root/htmldoc/TTree … ree:Draw@1
Sure, that’s an option. However, I then have to call Draw() for each event after deciding whether I want to read it (with nentries = 1 always). In this case, the selection expression would be parsed for every event, unnecessarily.
I was looking for a way to have nentries > 1 and still do some prescaling. Is the approach with nentries = 1 the best (= lowest runtime) approach to do this?
Your question was:
… the last two parameters do that: nentries gives you the number of entries you want to read, and firstentry gives you the first entry from which these entries will be read.
So I guess that’s the “way to read only a fraction of the events”.
I know and understand that this is the standard solution to the broad question I asked. However, I cannot conclude from it whether one can set a prescale flag or something that would make Draw() skip events with a certain probability by itself, without having to call Draw() multiple times, for the reasons I’ve outlined (e.g. having to parse the cut string multiple times).
However, I take your second reply as a hint that my detailed question did not make you think of such an option, so I infer that the answer is no.
Unless you expect correlations between events (the data for event N somehow determines or influences the data for event N+1), skipping events should be equivalent to reading only a partial block.
Say you have N=100000 events and you want to only read 20% of them. If you were coding your own loop over the entries, you could skip entries where ( i % 5 != 0 ), so you’d only really read every 5th event. But if the events are not correlated, that should be completely equivalent to reading only events [1 … 20000] or any other block with the same number of entries.
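To make the "every 5th entry" idea concrete, here is a minimal sketch in plain C++ of which entry numbers such a loop would keep. The helper name strideEntries is hypothetical and not part of ROOT; in a real loop you would call tree->GetEntry(i) only for the kept entries.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not a ROOT function): return the entry numbers that a
// fixed-stride prescale keeps, i.e. every `stride`-th entry starting at 0.
std::vector<std::int64_t> strideEntries(std::int64_t nEntries, std::int64_t stride) {
    std::vector<std::int64_t> kept;
    if (stride <= 0) {
        return kept;  // nothing sensible to do for a non-positive stride
    }
    for (std::int64_t i = 0; i < nEntries; ++i) {
        if (i % stride != 0) {
            continue;  // skipped entry: a real loop would not read it
        }
        kept.push_back(i);
    }
    return kept;
}
```

With nEntries = 100000 and stride = 5 this keeps 20000 entries, the same count as a contiguous block of the first 20000, but spread over the whole tree.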
If you really insist on reading only every 5th entry but you want to use the TTree::Draw method, you could first create a TEntryList with a suitable selection in the 2nd TTree::Draw argument, then use that entry list for looping over your tree. I don’t think there is a way to do it probabilistically (i.e. with random numbers) using TTree::Draw, but you could just fill a TEntryList manually with random entry numbers.
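A rough sketch of the "random entry numbers" part, in plain C++ so it does not depend on a ROOT installation. The helper sampleEntries and its parameters are hypothetical; it uses std::mt19937 rather than ROOT's own generators, and it assumes 0 <= fraction <= 1.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not a ROOT function): keep each entry in [0, nEntries)
// independently with probability `fraction`. The fixed seed makes the
// selection reproducible for a given standard library implementation.
// Assumes 0 <= fraction <= 1.
std::vector<std::int64_t> sampleEntries(std::int64_t nEntries, double fraction,
                                        std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::bernoulli_distribution keep(fraction);
    std::vector<std::int64_t> kept;
    for (std::int64_t i = 0; i < nEntries; ++i) {
        if (keep(gen)) {
            kept.push_back(i);  // entry numbers come out in ascending order
        }
    }
    return kept;
}
```

In ROOT, each returned number could then be added with TEntryList::Enter and the list attached with TTree::SetEntryList before calling Draw(), so the selection string is parsed only once. The number of kept entries fluctuates around fraction * nEntries rather than matching it exactly.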
Thank you for your reply. The problem is that my code is going to be used for all sorts of trees, and I don’t know ahead of time whether there will be correlations. It could be that in an MC signal sample, Z bosons in events with even event number decay to ee, and with odd event number they decay to mumu … (If that were not the case, I agree that reading the first x*N events is equivalent.)
Your idea with the randomly filled entry list is great. Thanks!
Actually, you should be able to use:
tree->Draw("something", "Entry$ % 5 != 0"); // skip every 5th entry
tree->Draw("something", "Entry$ % 5 == 0"); // draw every 5th entry
tree->Draw("something", "rndm() < 20.0/100.0"); // draw random sample of about 20% of entries
Ah cool, I looked in TMath to see if there was a random-number generator in there (which you can use in TTree::Draw formulas), but couldn’t find anything.
Where is this “rndm” function from? It’s not in cmath or anything. Is it the same as TRandom::Rndm()?
You can make a TF1 with the formula “rndm()” and it re-generates the points each time you draw it (e.g. when you click on the canvas). Pretty nifty, and solving the OP’s problem might be the only reasonable application… = )