I am currently facing a performance issue which I wonder if there is a nice work around for it. I am trying to loop over a small sample (10000events) of a NTUP (~10mio events) since there is a structure within the NTUP I would like to pick a randomly 10000 events. Here the way I implemented it:
import random
r=random.sample(xrange(nEntries),1000) #tree.LoadBaskets() # this is something I found in a forum but to just execute this line takes 30+ min
for i in r:
tree.GetEntry(i)
if I do this the time to loop over each event is about 100times higher as if I just loop in order (1,2,3,4,5, …) over the events
suggestion: use the TBits class
-create a TBits bits(10000000)
get 10000 integer numbers between 0 and 10 millions
-for each integer use bits.TestBitNumber to check if you have already selected this particular entry and call bits.SetBitNumber otherwise
-then loop on the bits object using bits.FirstSetBit(startbit)
-call tree.GetEntry with the value returned by FirstSetBit
I am not quite sure how to implement your idea
I can create the 10000 integer numbers and they are unique so I think I dont have to check but I dont quite understand the other parts
for each integer I call bits.SetBitNumber(integer)?
how do I loop afterwards? and what is startbit?
an example like
for i in bits…:
tree.GetEntry(bits.FirstSetBit(i))
[quote=“mstoebe”]
if I do this the time to loop over each event is about 100times higher as if I just loop in order (1,2,3,4,5, …) over the events
thanks for your help
Michael[/quote]
Do you mean that looping over 10000 random events is much slower than looping over the first 10000 events?
This would not be too surprising given how disk access works (sequential is faster), and possibly the way ROOT compresses the files (contiguous events may be compressed together? Don’t know details), and maybe memory caching considerations.
Yes, the content of ROOT TTree is optimized for sequential reading. Random reads is possible but lead to multiple decompression of the same events (events are bunched in basket for each branches). You will be much better sorting the entry reading them in increasing order. One of the best way to do so is to use a TEntryList.