TChain GetEntry performance reading non-contiguous events

Hello,

I need to read a small fraction (~0.1%) of events out of a large TTree/TChain (~50M events). The requested events are randomly scattered in the sample, but I have a sorted list of them. Comparing the speed of reading all events sequentially with the speed of reading just the selected ones, the latter is ~10x slower per event read. I have seen several similar questions in the forums, but the suggested solutions did not seem to make any difference (though it is quite possible I did not understand or apply them correctly). What I tried:

  • lowering the branches' basket size to 1000 to prevent caching of the next events in line (which in 99% of cases I will not want to read)
  • setting the cache (TTreeCache) size to zero to prevent caching of subsequent events
  • setting up a TEventList to force caching of the events in this list instead of just the next ones in the TTree
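In code, the attempts above look roughly like this (a sketch, assuming a TChain named `chain`; the entry numbers are purely illustrative, and the basket size itself can only be changed when the file is written):

```cpp
// Sketch of the settings tried above; `chain` is the TChain being read.
chain.SetCacheSize(0);                  // disable the TTreeCache entirely

TEventList elist("elist", "selected events");
elist.Enter(12);                        // illustrative entry numbers;
elist.Enter(4031);                      // in practice, the sorted list
elist.Enter(998877);                    // of ~50k selected entries
chain.SetEventList(&elist);
```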

Is there some other way to improve the GetEntry performance in the case described above?

Thanks,
Pavel

PS: Since the structure of the TTree is relatively large (composed of ~50 std::vectors), I have already disabled all branches I do not need (~75% are disabled), but this improves the readout speed only a little.
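For reference, the branch disabling mentioned in the PS is done along these lines (the branch names here are placeholders, not from the actual tree):

```cpp
// Disable everything, then re-enable only the branches actually read.
chain.SetBranchStatus("*", 0);
chain.SetBranchStatus("vtx_x", 1);   // placeholder branch names
chain.SetBranchStatus("vtx_y", 1);
```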

[quote]but the suggested solutions did not seem to make any difference (though it is quite possible I did not understand or apply them correctly).[/quote]It is likely that they were not applied correctly, since they should make a difference, even if a negative one.

[quote]- setting the cache (TTreeCache) size to zero to prevent caching of subsequent events[/quote]Are you reading a local or a remote file? If you know the list of branches you are going to read, you should use the TTreeCache with manual training.
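Manual training of the TTreeCache can be sketched as follows (the cache size and branch names are illustrative, not prescriptive):

```cpp
chain.SetCacheSize(30 * 1024 * 1024);      // e.g. a 30 MB cache
chain.AddBranchToCache("vtx_x", kTRUE);    // register the branches you
chain.AddBranchToCache("vtx_y", kTRUE);    // will actually read
chain.StopCacheLearningPhase();            // skip the automatic training
```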

[quote]- setting up a TEventList to force caching of the events in this list instead of just the next ones in the TTree[/quote]This makes a difference only if the TTreeCache is enabled.

[quote]- lowering the branches' basket size to 1000 to prevent caching of the next events in line (which in 99% of cases I will not want to read)[/quote]A priori, decreasing the basket size (at the time the file is written) should reduce the time spent.
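For completeness, the basket size is a property chosen when the tree is written, e.g. (branch name and variable are placeholders):

```cpp
// At write time: smaller baskets mean fewer unwanted events per read.
TBranch *b = tree->Branch("vtx_x", &vtx_x);  // placeholder branch
b->SetBasketSize(1000);
```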

A priori, the best timing should come from a small basket size, the TTreeCache enabled and manually trained, and the TEventList set.
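Putting these together, the read loop could look like the following sketch (assuming a TChain `chain` and a TEventList `elist` holding the sorted selected entries; branch registration via "*" is one option, listing branches individually is another):

```cpp
chain.SetCacheSize(30 * 1024 * 1024);      // enable the TTreeCache
chain.AddBranchToCache("*", kTRUE);        // or add branches one by one
chain.StopCacheLearningPhase();            // manual training, no learning
chain.SetEventList(&elist);                // sorted list of selected entries

for (Long64_t i = 0; i < elist.GetN(); ++i)
   chain.GetEntry(elist.GetEntry(i));      // read only the selected events
```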

If none of this helps, then it is likely that your distribution of selected events is such that you still need to read all the baskets from disk and pay the price of decompression. If this is the case, what might actually help most is to disable compression when writing the file.
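If rewriting the file is an option, compression can be turned off along these lines (a sketch; the output file name is a placeholder, and CloneTree copies only the currently enabled branches):

```cpp
TFile out("uncompressed.root", "RECREATE");
out.SetCompressionLevel(0);            // store baskets uncompressed
TTree *copy = chain.CloneTree(-1);     // -1 = copy all entries
copy->Write();
out.Close();
```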

Cheers,
Philippe.