Time taken to GetEntry(i)

I am curious about the time taken in the GetEntry() and its relation with the data structure. I have a TChain with millions of entries. When I read it in ascending order, i.e.

for (int i=0;i<nevents;i++) {chain->GetEntry(i);}

It runs fast without issue. However, reading it with random order is much slower. For instance,

std::vector<int> eid(nevents);
for (int ievt = 0; ievt < nevents; ievt++) {eid[ievt] = ievt;}
for (int i=0;i<nevents;i++) {chain->GetEntry(eid(i));}

Can anyone explain the behaviour? Is It because there are some “linked-list” implementation in Tree?

Thanks a lot.

Please read tips for efficient and successful posting and posting code

ROOT Version: 6.26.06
Platform: Ubuntu
Compiler: Not Provided

That is normal. ROOT files are highly optimized for the sequential reading of tree data.

The data is a TTree is stored in blocks (baskets) containing the information for several entries for a given branch. Those blocks are stored in the files in compressed form. The default behavior when traversing the TTree in random order will result in those blocks being read from the file and decompressed (the longest part of the I/O) for each entry instead of once per bunch of entries. For example, if the TTree has 100,000 entries and each blocks contains 1000 entries, the serial order will result in 100 decompressions (per branch) while in the worst case scenario the random walk will result in 100,000 decompressions (per branch) .

If you can machine has enough memory you can preload each and every baskets in memory:


Note if your chain has multiple files, a random walk is even worse as it can result in each file being open and close multiple times (also a very costly operation) and the above work-around will not work as it is only for one TTree (at a time)

1 Like