Serious TTREE Benchmarking: Random access

yawn · January 22, 2011, 11:36am

I wrote a simple test for TTREE Benchmarking.
You can compile it (a lot faster), or run it as a macro (a lot slower).

TestTree.cpp (3.65 KB)
It creates a Tree with one branch of UChar_t.
Default entries = 100000000 (so you obtain a file “demo.root” of 100MB).

I want to test expecially “Random access” of entries.
The test probes that “Random access” of first entries is way faster than “Random access” of last entries.

I obtain (with compression 0)
Tree Creation: 8.5s
Full Sequential Read of all entries: 3.5s
Reverse Sequential Read of all entries: 3.5s
Sequential Read of fist half entries: 1.75s
Sequential Read of last half entries: 1.75s
(nothing strange until now)

Random read of only 10000 entries in the full range of entries: 2.85s
Random read of only 10000 entries in the first half of entries: 1.58s
Random read of only 10000 entries in the last half of entries: 3.94s

So random access is way slower, and the time to access an entry increases with position.
Is there a way to have a constant-time and fast random-access to entries?

Thanks
Paolo (YAWN)

PS: I use ROOT 5.28, compiled with GCC 4.5.2 on Linux/64

Axel · January 23, 2011, 6:57am

Hi,

TTrees are made for sequential, forward access, where jumps (forward!) shouldn’t decrease the performance too much. You most likely have the tree cache disabled; your numbers are thus far from optimal. That at least would explain why forward and reverse reading gives identical numbers for you.

You try to optimize random reads; what’s your use case? Why can it not be done as sequential read with gaps?

The only way to improve random access with trees is to put it in “some sort” of random access memory: a RAM disk should work. But even then you will get less performance than sequential reads from a RAM disk!

Cheers, Axel.

velicanu · May 30, 2012, 3:37pm

Hi Axel,

I came across this thread since I also noticed that random access in a TTree is very slow.

The program I am trying to run is not a benchmark, and there’s no way to do it sequentially. The program reads in a root tree and copies it with the entries sorted based on some values. While the algorithm is simple and the specifics are not too relevant but you can still take a look at it here if you’re interested: cmssw.cvs.cern.ch/cgi-bin/cmssw. … iew=markup

The first part of the algorithm reads the tree in sequential order which is fast enough.
The second part does stl quicksort on a simple array which takes no time at all.
The third part however attempts to get “random” entries from the original tree, corresponding to the sorted order, and fill a new tree in that order, and this part is far too slow.

For reference: the input file is 52 GB containing 1.5M entries and so far it seems it will take about 10-20 days to perform the third part of the algorithm.

Is there a way to make TTree::GetEntry work in fast constant time for random access?

Cheers,
Dragos

pcanal · July 9, 2012, 9:52pm

HI Dragos,

The only way is to force the uncompressed basket to say in memory (i.e. you will need enough ram to hold the full TTree in memory). The ‘slow’ down of random access is in large pat due to the fact that each contains more than one (sequential) entry and when you jump around you need to uncompress the same basket many times unless you have enough memory to keep in memory the result of the decompression.

To keep basket around you can either call SetMaxVirtualSize or load all basket explicitly (TTree::LoadBaskets). (in older version of ROOT, the number you need to pass might actually needs to much higher that the actually memory need or available).

Cheers,
Philippe.