Performance of TTree::GetEntry(): direct access?

Hello everyone,

I have a TTree with double and float branches, and in my program I need to jump around the tree and read the entry values. I noticed that the computing time of GetEntry(int) depends on the distance I jump. Is that true, or did I mess something up with my tree?
To explain my question more clearly, here is an example.

[code]
TTree *tr = (TTree*)file.Get("channel000");
double time; float value;
tr->SetBranchAddress("time", &time);
tr->SetBranchAddress("value", &value);
for (int k = 0; k < 1000; k++) { tr->GetEntry(k*1000); }
[/code]

The code above needs more time than the following:

[code]
TTree *tr = (TTree*)file.Get("channel000");
double time; float value;
tr->SetBranchAddress("time", &time);
tr->SetBranchAddress("value", &value);
for (int k = 0; k < 1000; k++) { tr->GetEntry(k*10); }
[/code]

The question is: is the observation true?

For more information, I also post the output of TTree::Print() here.

[code]
root [52] tr->Print()
******************************************************************************
*Tree    :chan000   : Data                                                   *
*Entries : 32859024 : Total  =      395588267 bytes  File  Size =  211012486 *
*        :          : Tree compression factor =   1.87                       *
******************************************************************************
*Br    0 :time      : time/D                                                 *
*Entries : 32859024 : Total  Size=  263711748 bytes  File Size  =  170853170 *
*Baskets :     8235 : Basket Size=      32000 bytes  Compression=   1.54     *
*............................................................................*
*Br    1 :value     : value/F                                                *
*Entries : 32859024 : Total  Size=  131876209 bytes  File Size  =   40012817 *
*Baskets :     4117 : Basket Size=      32000 bytes  Compression=   3.29     *
*............................................................................*
[/code]

This has really become a problem for me. I have to jump around the tree collecting values, then return to the starting position, and repeat this jump-and-collect route hundreds of times. This tree is only a test tree; the actual one will be much larger than this one, and the procedure is already time consuming now. I am afraid the program will really run into performance problems later. It would be really nice if someone could help me.
Thanks!

Simon

Hi,

The data for each branch is bunched in a series of baskets. Each basket is compressed and stored on disk separately. One basket contains a contiguous series of entries. For example, your branch 'time' has 8235 baskets, each holding about 3990 entries; the first basket holds entries 0 through 3989, and so on.

So when you read the tree 'sparsely', you need to read from disk (and then decompress) more baskets. In your second example you end up reading only 5 baskets, while in the first example you are reading about 375 baskets!
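
As a rough back-of-the-envelope illustration (plain standalone C++, using only the numbers from the Print() output above; the "+2" just accounts for the partially covered baskets at the start, so the counts are approximate):

[code]
#include <cstdio>

int main() {
   const long long nEntries       = 32859024;
   const long long timeBaskets    = 8235;   // from the Print() output for 'time'
   const long long valueBaskets   = 4117;   // from the Print() output for 'value'
   const long long perTimeBasket  = nEntries / timeBaskets;   // ~3990 entries per basket
   const long long perValueBasket = nEntries / valueBaskets;  // ~7981 entries per basket

   // Loop 1 visits entries 0, 1000, ..., 999000; loop 2 visits 0, 10, ..., 9990.
   const long long span1 = 999000, span2 = 9990;
   std::printf("stride 1000: ~%lld baskets read and decompressed\n",
               span1 / perTimeBasket + span1 / perValueBasket + 2);  // ~377
   std::printf("stride   10: ~%lld baskets read and decompressed\n",
               span2 / perTimeBasket + span2 / perValueBasket + 2);  // ~5
   return 0;
}
[/code]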

If you need to read that sparsely, need to re-read the tree many times, and have enough RAM (your toy example seems to meet all those criteria, though a production TTree might be too large to fit entirely in memory), you can load the full content of the TTree into memory (hence reading from disk only once) by calling:
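[code]
// TTree::LoadBaskets(Long64_t maxmemory) reads as many baskets as fit
// within maxmemory bytes into memory in one sequential pass over the file.
tr->LoadBaskets(max_memory_size);
[/code]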

Cheers,
Philippe.

Hi pcanal,

thanks for your reply. It sounds like a solution.
I tested it, and the result was great!
Still, I have two last questions:
What happens if the actual TTree is larger than the max_memory_size passed to LoadBaskets? I tested it by setting max_memory_size to a very small number, and it kept running. So the question is: is there really some mechanism to handle such a case?
And what happens with the data in memory: do I have to free it with delete, or will it be removed automatically after closing the TFile?

Cheers,

Simon

When you delete the Tree, all buffers in memory associated with the tree are deleted.
Coming back to your original question, I suggest creating your original tree with no compression; random access will be faster. Also, as indicated by Philippe, make the buffer size smaller.
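
For instance, a minimal sketch of what writing such a tree could look like (file, tree, and branch names here are only illustrative, not your actual setup):

[code]
#include "TFile.h"
#include "TTree.h"

void write_uncompressed() {
   TFile f("channels.root", "RECREATE");
   f.SetCompressionLevel(0);          // store baskets uncompressed

   double time;
   float  value;
   TTree tree("chan000", "Data");
   // The last Branch() argument is the basket (buffer) size in bytes; smaller
   // than the default 32000 means fewer entries per basket, i.e. less data to
   // read and decompress for each random access.
   tree.Branch("time",  &time,  "time/D", 8000);
   tree.Branch("value", &value, "value/F", 8000);

   // ... fill loop: set time and value, then tree.Fill() ...

   tree.Write();
   f.Close();
}
[/code]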

Rene