Performance of TTree::GetEntry(): direct access?

Hello everyone,

I have a TTree with double and float branches, and in my program I need to jump around the tree and read the entry values. I noticed that the computing time of GetEntry(int) depends on the distance I jump. Is that true, or did I mess something up with my tree?
To explain my question more clearly, here is an example.

[code]
TTree *tr = (TTree*)file.Get("channel000");
double time; float value;
tr->SetBranchAddress("time", &time);
tr->SetBranchAddress("value", &value);
for (int k = 0; k < 1000; k++) { tr->GetEntry(k*1000); }
[/code]

The code above needs more time than the following:

[code]
TTree *tr = (TTree*)file.Get("channel000");
double time; float value;
tr->SetBranchAddress("time", &time);
tr->SetBranchAddress("value", &value);
for (int k = 0; k < 1000; k++) { tr->GetEntry(k*10); }
[/code]

The question is: is the observation true?

For more information, I also post the output of TTree::Print() here.

[code]
root [52] tr->Print()
******************************************************************************
*Tree    :chan000   : Data                                                   *
*Entries : 32859024 : Total  =      395588267 bytes  File  Size =  211012486 *
*        :          : Tree compression factor =   1.87                       *
******************************************************************************
*Br    0 :time      : time/D                                                 *
*Entries : 32859024 : Total  Size=  263711748 bytes  File Size  =  170853170 *
*Baskets :     8235 : Basket Size=      32000 bytes  Compression=   1.54     *
*............................................................................*
*Br    1 :value     : value/F                                                *
*Entries : 32859024 : Total  Size=  131876209 bytes  File Size  =   40012817 *
*Baskets :     4117 : Basket Size=      32000 bytes  Compression=   3.29     *
*............................................................................*
[/code]

This has really become a problem for me. I have to jump around the tree collecting values, then return to the starting position, and repeat this jump-and-collect route hundreds of times. This tree is only a test tree; the actual one will be much larger than this one, and the procedure is already time consuming now. I am afraid the program will really run into performance problems later. It would be really nice if someone could help me.
Thanks!

Simon

Hi,

The data for each branch is bunched in a series of baskets. Each basket is compressed and stored on disk separately. One basket contains a contiguous series of entries. For example, your branch 'time' has 8235 baskets, each holding about 3990 entries; the first basket holds entries 0 through 3989, and so on.

So when you read the tree 'sparsely', you need to read from disk (and then decompress) more baskets. In your second example you end up reading only 5 baskets, while in the first example you are reading about 375 baskets!
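
As a rough back-of-the-envelope illustration (plain standalone C++, using only the numbers from the Print() output above; the "+2" just accounts for the partially covered baskets at the start, so the counts are approximate):

[code]
#include <cstdio>

int main() {
   const long long nEntries       = 32859024;
   const long long timeBaskets    = 8235;   // from the Print() output for 'time'
   const long long valueBaskets   = 4117;   // from the Print() output for 'value'
   const long long perTimeBasket  = nEntries / timeBaskets;   // ~3990 entries per basket
   const long long perValueBasket = nEntries / valueBaskets;  // ~7981 entries per basket

   // Loop 1 visits entries 0, 1000, ..., 999000; loop 2 visits 0, 10, ..., 9990.
   const long long span1 = 999000, span2 = 9990;
   std::printf("stride 1000: ~%lld baskets read and decompressed\n",
               span1 / perTimeBasket + span1 / perValueBasket + 2);  // ~377
   std::printf("stride   10: ~%lld baskets read and decompressed\n",
               span2 / perTimeBasket + span2 / perValueBasket + 2);  // ~5
   return 0;
}
[/code]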

If you need to read that sparsely, need to re-read the tree many times, and have enough RAM (your toy example seems to meet all those criteria, though a production TTree might be too large to fit entirely in memory), you can load the full content of the TTree into memory (hence reading from disk only once) by calling:
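[code]
// TTree::LoadBaskets(Long64_t maxmemory) reads as many baskets as fit
// within maxmemory bytes into memory in one sequential pass over the file.
tr->LoadBaskets(max_memory_size);
[/code]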

Cheers,
Philippe.

Hi pcanal,

thanks for your reply. It sounds like a solution.
I tested it, and the result was great!
Still, I have two last questions:
What happens if the actual TTree is larger than the max_memory_size passed to LoadBaskets? I tested it by setting max_memory_size to a very small number, and it kept running. So the question is: is there really some mechanism to handle such a case?
And what happens with the data in memory: do I have to free it with delete, or will it be removed automatically after closing the TFile?

Cheers,

Simon

When you delete the Tree, all buffers in memory associated with the tree are deleted.
Coming back to your original question, I suggest creating your original tree with no compression; random access will be faster. Also, as indicated by Philippe, make the buffer size smaller.
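
For instance, a minimal sketch of what writing such a tree could look like (file, tree, and branch names here are only illustrative, not your actual setup):

[code]
#include "TFile.h"
#include "TTree.h"

void write_uncompressed() {
   TFile f("channels.root", "RECREATE");
   f.SetCompressionLevel(0);          // store baskets uncompressed

   double time;
   float  value;
   TTree tree("chan000", "Data");
   // The last Branch() argument is the basket (buffer) size in bytes; smaller
   // than the default 32000 means fewer entries per basket, i.e. less data to
   // read and decompress for each random access.
   tree.Branch("time",  &time,  "time/D", 8000);
   tree.Branch("value", &value, "value/F", 8000);

   // ... fill loop: set time and value, then tree.Fill() ...

   tree.Write();
   f.Close();
}
[/code]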

Rene