Step in memory usage when reading from TTree

suhl · August 25, 2017, 4:10pm

Hi,

I have an issue with memory consumption when reading data from a TTree. I have to read from roughly 200 trees at the same time. Due to memory constraints I have to switch off caching. This works for the first entries, but after some time the memory consumption increases for each tree. This happens when the second cluster of a tree is read (in my case fAutoSave is equal to fAutoFlush, so this happens when the loop over the entries effectively calls GetEntry(fAutoSave)). In my application this step is around 70 MB per tree, so at this point my program is killed because it exceeds the allowed memory. This is not a memory leak, as the memory is freed once the ROOT files containing the trees are closed.

I tried to reproduce this with a simpler script and indeed find a step there as well, although a much smaller one. The attached example create.C creates a file containing a tree that contains only an always-empty vector of strings. The tree (header and baskets) is saved every 20000 events. When reading the file back, there is indeed a small step in memory usage after entry 20000 has been read from the tree.

Is there anything that could be done to avoid this step in memory usage?

Best regards,
Sebastian

create.C (326 Bytes)
read.C (559 Bytes)

pcanal · August 25, 2017, 4:41pm

Hi,

What you described is the expected behavior and the amount of memory used is controllable.

At the first cluster flush, the TTree will reallocate the basket’s buffer so that all the events of the cluster fit into one buffer.

The default cluster size is about 30 MB of compressed data. This results in an aggregate memory allocation of about 30 MB times the compression ratio (so in your case it looks like the compression ratio is around 2).
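To make the arithmetic concrete, here is a minimal sketch in plain C++ (no ROOT needed). The function names are made up for illustration, and the compression ratio is an assumed input, not something measured from your files.

```cpp
// Rough estimate of the buffer memory one tree needs once its baskets have
// been resized to hold a whole cluster.
//   clusterBytesCompressed: cluster size as stored on disk (auto-flush setting)
//   compressionRatio:       uncompressed size / compressed size (an assumption)
double estimateClusterMemoryBytes(double clusterBytesCompressed,
                                  double compressionRatio) {
    return clusterBytesCompressed * compressionRatio;
}

// Total across several trees that are open at the same time.
double estimateTotalMemoryBytes(double perTreeBytes, int nTrees) {
    return perTreeBytes * nTrees;
}
```

With the 30000000-byte default and a ratio of 2 this gives 60 MB per tree, in the same range as the 70 MB step you report. At 70 MB per tree, 200 trees open at once come to roughly 14 GB.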

To control the memory use you can change the size of the cluster (this will decrease compression and lead to slightly worse reading performance).

You have 3 options:

1. Set the number of entries in a cluster. In your case the current value is 20,000, so to cut the memory in half do:

tree->SetAutoFlush( 10000 );

2. Set the size of the cluster in compressed bytes. The default is 30000000, so to cut the memory in half do:

tree->SetAutoFlush( -15000000 ); // note the value is negative.

3. Disable autoflushing completely:

tree->SetAutoFlush( 0 );

This leaves the buffers at the size specified when the branches were created, but it also leads to ‘bad’ performance when reading the file (it significantly increases the number of file seeks).
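If you would rather start from a memory budget than from a cluster size, the negative byte value can be derived as in the sketch below. This is plain C++ with a hypothetical helper name; the compression ratio is again an assumption you would have to estimate for your own data.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: returns a value to pass to TTree::SetAutoFlush so that
// one cluster takes roughly memoryBudgetBytes once uncompressed.
// The result is negative because a negative argument is interpreted as a
// cluster size in compressed bytes rather than a number of entries.
std::int64_t autoFlushForBudget(std::int64_t memoryBudgetBytes,
                                double assumedCompressionRatio) {
    if (memoryBudgetBytes <= 0 || assumedCompressionRatio <= 0.0) {
        throw std::invalid_argument("budget and ratio must be positive");
    }
    const auto compressedBytes = static_cast<std::int64_t>(
        static_cast<double>(memoryBudgetBytes) / assumedCompressionRatio);
    return -compressedBytes;
}
```

For example, a budget of 30000000 bytes per tree with an assumed ratio of 2 yields -15000000, which is the value used in the second option. The result would be passed to tree->SetAutoFlush( ... ) before the tree is filled.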

Cheers,
Philippe.

suhl · August 25, 2017, 7:14pm

Hi Philippe,

thanks for the quick reply.

As far as I understand, your suggestions would only work for trees that are newly written. Is there any way this can be applied to files/trees already existing?

Regards,
Sebastian

pcanal · August 25, 2017, 8:37pm

With existing TTrees, you can save some memory by reading only the branches you really need and by disabling the TTreeCache.
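A minimal sketch of what that could look like is below. It is written as a template so that it compiles without the ROOT headers; with ROOT, the tree argument would be a TTree (or TChain), which provides SetCacheSize and SetBranchStatus. The function name and the branch names are placeholders.

```cpp
#include <string>
#include <vector>

// Sketch: keep only the listed branches active and drop the TTreeCache.
// TreeLike is assumed to offer SetCacheSize and SetBranchStatus, as TTree does.
template <class TreeLike>
void readOnlyBranches(TreeLike& tree, const std::vector<std::string>& wanted) {
    tree.SetCacheSize(0);              // a size of 0 removes any existing cache
    tree.SetBranchStatus("*", false);  // deactivate every branch first
    for (const auto& name : wanted) {
        tree.SetBranchStatus(name.c_str(), true);  // re-enable the needed ones
    }
}
```

It would be called right after retrieving the tree from the file, for example readOnlyBranches(*tree, {"myBranch"}), where myBranch stands for whatever branch you actually read. Deactivated branches are not read by GetEntry.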

Cheers,
Philippe.
