Size in memory of the TBuffer when reading ROOT tree

Hi,

I’m performing an analysis on a large dataset, and for the analysis to run in a reasonable time I have to store the entire dataset in memory. I have been using the Google Performance Tools to analyse the program’s memory allocation and noticed that TBuffer allocates a large amount of memory when reading the input TTree.

I wrote a simple program to test this on a subsample of my dataset:

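#include "TChain.h"
#include <iostream>

int main(int argc, char *argv[]){
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " file.root" << std::endl;
    return 1;
  }

  // Chain over the "flattree" TTree in the file given as the last argument
  TChain* chain = new TChain("flattree");
  chain->AddFile(argv[argc-1]);

  // Opens the file and reads the TTree metadata in order to count the entries
  chain->GetEntries();
  delete chain;
}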

and profiled this using the Google tools. The first few lines of output are below, showing that TBuffer was allocating about 850 MB of memory when loading the ROOT tree.

Total: 884.9 MB
    flat  flat%   sum%      cum   cum% function
   850.2  96.1%  96.1%    850.2  96.1% TBuffer::TBuffer
    24.4   2.8%  98.8%     24.4   2.8% TStorage::ReAllocChar
     4.1   0.5%  99.3%    429.2  48.5% TBasket::Streamer
     2.2   0.3%  99.6%      2.2   0.3% TBufferFile::ReadArray
     1.0   0.1%  99.7%      1.0   0.1% TLeafD::SetAddress
     0.8   0.1%  99.8%      0.8   0.1% TStorage::ObjectAlloc
     0.4   0.0%  99.8%    431.2  48.7% TStreamerInfo::ReadBuffer

By disabling all of the branches of the tree, the TBuffer memory allocation falls to 400 MB, which is still rather large. Also, the size of the TBuffer memory allocation seems to scale with the number of entries in the input tree, so this memory usage will cause a problem when running over the full dataset.

Is there a way to control the size of the TBuffer when reading a TTree, or is this all set when the TTree is created?

Thanks

Mark

[quote]and so that the analysis can run in a reasonable time I have to store the entire dataset in memory[/quote]By definition, if you do want to load all the data into memory, it will take at least as much memory as the file itself (the sum of the compressed sizes of all the baskets).
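As a rough illustration of this lower bound, the sketch below sums basket sizes. The `BasketInfo` struct and both functions are hypothetical, not ROOT API; for a real tree, the per-tree compressed total should correspond roughly to what `TTree::GetZipBytes()` reports.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-basket record (not a ROOT type): the size stored on
// disk (compressed) and the size after decompression.
struct BasketInfo {
  std::int64_t compressedBytes;
  std::int64_t uncompressedBytes;
};

// Lower bound on the memory needed to hold every basket in RAM: even if
// the baskets were kept compressed, they cannot take less than the sum
// of their compressed sizes.
std::int64_t minResidentBytes(const std::vector<BasketInfo>& baskets) {
  std::int64_t total = 0;
  for (const BasketInfo& b : baskets) {
    assert(b.compressedBytes >= 0);
    total += b.compressedBytes;
  }
  return total;
}

// Memory needed if every basket is held decompressed instead, which is
// usually the larger of the two figures.
std::int64_t decompressedBytes(const std::vector<BasketInfo>& baskets) {
  std::int64_t total = 0;
  for (const BasketInfo& b : baskets) {
    total += b.uncompressedBytes;
  }
  return total;
}
```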

[quote]chain->GetEntries();[/quote]This is an expensive operation which will open all the files, load the TTree metadata from each file and then delete the TTree and the TFile (except for the last one in the chain).
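A small model of that behaviour, assuming each file's metadata has a known decompressed size. `FileMeta`, `GetEntriesCost` and `simulateGetEntries` are hypothetical names used only for illustration: every file's metadata is loaded once, but only the last file's is still held when the call returns.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one file in the chain.
struct FileMeta {
  std::int64_t entries;        // number of entries in this file's TTree
  std::int64_t metadataBytes;  // decompressed size of this TTree's metadata
};

struct GetEntriesCost {
  std::int64_t totalEntries;   // value returned to the caller
  std::int64_t bytesLoaded;    // metadata read over the whole call
  std::int64_t bytesRetained;  // metadata still in memory afterwards
};

// Each file is opened and its TTree metadata loaded; the TTree and TFile
// are then deleted, except for the last file in the chain.
GetEntriesCost simulateGetEntries(const std::vector<FileMeta>& files) {
  GetEntriesCost cost{0, 0, 0};
  for (const FileMeta& f : files) {
    cost.totalEntries += f.entries;
    cost.bytesLoaded += f.metadataBytes;
  }
  if (!files.empty()) {
    cost.bytesRetained = files.back().metadataBytes;
  }
  return cost;
}
```

With a single-file chain, as in the test program in the question, `bytesLoaded` and `bytesRetained` coincide.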

[quote]By disabling all of the branches of the tree the TBuffer memory allocation falls to 400Mb, which is still rather large.[/quote]This is (should be) the sum of the decompressed sizes of the metadata for all the TTrees in the chain. This includes the location in the file of each basket; the size needed to keep track of these locations will indeed grow (step-wise) linearly with the number of entries.
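To see why the growth is step-wise, consider a simplified model in which each branch starts a new basket every `entriesPerBasket` entries and keeps a fixed-size record per basket. The 20-byte record is an assumption: one 4-byte size plus two 8-byte values (first entry and file offset), loosely mirroring the `fBasketBytes`, `fBasketEntry` and `fBasketSeek` arrays of `TBranch`. The real layout has other overhead as well.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bookkeeping per basket: 4-byte size + 8-byte first entry
// + 8-byte file offset (a simplification, not ROOT's exact layout).
constexpr std::int64_t kBytesPerBasketRecord = 4 + 8 + 8;

// Baskets one branch needs for nEntries entries, starting a new basket
// every entriesPerBasket entries (ceiling division).
std::int64_t basketsPerBranch(std::int64_t nEntries, std::int64_t entriesPerBasket) {
  assert(nEntries >= 0 && entriesPerBasket > 0);
  return (nEntries + entriesPerBasket - 1) / entriesPerBasket;
}

// Approximate metadata size: flat while baskets fill, then a jump of one
// record per branch whenever a new basket starts, i.e. step-wise linear
// growth with the number of entries.
std::int64_t metadataBytes(std::int64_t nBranches, std::int64_t nEntries,
                           std::int64_t entriesPerBasket) {
  return nBranches * basketsPerBranch(nEntries, entriesPerBasket) * kBytesPerBasketRecord;
}
```

In this model, larger baskets (more entries per basket) mean fewer records, which is consistent with the answer below that this cost is fixed when the tree is written.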

[quote]Is there a way to control the size of the TBuffer when reading a TTree, or is this all set when the TTree is created?[/quote]It is set when the TTree is written.

Cheers,
Philippe.