Hi,
I’m performing an analysis on a large dataset, and to keep the running time reasonable I have to store the entire dataset in memory. I have been using the Google Performance Tools to profile the program’s memory allocation, and I noticed that a large amount of memory is allocated by TBuffer when reading the input TTree.
I wrote a simple program to test this on a subsample of my dataset:
#include "TChain.h"

int main(int argc, char *argv[]) {
  TChain *chain = new TChain("flattree");
  chain->AddFile(argv[argc - 1]);
  chain->GetEntries();
  delete chain;
  return 0;
}
and profiled it using the Google tools. The first few lines of output are below, showing that TBuffer allocated roughly 850 MB of memory while loading the ROOT tree.
Total: 884.9 MB
850.2 96.1% 96.1% 850.2 96.1% TBuffer::TBuffer
24.4 2.8% 98.8% 24.4 2.8% TStorage::ReAllocChar
4.1 0.5% 99.3% 429.2 48.5% TBasket::Streamer
2.2 0.3% 99.6% 2.2 0.3% TBufferFile::ReadArray
1.0 0.1% 99.7% 1.0 0.1% TLeafD::SetAddress
0.8 0.1% 99.8% 0.8 0.1% TStorage::ObjectAlloc
0.4 0.0% 99.8% 431.2 48.7% TStreamerInfo::ReadBuffer
By disabling all of the branches of the tree, the TBuffer memory allocation falls to about 400 MB, which is still rather large. Also, the size of the TBuffer allocation seems to scale with the number of entries in the input tree, so this memory usage will become a problem when running over the full dataset.
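For reference, the branch-disabling test was done along these lines (a minimal sketch; "flattree" is the tree name from my test above, and the input file is again taken from the command line):

```cpp
#include "TChain.h"

int main(int argc, char *argv[]) {
  TChain *chain = new TChain("flattree");
  chain->AddFile(argv[argc - 1]);
  // Disable every branch before reading, so no branch data
  // is deserialized when entries are accessed.
  chain->SetBranchStatus("*", 0);
  chain->GetEntries();
  delete chain;
  return 0;
}
```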
Is there a way to control the size of the TBuffer when reading a TTree, or is this all set when the TTree is created?
Thanks
Mark