I’m trying to add a collection of ~1000 histograms from a file into a single histogram (TH1D). To do this, I do:
```cpp
TH1D *MB_like_pos = new TH1D("MB_like_pos","",800,0,4);
TH1D *tmp1;
for(int i=0; i<nHists; i++){
   sprintf(histoname, "%s%d","MB_like_pos_pt",i);
   tmp1 = (TH1D*)(f2->Get(histoname))->Clone(histoname);
   MB_like_pos->Add(tmp1);
}
f2->Close(); //<--- this especially takes a long time
```
But this is very slow. Is there a more efficient way to do something like this?
I have noticed a similar problem. When I increase the number of histograms I need in order to evaluate variations of a given histogram (I usually clone them so that the original histogram is not modified), the code becomes extremely slow in two places:
1- For each new histogram, deriving the variations took longer and longer.
2- It took forever to finish at the end.
Given how the code is written, I suspected the problem might come from cloning the histograms, which is how I found this old thread. What exactly is the difference between the GetObject method and Clone? Would the original histogram be modified if I modify the one I got? That is, with the GetObject method, is the object I get a copy of the original one rather than a pointer to it?
I guess that f2 was initialized with something like:
TFile *f2 = TFile::Open(inputfilename,"READ");
In this case, the call
f2->Get(histoname);
reads the data from the file and creates a histogram (TH1D) in memory with that data. Since the file is opened read-only, the original data cannot be modified.
The original code then called
->Clone(histoname);
on this histogram, which made a second copy of the histogram in memory, wasting time and memory and needlessly growing the list that keeps track of them.
Note also that, as pointed out by Pepe, GetObject is much preferable to the older ‘Get’, as it allows you to both:
- check for existence (is there an object with that name in the file?)
- check that the object has the right type.
What is the cardinality of your problem (how many histograms are involved)? In some cases, splitting the histograms across multiple directories in the output file might help.
Regarding point 2 (‘Took forever to finish at the end’):
This is a known problem with files containing a very large number of histograms, in particular in a single directory. You can work around it if (and only if) you are sure that the same histogram is not shared between two directories:
```cpp
TFile *outputFile = .....
.....
gROOT->GetListOfFiles()->Remove(outputFile);
delete outputFile;
```