Clone over many histograms is very slow

Hi all,

I’m trying to add a collection of ~1000 histograms from a file into a single TH1 histogram. To do this, I do

TH1D *MB_like_pos = new TH1D("MB_like_pos","",800,0,4);
TH1D *tmp1;
for(int i=0; i<nHists; i++){
     sprintf(histoname, "%s%d","MB_like_pos_pt",i);
     tmp1 = (TH1D*)(f2->Get(histoname))->Clone(histoname);
     MB_like_pos->Add(tmp1);
}
f2->Close(); //<--- this especially takes a long time

But this is very slow. Is there a more efficient way to do something like this?

Thanks

snprintf(histoname, sizeof(histoname), "MB_like_pos_pt%d", i);
f2->GetObject(histoname, tmp1);
if (tmp1) MB_like_pos->Add(tmp1);
else std::cerr << "Warning: " << histoname << " NOT found!" << std::endl;

Thanks! This is what I was looking for.

Hi,

I have noticed that, as I increase the number of histograms used to evaluate variations on a given histogram (I usually clone them so that the original is not modified), the code becomes extremely slow in two places:
1- For each new histogram the time it took to get the variations derived was higher and higher
2- Took forever to finish at the end

Given how the code is written, I suspected that the problem comes from cloning the histograms, which is how I found this old thread. What exactly is the difference between GetObject and Clone? Would the original histogram be modified if I modify the one I got? That is, does GetObject give me a copy of the original one rather than a pointer to it?

Thanks a lot!

Hi,
In this line of code,

tmp1 = (TH1D*)(f2->Get(histoname))->Clone(histoname);

I guess that f2 was initialized with something like:

In this case the part

f2->Get(histoname)

reads the data from the file and creates a histogram (TH1D) in memory with that data. Since the file is open read-only, the original data cannot be modified.
Then in the original code, the following was called on this histogram

->Clone(histoname);

which made a second copy of the histogram in memory, wasting time and memory and needlessly growing the list that keeps track of them.

Note also that, as pointed out by Pepe, GetObject is much preferable to the older ‘Get’ as it allows you to both

  • check for existence (is there an object with that name in the file)
  • check that the object has the right type.

What is the cardinality of your problem? In some cases, splitting the histograms across multiple directories in the output file might help.

2- Took forever to finish at the end

This is a known problem for files with a very large number of histograms (in particular within a single directory). You can work around the problem if (and only if) you are sure the same histogram is not shared between two directories:

TFile *outputFile = .....
.....
gROOT->GetListOfFiles()->Remove(outputFile);
delete outputFile;

Cheers,
Philippe.