Working with histograms without modifying the original one


I have been dealing with this issue since the beginning, and although I found a way of doing this, it becomes really inefficient when the number of histograms increases. I will try to explain the issue first and then ask the question, because I am afraid it will be confusing otherwise.

What I would like to do is to get histograms from different files and work with them (sometimes I have to add them or sometimes I need to modify them based on different conditions). The point is that I don’t want to modify the original histogram. When doing something like

TH1D *h = (TH1D*)file->Get("histoname");

everything that I do to h will remain if later I need to get the original histogram again. The way to avoid this, as far as I know, is to clone the histogram.

I have been doing this for a couple of years and it works. I can get all the histograms that I want, add them together and later, if I want the original histogram, I just get it again in the same way and there it is, unchanged. The issue is that when the number of files and histograms I have to use increases, the performance gets worse and worse, because with each iteration (I might need to loop over thousands of histograms) both the memory usage and the processing time increase.

I found a discussion in which they say to use GetObject, i.e.

TH1D* histo;
file->GetObject("histoname", histo);

This is meant to avoid the clone and speed things up. It actually increases the speed a lot and memory usage improves, but although I thought it did the same as the clone, it does not. I noticed that when I add histograms together, the original ones get modified, which means that the results are all wrong when I later use the histograms again. For example, imagine a case in which I have 4 samples, each one with a histogram. I might want to get the sum of them as the total background of a given analysis. What I do is:

In this example, h[0] will now be the sum of the four of them, since that is how Add works. But maybe later I need the histogram of the first sample; h[0] then won’t be that histogram but the sum of all of them. The way to avoid this is to clone h[0] and add the rest to that clone. But these clones end up using a lot of memory.
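To illustrate the aliasing problem without depending on ROOT, here is a minimal sketch in plain C++. The `Hist` struct and the `sumAll` function are hypothetical stand-ins, not ROOT classes: `Hist::Add` only mimics the in-place behaviour of `TH1::Add`. The idea shown is to accumulate into one separate total, so that none of the inputs (including the first one) is ever modified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a histogram: just a vector of bin contents.
struct Hist {
    std::vector<double> bins;

    // Mimics the in-place semantics of TH1::Add: modifies *this, not `other`.
    void Add(const Hist& other) {
        for (std::size_t i = 0; i < bins.size() && i < other.bins.size(); ++i) {
            bins[i] += other.bins[i];
        }
    }
};

// Sum all inputs into one fresh accumulator; the inputs are only read.
Hist sumAll(const std::vector<const Hist*>& inputs) {
    Hist total;
    if (inputs.empty()) {
        return total;
    }
    // Start from an empty histogram with the same number of bins.
    total.bins.assign(inputs.front()->bins.size(), 0.0);
    for (const Hist* h : inputs) {
        total.Add(*h);
    }
    return total;
}
```

With ROOT itself, the equivalent would presumably be a single Clone of one input (followed by Reset) used as the accumulator, and then Add for each sample, so that only one extra histogram exists per sum rather than one clone per input.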

Is there a way of getting a copy of the histogram without using Clone, which apparently was taking most of the computing time? Is there anything that I missed that would make the code with the clones slower? Maybe I need to delete the cloned histograms or something. For each iteration the computing time increased more and more and when using over 200 histograms it can take up to a minute to process each iteration…

Thanks a lot in advance,

The method GetObject offers better protection than Get, but in general, both methods do the same (neither of them “clones” anything).
So, you will need two copies of your histograms.
An easy way could be:

TH1D *histo; // a "working" copy (can be modified)
file->GetObject("hname", histo); // get "hname"
if (histo) {
  histo->SetDirectory(gROOT); // (gROOT) ... or ... (0)
  // histo->SetName("newhistoname"); // you can change its name
}
TH1D *histo_orig; // an "original" copy (keep it unchanged)
file->GetObject("hname", histo_orig); // get "hname" again

Note: if you “delete file;”, the associated “histo_orig” will automatically be deleted, too, but you will need to “delete histo;” manually yourself (when you no longer need it).
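The manual cleanup mentioned in the note is also where per-iteration memory growth tends to come from when working copies are never deleted. As a rough sketch of one way to make the cleanup automatic, the plain C++ example below holds the working copy in a `std::unique_ptr`, so it is freed at the end of each call. `WorkHist` and `scaledIntegral` are hypothetical names for illustration only and are not part of ROOT.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a histogram: just a vector of bin contents.
struct WorkHist {
    std::vector<double> bins;
};

// Make a working copy, modify it, and return the sum of its bins.
// The original is taken by const reference and is never changed.
double scaledIntegral(const WorkHist& original, double scale) {
    // Deep copy owned by the unique_ptr (plays the role of a detached clone).
    auto work = std::make_unique<WorkHist>(original);
    for (double& content : work->bins) {
        content *= scale;
    }
    double sum = 0.0;
    for (double content : work->bins) {
        sum += content;
    }
    return sum;
}  // `work` is destroyed here, so nothing accumulates across iterations
```

In ROOT the same idea would presumably apply to a histogram that has been detached from its file with SetDirectory(0), since from then on the user code is its only owner.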