Odd results when working with histograms of the same name from different files

Hi,

I have two ROOT files (“even.root” and “odd.root” in the example) that contain histograms with identical names (“histA” and “histB” in each of the two files).
Sometimes, when I merge histograms and the merged histogram has the same name as one of the original histograms, I get unexpected results when I later retrieve a histogram of that name from the other ROOT file:
instead of the histogram from the second file, I seem to get the merged histogram from the first file.
So far I have only observed this when merging histograms from one of the ROOT files and then loading a histogram with the same name from the other ROOT file.
Giving the merged histogram a unique name seems to fix the problem (see the comment in “merge_histograms”).
However, seemingly arbitrary changes, e.g. changing the order in which I define the TFileHandlers, also make the problem disappear, and I don’t know why.
I am therefore a bit worried that problems could also occur independently of merging, simply because I am working with two files containing the same histogram names in parallel, and that I just have not noticed them yet.
I am working on extending some common utilities (like the functions defined in the attached example), but I have no control over the order in which other users define their TFileHandlers, or the order in which they load, modify, scale or merge histograms before plotting them.
Do you have any idea why the order in which I open the ROOT files seems to matter?
Why does the problem apparently only occur when I merge histograms, i.e. create a third histogram with the same name, and not when I merely work with two histograms of the same name from different files?
Is the latter always guaranteed to work as expected, even when I manipulate the individual histograms in some way?
Is there anything I can do at the level of the helper functions I define to make sure problems don’t appear later (like renaming all histograms when I load them)?
Another problem is that no warning is issued when one histogram is apparently replaced by another, so problems like this might be impossible to spot with real data.

I have attached a working example, which I tried to shorten as much as possible, and two toy input files for testing.

PyROOT (ROOT 6.32.08)

even.root (4.0 KB)
odd.root (4.0 KB)
MWE.py (3.7 KB)

Hi,

Thanks for sharing these thoughts and welcome to the ROOT Community!

The example is 130 lines of code: could you reduce it to something minimal which shows the unintended behaviour?

Best,
D

I think you have already found both the issue and the solution. In the merge_histograms function you give the clone the same name as the original, which you should avoid. If you add some printouts and compare the output with the same name and with a different name, you can see what happens:

def merge_histograms(hists, merged_name = None):
    if merged_name is None:
        # comment/uncomment one of the next 2 and compare the outputs:
        #merged_name = hists[0].GetName()         # same name, bad
        merged_name = 'm_' + hists[0].GetName()   # changing name, ok
        # merged_name = hists[0].GetName() + "_" + str(uuid.uuid4()) # THIS SEEMS TO FIX THE PROBLEM
    print('- - before adding')
    ROOT.gDirectory.ls()
    hist_merged = hists[0].Clone(merged_name)
    for hist in hists[1:]:
        if hist is not None:
            hist_merged.Add(hist)
    print('- - after adding')
    ROOT.gDirectory.ls()
    return hist_merged

you get, for the ‘bad’ case:

- - - - - - - - - - - - -
- - - - - - Merging even
- - before adding
TFile**		odd.root	
 TFile*		odd.root	
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd
- - after adding
TFile**		odd.root	
 TFile*		odd.root	
  OBJ: TH1F	histA	Hist A from file even : 0 at: 0x40e47e80
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd

- - - - - - - - - - - - -
- - - - - - Merging odd
- - before adding
TFile**		odd.root	
 TFile*		odd.root	
  OBJ: TH1F	histA	Hist A from file even : 0 at: 0x40e47e80
  OBJ: TH1F	histB	Hist B from file odd : 0 at: 0x41638f90
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd
- - after adding
TFile**		odd.root	
 TFile*		odd.root	
  OBJ: TH1F	histA	Hist A from file even : 0 at: 0x40e47e80
  OBJ: TH1F	histB	Hist B from file odd : 0 at: 0x41638f90
  OBJ: TH1F	histA	Hist A from file even : 0 at: 0x417266f0
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd

and for the ‘good’ case:

- - - - - - - - - - - - -
- - - - - - Merging even
- - before adding
TFile**		odd.root	
 TFile*		odd.root	
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd
- - after adding
TFile**		odd.root	
 TFile*		odd.root	
  OBJ: TH1F	m_histA	Hist A from file even : 0 at: 0x2b4778e0
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd

- - - - - - - - - - - - -
- - - - - - Merging odd
- - before adding
TFile**		odd.root	
 TFile*		odd.root	
  OBJ: TH1F	m_histA	Hist A from file even : 0 at: 0x2b4778e0
  OBJ: TH1F	histA	Hist A from file odd : 0 at: 0x2b47b9f0
  OBJ: TH1F	histB	Hist B from file odd : 0 at: 0x2b2892f0
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd
- - after adding
TFile**		odd.root	
 TFile*		odd.root	
  OBJ: TH1F	m_histA	Hist A from file even : 0 at: 0x2b4778e0
  OBJ: TH1F	histA	Hist A from file odd : 0 at: 0x2b47b9f0
  OBJ: TH1F	histB	Hist B from file odd : 0 at: 0x2b2892f0
  OBJ: TH1F	m_histA	Hist A from file odd : 0 at: 0x2bcd9550
  KEY: TH1F	histA;1	Hist A from file odd
  KEY: TH1F	histB;1	Hist B from file odd

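See the last “OBJ” line under “after adding” in “Merging odd”: in the bad case the merged histogram was built from the “even” histA, in the good case from the “odd” histA. In the bad case, the earlier clone named histA already sits in odd.root’s in-memory list of objects, and Get() on a file returns an in-memory object of that name, if there is one, before reading from disk.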

Ah I see.
So the reason the behavior changes when I change the order in which I load the files is that any new histograms (like the merged one) are somehow associated with the last file I opened?

> that any new histograms (like the merged one) are somehow associated with the last file I opened?

That is correct. Before creating a new histogram, do:

outputfile->cd();

Another option is to call myhist->SetDirectory( outputfile ); on the new histograms.
See also TDirectory::TContext

Thank you all for your help; I think I understand the underlying problem much better now.
I have written a shorter example of the problem below. I have come up with three possible solutions, but would be very grateful for feedback from someone more experienced.

import ROOT

def get_obj(file, obj_name):
    obj = file.Get(obj_name) # retrieves the histogram and attaches it to the file from which it was retrieved
    obj = obj.Clone() # clones the histogram and attaches it to the current directory 
                      # (i.e. the file that was opened last, NOT THE FILE THE HISTOGRAM WAS RETRIEVED FROM)
    obj.SetDirectory(0) # this removes the attachment of the clone from the current directory
    return obj

HIST_NAME = "hist"

# create two test files with histograms of the same name
with ROOT.TFile("file1.root", "recreate") as outfile:
    h = ROOT.TH1F(HIST_NAME, "hist from file1", 1, 0, 1)
    outfile.WriteObject(h, HIST_NAME)

with ROOT.TFile("file2.root", "recreate") as outfile:
    h = ROOT.TH1F(HIST_NAME, "hist from file2", 1, 0, 1)
    outfile.WriteObject(h, HIST_NAME)

### start of the example

file1 = ROOT.TFile.Open("file1.root")
file2 = ROOT.TFile.Open("file2.root") # second file is opened, this is now the current directory
ROOT.gDirectory.ls()

hist1 = get_obj(file1, HIST_NAME) # the histogram from file1 is read; its clone is not attached to file2 because get_obj calls SetDirectory(0)
hist1_clone = hist1.Clone() # the clone of the histogram from file1 is attached to file2, because that is the current directory
                            # NO WARNING IS ISSUED despite a histogram of that name already being in that file
hist1_clone.SetTitle("hist from file1 clone")
ROOT.gDirectory.ls()

hist2 = get_obj(file2, HIST_NAME) # instead of the original histogram from file2, the cloned histogram is returned
                                  # because it is the oldest histogram of that name ATTACHED to file2
                                  # NO WARNING IS ISSUED
                                  # the original histogram from file2 can no longer be retrieved by name
ROOT.gDirectory.ls()

print(hist1.GetTitle(), "- This should be 'hist from file1'.")
print(hist1_clone.GetTitle(), "- This should be 'hist from file1 clone'.")
print(hist2.GetTitle(), "- This should be 'hist from file2'.") # instead this yields 'hist from file1 clone'
file1.Close()
file2.Close()

So, in short, I can no longer retrieve a histogram from the file I opened last if I have previously cloned a histogram of the same name from the first file.
I see three possible solutions that I could implement in the framework I am working on:

  1. Make sure each cloned histogram is assigned a new name. This might be difficult to implement, since histograms are cloned in many places, including outside of predefined helper functions, e.g. when cloning a histogram to compute ratios or make ratio plots. In helper functions it could be achieved by appending uuid.uuid4() to the name of the cloned histogram. This was the solution I originally found, but as mentioned it is difficult to enforce in a framework used by multiple people.
  2. Having multiple histograms with the same name only seems to be an issue when retrieving histograms by name. (Can you confirm this? Making sure that no two histograms with identical names are written to the same output file is not an issue.) The only time histograms are retrieved by name is when they are read from a file. If I attach a file-specific prefix to each histogram’s name when retrieving it with the get_obj function, I can ensure that no clone of that histogram has the same name as the original, and the problem is avoided. I have tested one way of implementing this, and it works in all cases I have tried.
  3. Prevent histograms from being attached to any file by calling ROOT.TH1.AddDirectory(False) at the beginning of the code. This sounds like a relatively clean solution to me, but I don’t know whether it has any major implications that I am missing.

At the moment I am inclined to choose option 2, but I would be grateful for feedback in case I am missing some obvious problem, since I have limited experience with this.