Subtracting histograms does not yield difference in entries

mwilkins · October 16, 2018, 3:55pm

When I subtract two histograms, the number of entries in the subtracted histogram is not equal to the difference in the number of entries between the two histograms. It seems that the resulting histogram does not contain the contents of the underflow and overflow bins.

This is not, however, the case for addition, which seems to preserve the sums just fine. Is this behavior expected? It is very unintuitive.

Step-by-step demonstration:

Get TFile and TTree and draw a branch into histograms (with different cuts):

root [0] TFile f("data/Datacut_noLcMcut.root")
(TFile &) Name: data/Datacut_noLcMcut.root Title: 
root [1] TTree * t = (TTree*) f.Get("DecayTree")
(TTree *) 0x7fb9ae249060
root [2] TH1D * h1 = new TH1D("h1", "h1", 225, 4510, 5410)
(TH1D *) 0x7fb9ae170240
root [3] TH1D * h2 = new TH1D("h2", "h2", 225, 4510, 5410)
(TH1D *) 0x7fb9ae16fc30
root [4] t->Draw("X_LOKI_MASS_ConstrLc>>h1", "abs(Lc1_M - 2286.46) < 14", "goff")
(long long) 1253
root [5] t->Draw("X_LOKI_MASS_ConstrLc>>h2", "abs(Lc2_M - 2286.46) > 14", "goff")
(long long) 868
root [6] h1->GetEntries()
(double) 1253.0000
root [7] h2->GetEntries()
(double) 868.00000

Note that each histogram has exactly as many entries as there are tree entries passing its cut.

Now, if I subtract the two histograms, the resulting number of entries is not equal to the difference between the two:

root [12] TH1D * h3 = (TH1D*)h1->Clone("h3")
(TH1D *) 0x7fb9ae1b0970
root [13] h3->Add(h2, -1)
(bool) true
root [14] h3->GetEntries()
(double) 130.00000
root [15] h1->GetEntries() - h2->GetEntries()
(double) 385.00000
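One way to picture the mismatch is to keep a histogram's bin contents separate from its entry counter. The sketch below is a toy model in plain C++, not ROOT's implementation: `ToyHist`, `sumInRange`, and `sumAllBins` are hypothetical names, and the idea that the entry count after a subtraction reflects only in-range contents is an assumption that merely matches the numbers printed in this session.

```cpp
#include <cstddef>
#include <vector>

// Toy 1D histogram: bins[0] is underflow, bins[n + 1] is overflow.
struct ToyHist {
    double lo, hi;
    std::vector<double> bins;

    ToyHist(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nbins + 2, 0.0) {}

    // Add one unweighted entry at x.
    void fill(double x) {
        const std::size_t n = bins.size() - 2;
        std::size_t i;
        if (x < lo) {
            i = 0;                       // underflow
        } else if (x >= hi) {
            i = n + 1;                   // overflow
        } else {
            i = 1 + static_cast<std::size_t>((x - lo) / (hi - lo) * n);
            if (i > n) i = n;            // guard against rounding at the upper edge
        }
        bins[i] += 1.0;
    }

    // this += c * other, bin by bin, flow bins included.
    void add(const ToyHist& other, double c) {
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += c * other.bins[i];
    }

    // Sum of contents inside the axis range only.
    double sumInRange() const {
        double s = 0.0;
        for (std::size_t i = 1; i + 1 < bins.size(); ++i) s += bins[i];
        return s;
    }

    // Sum of contents including underflow and overflow.
    double sumAllBins() const {
        double s = 0.0;
        for (double b : bins) s += b;
        return s;
    }
};
```

For instance, if one toy histogram holds two in-range and two out-of-range values and a second holds one of each, then after `add(other, -1.0)` the first has `sumAllBins()` equal to 2 but `sumInRange()` equal to 1. This is the same kind of gap as 385 versus 130 above.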

But if I add them, it is:

root [17] TH1D * h4 = (TH1D*)h1->Clone("h4")
(TH1D *) 0x7fb9ae1b2600
root [18] h4->Add(h2, +1)
(bool) true
root [19] h4->GetEntries()
(double) 2121.0000
root [20] h1->GetEntries() + h2->GetEntries()
(double) 2121.0000

Now, if I compare the number of entries in the subtracted histogram to the number of entries that don’t fall into underflow or overflow, I see that they are equal:

root [14] h3->GetEntries()
(double) 130.00000
root [16] t->GetEntries("abs(Lc1_M - 2286.46) < 14 && X_LOKI_MASS_ConstrLc>=4510 && X_LOKI_MASS_ConstrLc<5410") - t->GetEntries("abs(Lc2_M - 2286.46) > 14 && X_LOKI_MASS_ConstrLc>=4510 && X_LOKI_MASS_ConstrLc<5410")
(long long) 130

This is not true for the added histogram:

root [19] h4->GetEntries()
(double) 2121.0000
root [21] t->GetEntries("abs(Lc1_M - 2286.46) < 14 && X_LOKI_MASS_ConstrLc>=4510 && X_LOKI_MASS_ConstrLc<5410") + t->GetEntries("abs(Lc2_M - 2286.46) > 14 && X_LOKI_MASS_ConstrLc>=4510 && X_LOKI_MASS_ConstrLc<5410")
(long long) 1034
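The totals printed so far are enough to work out how many entries of each histogram lie inside the axis range and how many lie in the underflow and overflow bins. The sketch below does only that arithmetic. `Split` and `splitCounts` are hypothetical names, and it assumes that 1034 and 130 are the in-range sum and difference returned by the two `t->GetEntries(...)` expressions.

```cpp
// In-range and out-of-range (underflow + overflow) counts for two histograms.
struct Split {
    long long inRange1, inRange2;
    long long flow1, flow2;
};

// total1, total2: entries of each histogram.
// sumIn, diffIn: sum and difference of the in-range counts.
Split splitCounts(long long total1, long long total2,
                  long long sumIn, long long diffIn) {
    Split s;
    s.inRange1 = (sumIn + diffIn) / 2;   // solve a + b = sumIn, a - b = diffIn
    s.inRange2 = (sumIn - diffIn) / 2;
    s.flow1 = total1 - s.inRange1;       // the remainder sits in the flow bins
    s.flow2 = total2 - s.inRange2;
    return s;
}
```

With the values from this session, `splitCounts(1253, 868, 1034, 130)` gives 582 and 452 in-range entries and 671 and 416 out-of-range entries. The gap between 385 and 130 is therefore 255, which is exactly 671 - 416.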

Is there a reason the behavior for addition and subtraction vis-a-vis underflow and overflow bins differs?

ROOT Version: 6.15/01
Platform: macOS Mojave
Compiler: Not Provided

Wile_E_Coyote · October 16, 2018, 4:01pm

mwilkins · October 16, 2018, 4:21pm

This is still extremely unintuitive and feels like a bug.

First, if I use GetEffectiveEntries() for all the histograms, the results still don’t match; I have to use GetEntries() (and not GetEffectiveEntries(), which returns some other unintelligible number) for the subtracted histogram:

root [46] h1->GetEffectiveEntries() + h2->GetEffectiveEntries()
(double) 1034.0000
root [47] h4->GetEffectiveEntries()
(double) 1034.0000
root [48] h1->GetEffectiveEntries() - h2->GetEffectiveEntries()
(double) 130.00000
root [49] h3->GetEffectiveEntries()
(double) 61.231884
root [50] h3->GetEntries()
(double) 130.00000

The documentation for GetEffectiveEntries() does not explain this behavior. Indeed, since my histogram is unweighted (or at least, I have not assigned it any weights), the documentation suggests GetEffectiveEntries() should return the same result as GetEntries() (less the underflow and overflow bins):
“In case of an unweighted histogram this number is equivalent to the number of entries of the histogram.”

Second, I see now that the overflow and underflow bins are indeed preserved (and have the expected content) in the subtracted and added histograms:

root [52] h1->GetBinContent(h1->GetNbinsX()+1) + h2->GetBinContent(h2->GetNbinsX()+1) == h4->GetBinContent(h4->GetNbinsX()+1)
(bool) true
root [53] h1->GetBinContent(h1->GetNbinsX()+1) - h2->GetBinContent(h2->GetNbinsX()+1) == h3->GetBinContent(h3->GetNbinsX()+1)
(bool) true
root [54] h1->GetBinContent(0) + h2->GetBinContent(0) == h4->GetBinContent(0)
(bool) true
root [55] h1->GetBinContent(0) - h2->GetBinContent(0) == h3->GetBinContent(0)
(bool) true

This indicates that the only difference is in how the number of entries is calculated for subtracted and added histograms. Why is this the case? It implies that I need to know how a given histogram was created in order to use it properly.

This really feels like a bug.

couet · October 17, 2018, 8:38am

Maybe @moneta can give details.
