Under counting number of events in fits when using integral

izzy · March 30, 2022, 7:10pm

I am trying to calculate the number of events in my signal fit by following this thread: Integral Error in Signal

fileName = "/Volumes/Backup/CSV_DATA/RGK/6535MeV/R5893/MM2_EP_b_pFD.csv"
rdf = ROOT.RDF.MakeCsvDataFrame(fileName)
h1 = rdf.Histo1D(("MM2_EP_b", "MM2_EP_b", 100, 0.15, 0.475), "MM2_EP_b")
h1.GetYaxis().SetRangeUser(0,85000)

fg = ROOT.TF1("fg", "gaus(0)")
fbc = ROOT.TF1("fbc", "pol2(3)")
f1 = ROOT.TF1("f1", "gaus(0) + pol2(3)", 0.15, 0.475)

f1.SetParameters(17300, 0.3, 0.02, 47510, 73250, 105200)
r1 = h1.Fit(f1, "S")

c1 = r1.GetCovarianceMatrix()
c1g = c1.GetSub(0, 2, 0, 2)

fg.SetParameters(f1.GetParameters())
N_sig = fg.Integral(fg.GetParameter(1) -(3.0*fg.GetParameter(2)), fg.GetParameter(1)+(3.0 * fg.GetParameter(2)))/h1.GetBinWidth(1)
N_sigErr = fg.IntegralError(fg.GetParameter(1) -(3.0*fg.GetParameter(2)), fg.GetParameter(1)+(3.0 * fg.GetParameter(2)),fg.GetParameters(),c1g.GetMatrixArray())/h1.GetBinWidth(1)

fbc.SetParameters(f1.GetParameters())
N_bg = fbc.Integral(fg.GetParameter(1)-(3.0*fg.GetParameter(2)), fg.GetParameter(1)+(3.0 * fg.GetParameter(2)))/h1.GetBinWidth(1)
N_bgErr = fbc.IntegralError(fg.GetParameter(1) -(3.0*fg.GetParameter(2)), fg.GetParameter(1)+(3.0 * fg.GetParameter(2)),fbc.GetParameters(),c1.GetMatrixArray())/h1.GetBinWidth(1)

N_tot = f1.Integral(fg.GetParameter(1)-(3.0*fg.GetParameter(2)), fg.GetParameter(1)+(3.0 * fg.GetParameter(2)))/h1.GetBinWidth(1);
N_totErr = f1.IntegralError(fg.GetParameter(1)-(3.0*fg.GetParameter(2)), fg.GetParameter(1)+(3.0 * fg.GetParameter(2)),f1.GetParameters(),c1.GetMatrixArray())/h1.GetBinWidth(1)

fSig = ROOT.TF1("fSig","gaus", 0.15, 0.475);
for ipar in range(0,3):
    fSig.SetParameter(ipar,f1.GetParameter(ipar))
    fbc.SetParameter(ipar,f1.GetParameter(ipar+3))

When plotted I get the following

The values of the integral seem off to me, and as a sanity check I wanted to see if I could get a total pdf integral value similar to the value listed in Entries in the stats bar. I get a value that is an order of magnitude off.

print("tot: %.3e" % (f1.Integral(0.15, 0.475) / h1.GetBinWidth(1)))

tot: 6.293e+06
This would indicate to me that I am not getting the correct integral values. I believe there must be an error in how I am calculating the integral but I am new to pyROOT and am not sure what I could change in my code to get the correct value. Can anyone tell me what I’m doing wrong with my fits?

Data was too large to post so I have attached a sample in a zipped folder

ROOT Version: JupyROOT 6.24/06
Platform: macOS Monterey 12.2.1
Compiler: Jupyter Lab

sample.csv.zip (2.6 MB)

Wile_E_Coyote · March 30, 2022, 7:52pm

Check “h1.GetBinContent(0)” and “h1.GetBinContent(h1.GetNbinsX() + 1)”.

izzy · March 31, 2022, 2:13am

Thank you for the suggestion. I assumed that you meant to use h1.GetBinContent(0) and h1.GetBinContent(h1.GetNbinsX() + 1) to replace h1.GetBinWidth(1).

This gets me

print("tot: %.3e" % (f1.Integral(0.15, 0.475) / h1.GetBinContent(0) ))
print("tot: %.3e" % (f1.Integral(0.15, 0.475) / h1.GetBinContent(h1.GetNbinsX() + 1) ))

Which results in 3.671e-03 and 6.135e-04, respectively. I believe I may have misunderstood where I should try these two?

Wile_E_Coyote · March 31, 2022, 6:55am

The bin number “0” is the “underflow bin” and the bin number “nbins+1” is the “overflow bin”.
You need to take their contents into account when comparing your histogram’s “number of entries” with your “total” function’s integral (divided by the “bin width”, of course).

izzy · March 31, 2022, 4:58pm

Ah, I see. Including these values I was able to calculate total entries as 4.5197e+07 for my total pdf and the entries in the stats box is 4.519813e+07. Many thanks.

Just to clarify my understanding, the “underflow” and “overflow” bins are necessary because I am looking at the full range of the histogram to get the total entries, yes? If I were looking at the integral in a specific range, say mu +/- 3sigma, I would be inclined to think that I would not need the “underflow” and “overflow” bins because I am not looking at the full range of the histogram. Is that correct?

Wile_E_Coyote · April 1, 2022, 6:56am

The content of (any) histogram bins is irrelevant when calculating (any) function integral.

In the “first approximation”, you may want to compare (for a “1-D” fix bin size histogram):
function.Integral(x1, x2) / histogram.GetBinWidth(1)
with (“whole” bins are always used so, you’d better make sure that “xlow <= x1 <= x2 < xup”):
histogram.Integral(histogram.FindFixBin(x1), histogram.FindFixBin(x2))

system · April 15, 2022, 6:56am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.