Pearson, Neyman and Baker-Cousins Chisquare

Hello,
we noticed that, when computing the Chi2 between a TF1 and a histogram, the Pearson and Neyman chisquares coincide. Moreover all the chisquares seem to be computed with the value of the function at the center of the bin (whereas the integral divided by the bin width should be used).
Here’s the comparison (using the function at the center of the bin):

Chi2(Pearson)        7.9764774712531405
Chi2(Neyman)         7.9764774712531405
Chi2(Baker-Cousins)  5.599308126733436
Chi2(Pearson)       [computed from scratch]  5.470102228210528
Chi2(Neyman)        [computed from scratch]  7.9764774712531405
Chi2(Baker-Cousins) [computed from scratch]  5.5993081267334395

We used ROOT 6.28/06 with Apple clang 15.0.0

Thanks a lot in advance,

Roberta Cardinale, Fabrizio Parodi

PS. Since new users can’t put links we add the python code and data file in the post body

------- pyROOT code -------

from   ROOT  import *
import numpy as np

h2 = TH1D("h2","",10,0,0.2)
x2 = np.loadtxt("s2.txt")
for i in range(len(x2)):
    h2.Fill(x2[i])

f = TF1("exp","[0]/[1]*exp(-x/[1])",0,0.2);
f.SetParameter(0,h2.GetEntries()*h2.GetBinWidth(1))
f.SetParameter(1,0.105)

chi2N  = h2.Chisquare(f,"N");
chi2P  = h2.Chisquare(f,"P");
chi2BC = h2.Chisquare(f,"L");
print("Chi2(Pearson)       ",chi2P)
print("Chi2(Neyman)        ",chi2N)
print("Chi2(Baker-Cousins) ",chi2BC)

mchi2P=0
mchi2N=0
mchi2BC = 0
for i in range(h2.GetNbinsX()):
    xmin = h2.GetBinCenter(i+1)-h2.GetBinWidth(1)/2
    xmax = h2.GetBinCenter(i+1)+h2.GetBinWidth(1)/2
'''
    fexp = f.Integral(xmin,xmax)/h2.GetBinWidth(1)
'''
    fexp = f((xmax+xmin)/2)
    N = h2.GetBinContent(i+1)
    if N!=0:
        mchi2N += (N-fexp)**2/N
    mchi2P += (N-fexp)**2/fexp
    
    mchi2BC += -2*(-fexp + N - N*(log(N/fexp)))

print("Chi2(Pearson)       [computed from scratch] ",mchi2P)
print("Chi2(Neyman)        [computed from scratch] ",mchi2N)
print("Chi2(Baker-Cousins) [computed from scratch] ",mchi2BC)

— data file used ----

2.5828444588394796e-05

0.18145581427987648

0.1263659805433976

0.005424387252062355

0.14624994234158106

0.07236607929535993

0.004345362449683603

0.0295303920904529

0.06161052939065797

0.030116863335391143

0.027451079819355365

0.041758331622220105

0.11531604906417892

0.021765487331107784

0.06545572141691491

0.17803898180619235

0.07433258746190237

0.09356927547189414

0.15065759232044093

0.15455693393438372

0.349544638577825

0.10979947298734308

0.16391301428464833

0.005792917824559513

0.054484498412083686

0.010724578301441518

0.04071205804629209

0.06959304306278759

0.05786972361949642

0.17021858273758383

0.12156219571689578

0.21420922812052653

0.27659282992034084

0.04336708133006152

0.03210065099424145

0.04507797446633408

0.033703330666773376

0.23068316649368004

0.03577223548017721

0.2227361951639177

Hi Roberta, Fabrizio,

Thanks for posting and welcome to the ROOT Community!
First of all I have an observation: are you sure you want to fit the function [0]/[1]*exp(-x/[1])? What is the ration of the two parameters bringing to you? Would it be perhaps better to use a single parameter for the normalisation?

I put in the loop the experts, @moneta @jonas , but could you formulate the exact question for us (I understand that you would like to know why the Pearson and Neyman Chi2 values numerically coincide and why are the Chi2 calculated with the value of the function evaluated at the center of the bin while another estimator should be used - but I might be wrong, please correct me if that’s the case).

Cheers,
Danilo

Hi Danilo,
thanks for the prompt answer !

The two parameters are: [0] = normalization, [1] = slope. This choice allows to interpret the fit function (even if the fit is not performed in this example) as pdf*normalization_factor. One can of course re-write the function with one single multiplying parameter but the problem remains.

Yes, your interpretation is correct. The two questions are:
- why Pearson and Neyman Chi2 are identical ?
- why the function value at the center of the bin is used instead of the integral across the bin divided by the bin width (which should be the correct estimate of the number of expected events in the bin) ?

Cheers,
    Fabrizio

Ok, Thanks Fabrizio. I am sure @moneta and @jonas can address your 2 questions.

Cheers,
D

Hello,

There was a bug in the ROOT version 6.28 which prevented the computation of the Pearson chi-square. If you are using the new ROOT version 6.30 you will get the correct result for the Pearson chi-square.
It is true the function TH1::Chisquare compute it using the bin center. When fitting, we have the option “I” to compute the chi-square using the integral of the bin. It is an option we could include also in the TH1::Chisquare function. I will do it for the next release.
As a workaround you can do a dummy fit (fixing the function parameters). This will compute automatically the correct chi-square. I attached your macro with this workaround as example.

Best regards

Lorenzo
exampleChisquare.py (1.3 KB)

Hi Lorenzo,
thanks a lot for your reply !
Yes, adding the “I” option to TH1::Chisquare would be, indeed, useful (or even setting the default to it, since it is, in principle, the correct result). But we agree that in almost all the real cases it makes little difference (so you can put the change in low priority).

Best regards,
   
    Fabrizio, Roberta

@moneta Your macro returns an improper value with ROOT 6.26 and 6.28:

Chi2(Baker-Cousins) integral  7.9976562924151

@Wile_E_Coyote : in older ROOT version TF1::GetChisquare returns the Neyman chi-square when doing a likelihood fit. In this case you should use for Baker-Cousins : 2 * FitResult::MinFCNValue() :

result = h2.Fit(f,"L I S")
print("Chi2(Baker-Cousins) integral ",2. * result.MinFcnValue())

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.