Chi-square output when fitting with binned likelihood

cate · June 28, 2020, 8:30am

Continuing the discussion from Chi-square test for histogram fitted with Likelihood method:

Dear ROOT experts,
with reference to the discussion above, I would like to know how the Chi2 reported by ROOT at the end of the fit is calculated, and whether it should be used to evaluate the goodness of fit in the case of a binned likelihood fit.
The Chi2 I obtained with the Baker-Cousins formula is excellent (reduced chi2 = 1.032), but I cannot say the same for the one reported by ROOT (reduced chi2 = 1.538).

ROOT version : 5.34/36
Command used:
TFitResultPtr r3 = myhisto->Fit("f2", "L,E,S,M", "SAME", 0.2, 230);
r3->Print("V");

Thank you very much for your support,
Kind Regards.
Caterina.

couet · June 29, 2020, 7:03am

I guess @moneta can help you.
Note that 5.34 is a very old ROOT version.

cate · June 29, 2020, 1:46pm

Hello,
thank you very much for your reply.
Is the request automatically forwarded to @moneta, or is any action required from my side?

Thank you very much again.
Kind Regards.
Caterina.

couet · June 29, 2020, 2:43pm

Yes, it is automatic; no action is needed from your side.

moneta · June 29, 2020, 3:01pm

Hi,

When ROOT does a binned likelihood fit, printing the FitResult gives output such as the example below:

****************************************
Minimizer is Minuit / Migrad
MinFCN                    =      49.6579
Chi2                      =      116.804
NDf                       =           97
Edm                       =  1.28144e-07
NCalls                    =           59
Constant                  =      118.317   +/-   2.08391     
Mean                      =   0.00831242   +/-   0.0145695   
Sigma                     =       1.0147   +/-   0.010981     	 (limited)

The Chi2 is the Neyman chi2, and MinFCN is the minimum of the negative log-likelihood function. The minimum of the binned likelihood fit is equal to (Baker-Cousins chi2)/2, so in the example above the Baker-Cousins chi2 = 2 * 49.6579 = 99.3158.
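
As an illustration of the two statistics, here is a minimal standalone C++ sketch. It is not ROOT's internal code: the function names neymanChi2 and bakerCousinsChi2 are made up for this example, the inputs are plain vectors of observed bin contents and fitted expectations, and skipping empty bins in the Neyman sum is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Neyman chi2: sum over bins of (n_i - mu_i)^2 / n_i, i.e. the observed
// count n_i is used as the variance of the bin.
// Assumption: bins with n_i = 0 have zero error and are skipped.
double neymanChi2(const std::vector<double>& observed,
                  const std::vector<double>& expected) {
    double chi2 = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        if (observed[i] <= 0.0) continue;  // empty bin: no contribution
        const double diff = observed[i] - expected[i];
        chi2 += diff * diff / observed[i];
    }
    return chi2;
}

// Baker-Cousins chi2 (Poisson likelihood ratio):
//   2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ]
// The logarithmic term is taken as zero when n_i = 0, so an empty bin
// still contributes 2 * mu_i. Assumes mu_i > 0 in every bin.
double bakerCousinsChi2(const std::vector<double>& observed,
                        const std::vector<double>& expected) {
    double sum = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        sum += expected[i] - observed[i];
        if (observed[i] > 0.0) {
            sum += observed[i] * std::log(observed[i] / expected[i]);
        }
    }
    return 2.0 * sum;
}
```

With these definitions the Baker-Cousins value corresponds to 2 * MinFCN in the printout, while the Neyman value corresponds to the line labelled Chi2. Note that the two sums treat empty bins differently, which matters when comparing reduced values.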

Lorenzo

cate · June 29, 2020, 10:02pm

Hi Lorenzo,

Thank you very much for your prompt reply. It is clear now.

In this case the reduced Neyman Chi2 (Chi2/NDf = 1.204 > 1 in the example) is just telling us that the (Poisson-like) distribution in each bin cannot be approximated by a Gaussian. So to evaluate the goodness of fit it is more appropriate to use the reduced Baker-Cousins Chi2 (1.024 ~= 1 in the example). Is this correct?

I have some sets of (simulated) data for which I obtained exactly this behaviour.

Now I have a related question:

I have other, similar data sets where the results are not as expected.

In these cases around 30% of the bins in these (again simulated) data are empty.

Computing the Baker-Cousins Chi2 as 2 * MinFCN and dividing by NDf (calculated counting all the empty bins of the histogram among the number of points), I obtain a reduced Baker-Cousins Chi2 of 1.14 > 1.

If instead I calculate the reduced Neyman Chi2, dividing by NDf (this time not counting the empty bins of the histogram), I obtain a value of 1.07.

For example, I used the command:

TFitResultPtr r47 = run47->Fit("f2", "L,E,S,M", "SAME", 0.2, 231);

(I also tried without the options E and M, but I get the same results for MinFCN, Chi2 and the parameters.)

I know that I didn’t provide many details, but can we exclude that the high reduced Baker-Cousins Chi2 is due to the large number of empty bins and their associated errors?

I thank you very much again for your support.

Kind Regards.

Caterina.

moneta · June 30, 2020, 8:56am

Hi,

The problem with the chi-square arises when there are empty bins or bins with few entries. The Neyman Chi2 assumes a Gaussian distribution in each bin, and this approximation breaks down quite strongly when the expected number of events per bin is < 5.
The Baker-Cousins chi2, being based on a Poisson likelihood ratio, works much better; however, the approximation that it follows a chi2 distribution is no longer valid when the expected number of events is small. There is a nice old note from J. Heinrich about this (see the first figure in his note), showing that when the expected number of events per bin is between 1 and 5 the expected value of the chi2 per bin is larger than 1, as you observe.
However, it drops very quickly for expected values < 1, which means that the chi2 is strongly underestimated for the empty bins.

You can find this note here: https://www-cdf.fnal.gov/physics/statistics/notes/cdf5718_loglikeratv2.ps.gz
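
The trend described above can be checked numerically. The following is a minimal standalone C++ sketch (not taken from the note, and not ROOT code) that computes the expected Baker-Cousins contribution of a single bin when the observed count is Poisson distributed with mean mu. The function names and the truncation point of the sum are assumptions made for this illustration.

```cpp
#include <cmath>

// Baker-Cousins contribution of one bin with observed count n and
// expectation mu > 0: 2 * [ mu - n + n * ln(n / mu) ], log term zero for n = 0.
double bakerCousinsTerm(int n, double mu) {
    double term = mu - n;
    if (n > 0) {
        term += n * std::log(n / mu);
    }
    return 2.0 * term;
}

// Expected value of that contribution for n ~ Poisson(mu), obtained by
// summing over n. The sum is truncated far in the upper tail (assumption:
// mu + 20*sqrt(mu) + 50 is more than enough for the means considered here).
double expectedBakerCousinsTerm(double mu) {
    const int nMax = static_cast<int>(mu + 20.0 * std::sqrt(mu) + 50.0);
    double sum = 0.0;
    for (int n = 0; n <= nMax; ++n) {
        // Poisson probability computed through its logarithm to avoid overflow
        const double logProb = n * std::log(mu) - mu - std::lgamma(n + 1.0);
        sum += std::exp(logProb) * bakerCousinsTerm(n, mu);
    }
    return sum;
}
```

Evaluating expectedBakerCousinsTerm for a few means gives a value below 1 for mu = 0.1, above 1 for mu = 2, and close to 1 for mu = 100, in line with the behaviour described for bins with few or no entries.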

In the case of empty bins, the best way to get a reasonable p-value, to see whether the fitted function is compatible with your data, is then to re-calibrate the obtained Baker-Cousins chi2 using pseudo-experiments.

Lorenzo

cate · July 3, 2020, 5:52pm

Hi Lorenzo,

Thank you very much for the explanations and the interesting note.

So could you please confirm whether I understood correctly?

In my first case, where I obtained a reduced Neyman Chi2 = 1.5 and a reduced Baker-Cousins Chi2 = 1.03, and where I have no empty bins but a high percentage of bins with values around 5, I can probably still use the Baker-Cousins Chi2 to evaluate the goodness of fit.

In the second case, with empty bins: the empty bins give a very low Baker-Cousins Chi2 contribution, while at the same time all the bins with values between 1 and 4 (I have a high percentage of bins with these characteristics) result in a total reduced Baker-Cousins Chi2 > 1.

Therefore, in this case I can no longer use the Baker-Cousins Chi2 to evaluate the goodness of fit.

Is my understanding correct?

Finally, could you please suggest a reference explaining how to recalibrate the Baker-Cousins Chi2 with pseudo-experiments?

I thank you very much again for your support.

Kind Regards.

Caterina.

moneta · July 6, 2020, 12:30pm

Hi,

I think you could still use it, but you might need to calibrate the obtained p-value using pseudo-experiments.
To do this, you generate a number N of toy experiments (e.g. N=1000), where each toy consists of a histogram with the same number of bins as your data histogram, filled with data generated from your fit function (e.g. using TH1::FillRandom). From the distribution of the chi2 values obtained from the toys and the specific value obtained with the original data, you can then compute the corrected p-value.
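
The toy procedure can be sketched without ROOT as follows. This is a simplified standalone C++ illustration, not a ROOT macro: the names bakerCousinsChi2 and toyPValue are hypothetical, each bin is fluctuated independently with a Poisson distribution (TH1::FillRandom instead fills a fixed total number of entries), and each toy is compared with the generating expectation rather than being refitted. A fuller treatment would refit every toy in the same way as the data.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Baker-Cousins chi2: 2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ],
// with the log term taken as zero for empty bins. Assumes mu_i > 0.
double bakerCousinsChi2(const std::vector<double>& observed,
                        const std::vector<double>& expected) {
    double sum = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        sum += expected[i] - observed[i];
        if (observed[i] > 0.0) {
            sum += observed[i] * std::log(observed[i] / expected[i]);
        }
    }
    return 2.0 * sum;
}

// Calibrated p-value: the fraction of toy histograms whose chi2 is at
// least as large as the value observed in the data.
// expected    : fitted function evaluated in each bin (all entries > 0)
// observedChi2: Baker-Cousins chi2 of the real data, e.g. 2 * MinFCN
double toyPValue(const std::vector<double>& expected, double observedChi2,
                 int nToys, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<double> toy(expected.size());
    int nAtLeastAsLarge = 0;
    for (int t = 0; t < nToys; ++t) {
        // Generate one toy histogram by Poisson-fluctuating every bin.
        for (std::size_t i = 0; i < expected.size(); ++i) {
            std::poisson_distribution<int> poisson(expected[i]);
            toy[i] = poisson(rng);
        }
        if (bakerCousinsChi2(toy, expected) >= observedChi2) {
            ++nAtLeastAsLarge;
        }
    }
    return static_cast<double>(nAtLeastAsLarge) / nToys;
}
```

A call such as toyPValue(expected, 2.0 * minFcn, 1000, 12345) would then return the fraction of 1000 toys with a chi2 at least as large as the observed one, where expected and minFcn are placeholders for the fitted bin expectations and the MinFCN value of the real fit.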
If you need, I can provide you with an example.

Best regards

Lorenzo

cate · July 8, 2020, 5:58am

Hi Lorenzo,
Thank you very much for the explanations.
Yes please, it would be great if you could provide me an example.
Best Regards.
Caterina

cate · July 20, 2020, 8:06pm

Hello Lorenzo,
I am sorry, but I have not received the example on this subject. Could you please send it again?

Thank you very much.
Kind Regards.
Caterina.

moneta · July 22, 2020, 4:14pm

Hi,

Here is an example where you compute the p-value using toys.
Sorry for the delay

Lorenzo

Chi2test_Example_toys.C (3.7 KB)
