Normalized Histogram Fitting with Likelihood Wrong Error on Parameters

Giorgio_Del_Castello · September 10, 2024, 3:38pm

Hi,
I have a question that I thought it was simple but I cannot find the answer. The situation is the following:
I have an histogram hist that is not a counting histogram but is normalized by the exposure and binwidth. This means that for each bin i the height of the bin is h_i and the associated error e_i are:

h_i = c_i/norm_i
e_i = sqrt(h_i/norm_i + (h_i*err_norm_i/norm_i)^2)

where norm_i and err_norm_i are respectively the normalization and the normalization error evaluated for bin i. Once I have the normalized spectrum I want to fit a function (in this case just a pol0) to it in order to estimate the average behavior but since I have a low number of counts (most bins are empty and the density of the bins needs to be considered) I need to use the WL fitting option if I am correct. To my understanding this is done doing the following (I am using pyroot):

from array import array
import ROOT
import numpy as np

#supposing that bin_edges is the array with all the left and right edges of the bins
#supposing that the heights and height_errors are already evaluated

bin_edges = array('d',bin_edges)
rh=np.concatenate([[0],heights,[0]])
rh = array('d',rh)

h = ROOT.TH1D('h',"myhist",len(heights),bin_edges)
h.SetContent(rh)
[h.SetBinError(i+1,height_errors[i]) for i in range(len(heights)) ]

func = ROOT.TF1('MyFunc','pol0',1000,2000)

h.Fit(func,"QSWLR")

h.Draw()

npar = func.GetNpar()
parameters = np.array([func.GetParameter(pp) for pp in range(npar)])
errors = np.array([func.GetParError(pp) for pp in range(npar)])

Then at this point in parameters and errors I have the resulting values and errors from the fit. While the value is correct the error associated with it is a factor 10 less than the correct result (it can easily be evaluated by hand since there are only 5 counts in the fitting range). This mistake on the evaluation on the error disappears if I include 4 more counts by enlarging the fitting range.

Could you tell me what I am doing wrong ?

Best,
Giorgio

ROOT Version: 6.32.00
Platform: Ubuntu 11.4.0-1ubuntu1~22.04
Compiler: g++11.4.0

Danilo · September 10, 2024, 6:36pm

Hi Giorgio,

Could you please provide us a standalone reproducer, i.e. with the values that reproduce the behaviour you describe?

Best,
D

Giorgio_Del_Castello · September 11, 2024, 9:02am

Hi @Danilo ,
Yes sure, this is easy to produce a stand alone example. Here is all the necessary material:
Hist_Fit_Reproducer.zip (5.7 KB)

You will find a pickle file with the histogram and a python script which you can just execute in a shell (there are no arguments) that performs the fit in two different ranges: one that gives the wrong error and one that gives the right error.

The script will also produce plots and print on the terminal the results of fitting compared to the expected value and error. In the expected values the error on the normalization is not taken into account as well as few percent correction on the normalization factor (correction and error on the normalization are dependent on which bin is considered so for simplicity I neglected them) but both factors are subdominant so the fit result and fit error should match the expected result and error up to a few percent discrepancy.

I hope everything is clear, if not I will provide further details!

Thanks,
Giorgio

Giorgio_Del_Castello · September 19, 2024, 7:22am

Hi @Danilo,
Did you manage to have a look at this issue ? If any inputs from my side are required I will gladly send you more information.
Best,
Giorgio

Giorgio_Del_Castello · October 1, 2024, 9:32am

Hi @Danilo,
Did you manage to have a look at the issue ? I am still seeing it in my code.
Thanks,
Giorgio

StephanH · October 2, 2024, 7:01am

Hello,

@moneta will probably know the best way to deal with the errors in this case.

moneta · October 3, 2024, 2:39pm

Hi Giorgio,
I think what you see is due to the limited statistics you have in the fit and the WL case does not work correctly due to low statistics. This is just an approximation for computing the errors, and it is probably breaking for low statistics. Since you know the error distribution for each point, it is probably better you write the full likelihood using the correct pdf in every bin taking into account the normalisation.

Best regards

Lorenzo

Giorgio_Del_Castello · October 4, 2024, 8:45am

Hi,
Thanks for the feedback. I have never written my own likelihood, could you point me to a tutorial on how to perform such a fit ?
Thanks,
Giorgio

moneta · October 4, 2024, 9:06am

Hi,
An example of fittig providing a user defined likelihood/chi-square is in this tutorial:
https://root.cern.ch/doc/master/fitCircle_8C.html

Here a user defined chi-square is used and the ROOT::Math::Fitter class is then used for fitting. A similar thing can be done in your case.
If you need more help on this, please let me know

Best regards,

Lorenzo

Giorgio_Del_Castello · October 4, 2024, 12:11pm

Hi,
Thanks, I understood the tutorial but I am struggling to implement it in python. In particular when I do ok = fitter.FitFCN() it say : Template method resolution failed.
Could you explain how i should do it in python ?
I was using also NumericalMinimization.py as a guide.
Thanks,
Giorgio

moneta · October 4, 2024, 12:34pm

Hi,

Can you please post your code example in Python giving the error and I can investigate it later today. It is possible the PYthonization for the FItter class have some issues
Thanks,
Lorenzo

Giorgio_Del_Castello · October 4, 2024, 12:52pm

Hi,
thanks for the help. Yes sure, here is the script that performs the fit with all the values needed. What I am trying to do is to fit the normalization constant needed for the expected rate to match the histogram counts. This has some further nuisance parameters since the expected rate in itself has an uncertainty and the histogram is subject to an analysis efficiency that has an uncertainty as well.
For this reason in my likelihood the contribution given from each bin is the usual Poissonian term and two gaussian one for the rate uncertainty and one for the efficiency uncertainty.

From previous evaluations I expect the normalization factor to be around 2.5.

Best,
Giorgio

python_likelihood_fit.py (18.7 KB)

Giorgio_Del_Castello · October 4, 2024, 4:09pm

Hi,
I managed to make in work as a root macro, but I would like to achieve the pythonization as well.
I attach here the developed macro.
Fitting.C (10.3 KB)

Best,
Giorgio

moneta · October 4, 2024, 4:39pm

Hi,
Good it can work in C++. Thank you for the Python reproducer, I see the error and we will investigate it.
Best regards

Lorenzo

system · October 18, 2024, 4:39pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.