Strangely biased fit

TShirt · December 19, 2011, 5:13pm

Hi,
I’ve written a macro (attached) to compare the results of two different fits of the same histogram: with and without the “L” option. To do that, I generate a random histogram with an exponential trend, fit it over the whole range, and then refit it with the “L” option added. I repeat this procedure several times, storing the results in other histograms in order to get their distributions and expose any discrepancy.
I’ve written a normalized pdf for the exponential (with only one parameter: the slope “m”) and its projection onto a binned histogram (with an additional parameter: the number of candidates “B”).

I get no errors running it (and all the fits converge), but in the comparison plots the results of the fits without the “L” option are biased. The strangest thing is that the bias depends on the number of bins used for the candidates plot (300 is my default): if I double the number of bins, the bias doubles too!

I’ve probably introduced a bug in my code; I’ve tried to locate it, but it’s not obvious to me where the problem is.

Could anyone help me find it?

Thanks in advance for any suggestion!

Bye…
CompareFitsSimple.C (4.25 KB)

Pepe_Le_Pew · December 19, 2011, 5:47pm

For the “chisquare method”, set all weights to 1 for non-empty bins (i.e. ignore error bars):
genEv->Fit(dummyF, "RW", "PE", 1.2, 2.4);
(An interesting question is: can one somehow convince the “loglikelihood method” to reproduce the “chisquare method” without the “W” flag?)

TShirt · December 20, 2011, 4:41pm

Thanks a lot!

Bye…

TShirt · December 28, 2011, 5:22pm

There’s just one more thing.
If I use the “W” option for the chi2 method, the uncertainty on the parameter does not make sense.
Without “W” the uncertainty makes sense, but the bias is still present.

If I use a Gaussian shape I get (par[0] is the normalization; the first two rows are Minuit printout lines: number, name, value, error, step size, first derivative):
with “R”     1  p0  3.88621e+03  6.24172e+01  3.39976e-01   3.00957e-11
with “WR”    1  p0  4.01286e+03  8.42010e+00  2.37006e-01  -2.30246e-10
with “WRL”   p0 = 3999.97 +/- 63.2452

I generate 4000 candidates, so I expect an error of about sqrt(4000) ≈ 63.

Any idea what causes the bias? Am I doing something wrong?

Thanks.

Bye…

moneta · January 6, 2012, 9:27am

Hi,

What you observe is expected and well known. Fitting Poisson data with a least-squares (chi2) method is biased, since it assumes a Gaussian error for each bin. The bias is reduced when each bin has higher statistics (fewer bins), as you observe. There is also another problem with chi2 fits: the bin error is estimated from the observed number of events. This second bias affects only the normalization parameter, and it is negligible if the number of entries is much larger than the number of bins.
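
The effect of taking the bin errors from the observed counts can be illustrated with a small toy that does not use ROOT at all. This is only a sketch under simplifying assumptions: the model is a constant (not the exponential from the attached macro), and the names generateFlatHistogram, fitConstantChi2 and fitConstantLikelihood are made up for this example. For a constant model both minimizations have closed-form solutions, so no minimizer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy data: nBins bins, each filled with a Poisson count of mean meanPerBin.
std::vector<int> generateFlatHistogram(std::size_t nBins, double meanPerBin,
                                       unsigned seed) {
    std::mt19937 rng(seed);
    std::poisson_distribution<int> poisson(meanPerBin);
    std::vector<int> counts(nBins);
    for (int& c : counts) c = poisson(rng);
    return counts;
}

// Chi2 fit of a constant mu with sigma_i^2 = n_i (errors from observed counts).
// Minimizing sum_i (n_i - mu)^2 / n_i over the non-empty bins gives
//   mu = (number of non-empty bins) / sum_i (1 / n_i).
// Empty bins are skipped, as in a chi2 fit.
double fitConstantChi2(const std::vector<int>& counts) {
    assert(!counts.empty());
    double sumInverse = 0.0;
    std::size_t used = 0;
    for (int n : counts) {
        if (n <= 0) continue;  // empty bins carry no weight in this chi2
        sumInverse += 1.0 / n;
        ++used;
    }
    return used > 0 ? used / sumInverse : 0.0;
}

// Poisson likelihood fit of a constant mu.
// Maximizing sum_i (n_i * log(mu) - mu) gives the plain average of all bins,
// empty bins included.
double fitConstantLikelihood(const std::vector<int>& counts) {
    assert(!counts.empty());
    double total = 0.0;
    for (int n : counts) total += n;
    return total / counts.size();
}
```

With many bins and a true mean of about 10 entries per bin, the chi2 estimate from this toy comes out roughly one entry per bin below the true mean, while the likelihood estimate shows no such deficit. Because the deficit is per bin, the shortfall in the total normalization grows with the number of bins, which is consistent with the doubling reported earlier in the thread.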

Using option “W” is not a solution, since it does not use the bin errors. I would recommend always using a maximum likelihood fit (option “L”) for Poisson data, in particular when you have bins with small content (say < 10). Furthermore, bins with zero entries are treated correctly in the likelihood fit, while they are simply ignored in a chi2 fit.
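
To see why a fit that ignores the bin errors cannot return a meaningful uncertainty, here is a rough sketch for a constant model. It is not a reproduction of what ROOT computes internally: it assumes the parameter error is read off from a chi2 increase of 1 with every non-empty bin given weight 1, and the names FitResult, fitTotalUnitWeights and fitTotalLikelihood are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct FitResult {
    double value;  // fitted total number of events
    double error;  // 1-sigma uncertainty on that total
};

// Least squares with all weights set to 1 (non-empty bins only).
// chi2(mu) = sum_i (n_i - mu)^2 has its minimum at the average of the used
// bins, and chi2 rises by 1 at mu +/- 1/sqrt(K), K = number of used bins.
// Assumption: the error is defined by that chi2 increase of 1.
FitResult fitTotalUnitWeights(const std::vector<int>& counts) {
    assert(!counts.empty());
    double sum = 0.0;
    std::size_t used = 0;
    for (int n : counts) {
        if (n > 0) {
            sum += n;
            ++used;
        }
    }
    if (used == 0) return {0.0, 0.0};
    const double nBins = static_cast<double>(counts.size());
    const double mu = sum / used;
    const double sigmaMu = 1.0 / std::sqrt(static_cast<double>(used));
    // Scale the per-bin level and its error up to the total normalization.
    return {nBins * mu, nBins * sigmaMu};
}

// Poisson likelihood for the same constant model.
// The estimate of the total is the sum of all bins, and the curvature of
// -log L at the maximum gives an error of sqrt(total).
FitResult fitTotalLikelihood(const std::vector<int>& counts) {
    assert(!counts.empty());
    double total = 0.0;
    for (int n : counts) total += n;
    return {total, std::sqrt(total)};
}
```

For example, with 40 bins of exactly 100 entries each (4000 events in total), the unit-weight error on the total in this toy is sqrt(40) ≈ 6.3, which depends only on the number of bins, while the likelihood error is sqrt(4000) ≈ 63.2, the Poisson expectation quoted earlier in the thread.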

Best Regards

Lorenzo

TShirt · January 6, 2012, 12:09pm

Thanks a lot Lorenzo!
I’ll follow your suggestion!

Bye…