I’ve been seeing a bias in the fitted mean of the distribution when doing a maximum likelihood fit to toy datasets, generated from a PDF with asymmetric tails.
I’ve attached the code I’m using, demonstrated on a Crystal Ball (CB) function.
To summarise the operation of the code:
1. Create a CB model.
2. Generate a toy dataset of 200 events from this model (extended).
3. Fit the CB back to this toy dataset, with only the mu parameter (peak of the distribution) free.
4. Calculate the pull of mu, i.e. (fittedMu - trueMu) / muErr.
5. Repeat 1e6 times.
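For readers without the attachment, the procedure above can be sketched in plain Python. This is not the attached RooFit code, only a stand-in under stated assumptions: the CB tail is on the low side, the fit range is infinite (so the normalisation does not depend on mu and drops out of the likelihood), the minimiser is a simple golden-section search rather than Minuit, and the error comes from a finite-difference second derivative in the spirit of hesse. All function names are made up for this sketch; the parameter values are the ones quoted later in the thread.

```python
import math
import random
import statistics


def cb_log_shape(x, mu, sigma, alpha, n):
    """Log of an unnormalised CB shape, power-law tail on the low side (alpha > 0)."""
    t = (x - mu) / sigma
    if t > -alpha:
        return -0.5 * t * t
    b = n / alpha - alpha
    return n * math.log(n / alpha) - 0.5 * alpha * alpha - n * math.log(b - t)


def sample_cb(rng, mu, sigma, alpha, n):
    """Draw one value from the CB on an infinite range (assumes n > 1)."""
    tail = (n / alpha) * math.exp(-0.5 * alpha * alpha) / (n - 1.0)
    core = math.sqrt(0.5 * math.pi) * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    if rng.random() < tail / (tail + core):
        v = 1.0 - rng.random()  # uniform in (0, 1], avoids dividing by zero
        t = (n / alpha - alpha) - (n / alpha) * v ** (-1.0 / (n - 1.0))
    else:
        t = rng.gauss(0.0, 1.0)
        while t <= -alpha:  # keep only the Gaussian core
            t = rng.gauss(0.0, 1.0)
    return mu + sigma * t


def poisson(mean, rng):
    """Poisson draw by counting unit-rate arrivals in [0, mean] ("extended" toys)."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def fit_mu(data, lo, hi, sigma, alpha, n, tol=1e-5):
    """Minimise the NLL in mu by golden-section search; return (mu_hat, error)."""
    def nll(m):
        return -sum(cb_log_shape(x, m, sigma, alpha, n) for x in data)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = nll(c), nll(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = nll(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = nll(d)
    best = 0.5 * (lo + hi)
    h = 1e-2  # step for the finite-difference second derivative
    curv = (nll(best + h) - 2.0 * nll(best) + nll(best - h)) / (h * h)
    return best, (1.0 / math.sqrt(curv) if curv > 0 else float("nan"))


def run_toys(n_toys=200, mean_events=200, mu=0.0, sigma=1.5, alpha=1.0, n=1.2, seed=1):
    """Generate toys, fit mu back, and collect the pulls."""
    rng = random.Random(seed)
    pulls = []
    for _ in range(n_toys):
        data = [sample_cb(rng, mu, sigma, alpha, n)
                for _ in range(poisson(mean_events, rng))]
        mu_hat, mu_err = fit_mu(data, mu - 2.0, mu + 2.0, sigma, alpha, n)
        if math.isfinite(mu_err):  # drop toys with a non-positive curvature
            pulls.append((mu_hat - mu) / mu_err)
    return pulls


if __name__ == "__main__":
    pulls = run_toys()
    mean = statistics.mean(pulls)
    sem = statistics.stdev(pulls) / math.sqrt(len(pulls))
    print(f"mean pull = {mean:.3f} +/- {sem:.3f} from {len(pulls)} toys")
```

Comparing the mean pull with its standard error is a quick way to judge whether a shift from 0 is significant for a given number of toys. A finite fit range, as RooFit would normally use, makes the normalisation depend on mu, and that is not modelled here.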
Ideally, the pull distribution should be Gaussian and centred at 0. What I see instead is that, when the PDF is asymmetric, the pull distribution is not centred at 0 and tends to be shifted in the direction of the longer tail. The effect grows as the number of events decreases.
Has anyone ever seen this before? Does anyone know how to fix it?
The things I’ve tried so far are:
Using minos to calculate errors rather than hesse
Reducing eps (I have tried 1e-8 and 1e-10)
Using minuit rather than minuit2
Checking all fits converge. Most do, but around 0.01% of fits return a status of 4, rather than 0
This is not a binned fit, is it? There is a known bias in binned fits when the number of bins is low.
Does the bias vanish when you have a decent number of events, or is it always observable? If it’s always observable, something might be wrong with the bare function or the integrals of the CB. That’s something we could test.
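One way such a test could look, as a rough sketch only: compare an analytic integral of the CB over the fit range with a brute-force numerical integral, at several values of mu. The code below is a standalone Python re-implementation, not the RooFit class, so it only illustrates the idea. The function names, the range [-20, 10] and the scanned mu values are assumptions made for this example.

```python
import math


def cb_shape(x, mu, sigma, alpha, n):
    """Unnormalised CB shape with a power-law tail on the low side (alpha > 0)."""
    t = (x - mu) / sigma
    if t > -alpha:
        return math.exp(-0.5 * t * t)
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)


def cb_integral(lo, hi, mu, sigma, alpha, n):
    """Analytic integral of cb_shape over [lo, hi] (assumes alpha > 0, n != 1)."""
    t_lo, t_hi = (lo - mu) / sigma, (hi - mu) / sigma
    total = 0.0
    if t_lo < -alpha:  # power-law tail contribution
        a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
        b = n / alpha - alpha
        upper = min(t_hi, -alpha)
        total += a / (n - 1.0) * ((b - upper) ** (1.0 - n) - (b - t_lo) ** (1.0 - n))
    if t_hi > -alpha:  # Gaussian core contribution
        lower = max(t_lo, -alpha)
        root2 = math.sqrt(2.0)
        total += math.sqrt(0.5 * math.pi) * (math.erf(t_hi / root2) - math.erf(lower / root2))
    return sigma * total


def simpson(f, lo, hi, steps=20000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    s = f(lo) + f(hi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3.0


if __name__ == "__main__":
    sigma, alpha, n = 1.5, 1.0, 1.2
    lo, hi = -20.0, 10.0  # hypothetical fit range
    for mu in (-0.5, 0.0, 0.5):
        exact = cb_integral(lo, hi, mu, sigma, alpha, n)
        approx = simpson(lambda x: cb_shape(x, mu, sigma, alpha, n), lo, hi)
        print(f"mu={mu:+.1f}  analytic={exact:.8f}  numeric={approx:.8f}")
```

If the two columns agree for every mu, the normalisation is at least self-consistent. A mismatch that varies with mu would be the kind of problem that can pull the fitted peak.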
No, it’s an unbinned fit.
I’ve attached a plot showing the bias in the fitting of mu vs number of events for a CB with parameters mu = 0, sigma = 1.5, alpha = 1, n = 1.2.
Note that the only free parameter in the fit is mu; all others are fixed. You’ll see that there is still a bias up to 1000 events. I’m aware that the bias of a maximum likelihood estimator should scale as 1/N to leading order, hence the 1/N fit in the plot. But as you can see, the bias doesn’t seem to follow this pattern.
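For anyone wanting to repeat the 1/N comparison on their own numbers, here is a minimal sketch of a weighted least-squares fit of the model bias = a / N, with a chi-square to judge how well that form describes the points. It is not the code behind the attached plot. The function name is made up, and the input points in the usage part are synthetic placeholders constructed to follow 1/N exactly, not measured values.

```python
import math


def fit_inverse_n(n_events, bias, bias_err):
    """Weighted least-squares fit of bias = a / N.

    Returns (a, a_err, chi2). Minimising chi2 = sum(((b - a/N) / e)**2)
    analytically gives a = sum(b / (N e^2)) / sum(1 / (N^2 e^2)).
    """
    num = sum(b / (n * e * e) for n, b, e in zip(n_events, bias, bias_err))
    den = sum(1.0 / (n * n * e * e) for n, e in zip(n_events, bias_err))
    a = num / den
    a_err = 1.0 / math.sqrt(den)
    chi2 = sum(((b - a / n) / e) ** 2 for n, b, e in zip(n_events, bias, bias_err))
    return a, a_err, chi2


if __name__ == "__main__":
    # Placeholder inputs: synthetic points that follow 3/N exactly.
    ns = [50, 100, 200, 500, 1000]
    biases = [3.0 / n for n in ns]
    errors = [0.01] * len(ns)
    a, a_err, chi2 = fit_inverse_n(ns, biases, errors)
    ndf = len(ns) - 1  # one fitted parameter
    print(f"a = {a:.3f} +/- {a_err:.3f}, chi2/ndf = {chi2:.3g}/{ndf}")
```

A chi-square much larger than the number of degrees of freedom would back up the statement that the points do not follow a pure 1/N law.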
I don’t think that this problem is specific to the CB. I’ve also observed a bias in three other asymmetric functions, namely Novosibirsk, Landau and double-tailed CB. The first two I have only checked at 100 events, but the last one I have checked more thoroughly and it shows a similar pattern to the CB.