Fit result reliability with high numInvalidNLL or covariance matrix quality different from 3

treso · March 10, 2017, 12:58pm

Hi rooters!
I’m working with RooFit on an extended, unbinned, simultaneous maximum-likelihood fit over three bins, and I’m running Monte Carlo toys to perform some statistical studies.
I have two signal pdfs with fixed shape parameters; the only free signal parameters are the numbers of signal events in each bin, and a Gaussian constraint is applied to these parameters.
There are also several background pdfs, with constraints applied to them.
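For reference, the constrained extended NLL minimised in a setup like this has, schematically, the form below. This is a generic sketch, not the actual model: $c$ runs over the three bins, $j$ over the components (the two signals and the backgrounds) with yields $N_{c,j}$ and normalised pdfs $f_{c,j}$, and each Gaussian constraint pulls a parameter $\theta_k$ towards a hypothetical nominal value $\tilde{\theta}_k$ with width $\sigma_k$ (constant terms dropped).

$$
-\ln L = \sum_{c=1}^{3}\left[\sum_{j} N_{c,j} - \sum_{i \in c} \ln\!\Big(\sum_{j} N_{c,j}\, f_{c,j}(x_i)\Big)\right] + \sum_{k} \frac{(\theta_k - \tilde{\theta}_k)^2}{2\sigma_k^2}
$$

The logarithm is only defined while the summed density inside it stays positive for every event, which is exactly what negative signal yields can violate.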
I found some unexpected behaviour when I generate and fit toys with few (or zero) signal events.
The fitted signal yields seem to fluctuate more towards the unphysical region (negative numbers of signal events), and sometimes I get a large negative fitted yield (e.g. -100) even though the fit appears to converge perfectly and the covariance matrix quality is 3.
I noticed that in these cases the number of invalid NLL evaluations (retrieved with the numInvalidNLL method) tends to be quite high (> 50, while it is usually below 5); it seems that the fit ends up in an unphysical region and can’t get out of it.
Sometimes the covariance matrix quality is lower than 3, and in these cases too the fitted number of signal events ends up far in the unphysical region.
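One common way invalid NLL evaluations arise once signal yields are allowed to go negative: the total density n_sig·S(x) + n_bkg·B(x) becomes non-positive for events near the signal peak, where the logarithm is undefined. Below is a minimal plain-Python sketch (not RooFit code) assuming a hypothetical Gaussian signal (mean 5, width 0.5) on a flat background in [0, 10]; `extended_nll` and `most_negative_valid_yield` are hypothetical helpers.

```python
import math

LO, HI = 0.0, 10.0  # hypothetical observable range


def gauss_pdf(x, mu=5.0, sigma=0.5):
    """Hypothetical Gaussian signal shape (tails outside [LO, HI] are negligible)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def flat_pdf(x):
    """Hypothetical flat background shape on [LO, HI]."""
    return 1.0 / (HI - LO)


def extended_nll(n_sig, n_bkg, events):
    """Extended NLL (constants dropped) plus the number of events whose
    total density n_sig*S(x) + n_bkg*B(x) is non-positive.
    The NLL is returned as None if any event gives an invalid density."""
    nll = n_sig + n_bkg
    n_invalid = 0
    for x in events:
        density = n_sig * gauss_pdf(x) + n_bkg * flat_pdf(x)
        if density <= 0.0:
            n_invalid += 1  # log(density) undefined: an "invalid" evaluation
        else:
            nll -= math.log(density)
    return (None if n_invalid else nll), n_invalid


def most_negative_valid_yield(n_bkg, events):
    """Signal yields strictly above this value keep every event's density positive.
    The boundary depends on the data: it is set by the event closest to the peak."""
    return -min(n_bkg * flat_pdf(x) / gauss_pdf(x) for x in events)


events = [1.0, 3.0, 5.0, 5.1, 7.0, 9.0]
for n_sig in (0.0, -10.0, -20.0):
    print(n_sig, extended_nll(n_sig, 100.0, events))
print(most_negative_valid_yield(100.0, events))
```

With these six events and 100 background events the boundary sits at about -12.5 signal events: at -10 every density is still positive, while at -20 the two events near the peak give invalid evaluations. Because the boundary moves from toy to toy, a minimiser pushed towards negative yields can spend many evaluations right against it.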

I attached a plot showing the distribution of the total number of fitted signal events for the two signals (obtained by summing the fitted yields from the three bins) for 10000 toys generated with 0 signal events. As you can see, there is a bulk centred at (0, 0), as expected, but also many entries scattered into the unphysical region.

I was wondering how reliable the fit results are when the fit converges perfectly but numInvalidNLL is high, and when the covariance matrix quality is different from 3.

Thank you for your help.

Cheers,

Fabio

moneta · March 13, 2017, 3:39pm

Hi,

I think in this case it is safer to perform an NLL scan around the minimum. If the NLL has a reasonable shape (close to parabolic) and you are getting valid NLL values, then the result should be fine.
If there are many invalid NLL evaluations close to the minimum, then it could be problematic.
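A minimal, self-contained sketch of such a scan, written in plain Python rather than RooFit so it runs anywhere. It assumes a hypothetical one-dimensional model (a Gaussian signal on a flat background in [0, 10]) with the background yield held fixed, so it is a plain scan rather than a profile likelihood; the helpers `nll`, `scan` and `analyse_scan` are hypothetical. A grid point is marked invalid when the total density is non-positive for some event; the analysis estimates a parabolic error from the local curvature, measures how far the scan deviates from that parabola, and counts invalid points near the minimum. In RooFit itself the equivalent plot is usually made by drawing the object returned by `createNLL` against the yield variable.

```python
import math

LO, HI = 0.0, 10.0  # hypothetical observable range


def sig_pdf(x, mu=5.0, sigma=0.5):
    """Hypothetical Gaussian signal shape."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bkg_pdf(x):
    """Hypothetical flat background shape on [LO, HI]."""
    return 1.0 / (HI - LO)


def nll(n_sig, n_bkg, events):
    """Extended NLL (constants dropped); None if the density is <= 0 for any event."""
    total = n_sig + n_bkg
    for x in events:
        density = n_sig * sig_pdf(x) + n_bkg * bkg_pdf(x)
        if density <= 0.0:
            return None
        total -= math.log(density)
    return total


def scan(events, n_bkg, lo, hi, steps):
    """Evaluate the NLL on a uniform grid of n_sig values, with n_bkg held fixed."""
    width = (hi - lo) / (steps - 1)
    return [(lo + i * width, nll(lo + i * width, n_bkg, events)) for i in range(steps)]


def analyse_scan(points, window=5):
    """Find the scan minimum, estimate a parabolic error from the local curvature,
    and check shape and validity within +/- window grid steps of the minimum."""
    valid = [i for i, (_, y) in enumerate(points) if y is not None]
    if not valid:
        raise ValueError("no valid NLL values in the scan")
    i_min = min(valid, key=lambda i: points[i][1])
    if i_min in (0, len(points) - 1) or None in (points[i_min - 1][1], points[i_min + 1][1]):
        raise ValueError("minimum at the scan edge or next to an invalid point")
    h = points[1][0] - points[0][0]
    x_min, y_min = points[i_min]
    # Second finite difference: curvature of the NLL at the minimum.
    curv = (points[i_min - 1][1] - 2.0 * y_min + points[i_min + 1][1]) / h ** 2
    if curv <= 0.0:
        raise ValueError("non-positive curvature at the minimum")
    near = points[max(0, i_min - window): i_min + window + 1]
    max_rel_dev = 0.0
    for x, y in near:
        if y is None or x == x_min:
            continue
        parabola = 0.5 * curv * (x - x_min) ** 2  # expected Delta NLL if exactly parabolic
        max_rel_dev = max(max_rel_dev, abs((y - y_min) - parabola) / parabola)
    return {
        "x_min": x_min,
        "sigma_parabolic": 1.0 / math.sqrt(curv),
        "n_invalid_near_min": sum(1 for _, y in near if y is None),
        "n_invalid_total": sum(1 for _, y in points if y is None),
        "max_rel_dev_from_parabola": max_rel_dev,
    }


events = [0.05 + 0.1 * i for i in range(100)]  # evenly spread, background-like toy sample
result = analyse_scan(scan(events, n_bkg=100.0, lo=-30.0, hi=30.0, steps=61))
print(result)
```

On this evenly spread toy sample the minimum sits at n_sig = 0 with no invalid points nearby, but the NLL already rises faster than the parabola on the negative side, because the total density approaches zero there; the 18 invalid grid points start at n_sig = -13, where the events nearest the peak get a negative density. Invalid points inside the window, or a large deviation from the parabola, are the warning signs described above.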

Lorenzo