I guess I am asking a very stupid question, but I have to understand it.
I print the NLL from RooFit as r->minNll(), where r comes from RooFitResult* r = model.fitTo(….
I get a negative value; for example, for the S+B fit I got -11254.9 and for the B-only fit I got -11250.2.
I then say that the S+B fit is better than the B fit. But why? The smaller the NLL, the smaller the probability of describing the data. So the B fit describes the data better than S+B… The log-likelihood function is always negative, right? Since p_i for each event/bin is less than 1.
Apart from that, I do not fully understand your statement:
"The smaller the NLL, the smaller the probability of describing the data. So the B fit describes the data better than S+B."
I do not know whether the dataset you studied contains signal or not, but in the case described above the S+B value is smaller than the B value (-11254.9 vs. -11250.2).
I hope the attached plot with the S+B and B-only fits of the Run II data will clarify my question.
The solid line is the S+B fit, the dashed line the B-only fit; it is an unbinned fit.
One can see that the S+B fit describes the data better than the B-only fit. However, NLL(S+B) = -11254.9 is smaller than
NLL(B) = -11250.2. In my understanding it should be the other way around…
minNll() returns the minimum of the *negative* log-likelihood, -log L. So your smallest NLL corresponds to the highest likelihood.
I would recommend looking up how maximum-likelihood fits work on Wikipedia or in a statistics book of your choice for more information.
The log-likelihood in my case of an unbinned fit over m_mumu is a sum of ln(p_i), where p_i is the probability for event i to have a given m_mumu.
Since p_i < 1.0, ln(p_i) < 0.0, so the sum of logarithms is negative to begin with. Therefore the smaller the sum, the less compatibility
with the data: -10 is worse than -5.