I'm doing an analysis in RooFit with a model that describes both signal and background, and I fit for the signal yield.
To calculate the significance of my result, I've essentially adapted this example for my own use.
My question is about the Significance() function. How does it actually perform the calculation? Does it compute the significance using equation 39.13 on page 4 of this PDG statistics summary, or by some other but equivalent method?
In other words, the significance is the inverse of the cumulative distribution function of a standard normal distribution (mean 0, standard deviation 1), evaluated at 1 - p, where p is the p-value of the null hypothesis: Z = Φ^{-1}(1 - p).
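To make the p-value to significance mapping concrete, here is a small stdlib-only Python sketch of Z = Φ^{-1}(1 - p) and its inverse. As far as I know, RooStats::PValueToSignificance performs the same one-sided conversion (via the complemented normal quantile), but the code below is only an independent re-implementation for checking numbers by hand, not ROOT's code; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def significance_to_pvalue(z: float) -> float:
    """One-sided upper-tail probability p = 1 - Phi(z) of a standard normal.

    Written with erfc so that very small p-values (large z) keep full precision.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def pvalue_to_significance(p: float) -> float:
    """One-sided significance Z = Phi^{-1}(1 - p) for a null-hypothesis p-value p.

    By symmetry of the standard normal, Phi^{-1}(1 - p) = -Phi^{-1}(p); the
    right-hand form avoids the rounding in 1 - p when p is tiny.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    return -NormalDist(mu=0.0, sigma=1.0).inv_cdf(p)

if __name__ == "__main__":
    for z in (1.0, 3.0, 5.0):
        p = significance_to_pvalue(z)
        print(f"Z = {z:.1f} -> p = {p:.3e} -> Z = {pvalue_to_significance(p):.4f}")
```

For reference, Z = 3 corresponds to p ≈ 1.35 × 10^-3 and Z = 5 to p ≈ 2.87 × 10^-7.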
The example I linked to uses the ProfileLikelihoodCalculator, which relies on the profile likelihood ratio obtained from two fits: one with the signal yield floating and one with it fixed to zero. The p-value is then derived from this likelihood ratio and converted to a significance. Is this correct? If so, wouldn't the relation \ln L(0) - \ln L_{max} = -s^{2}/2 give the same result?
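Spelled out, the chain in question is the standard asymptotic one (large samples, a single parameter of interest). The notation is introduced here for illustration: μ is the signal yield, θ the remaining (nuisance) parameters, (\hat{\mu}, \hat{\theta}) the unconditional best fit and \hat{\hat{\theta}} the best fit with μ fixed to 0. Under these assumptions the likelihood-ratio route and the p-value route give the same s.

```latex
\lambda(0) = \frac{L(0,\hat{\hat{\theta}})}{L(\hat{\mu},\hat{\theta})}, \qquad
q_0 = -2\ln\lambda(0) = 2\left[\ln L_{\max} - \ln L(0)\right]
\quad (\hat{\mu} \ge 0;\ q_0 = 0 \text{ otherwise})

s = \sqrt{q_0} \quad\Longleftrightarrow\quad \ln L(0) - \ln L_{\max} = -\frac{s^{2}}{2}

p_0 = 1 - \Phi(s) \quad\Longleftrightarrow\quad s = \Phi^{-1}(1 - p_0)
```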
I also can't find where in the code the p-value is actually calculated from the likelihood ratio. Significance() simply passes NullPValue() to PValueToSignificance(), but it's not obvious what calculation is performed there.
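Asymptotically, the p-value follows from the likelihood ratio through Wilks' theorem. A minimal sketch, assuming one parameter of interest and the one-sided discovery convention: take q0 = 2 × (NLL at zero signal minus the NLL minimum), then p = 0.5 × P(χ²₁ > q0), which is identical to the standard-normal tail beyond √q0. I can't confirm that the ROOT source line mentioned in the reply below uses exactly this form; the function names and the NLL values in the example are hypothetical.

```python
import math
from statistics import NormalDist

def q0_from_nll(nll_null: float, nll_min: float) -> float:
    """Test statistic q0 = 2 * (NLL at mu = 0 minus the global NLL minimum).

    Clipped at 0 to absorb tiny negative values from imperfect minimization.
    (With the one-sided convention, q0 is also set to 0 when mu_hat < 0.)
    """
    return max(0.0, 2.0 * (nll_null - nll_min))

def pvalue_half_chi2(q0: float) -> float:
    """Half of the chi-square (1 dof) upper tail: 0.5 * P(chi2_1 > q0).

    For one degree of freedom, P(chi2_1 > q) = erfc(sqrt(q / 2)).
    """
    return 0.5 * math.erfc(math.sqrt(q0 / 2.0))

def pvalue_normal_tail(q0: float) -> float:
    """The same p-value written as the standard-normal upper tail beyond sqrt(q0)."""
    return 1.0 - NormalDist(mu=0.0, sigma=1.0).cdf(math.sqrt(q0))

if __name__ == "__main__":
    # Illustrative NLL values only, not results from any real fit.
    q0 = q0_from_nll(nll_null=104.5, nll_min=100.0)
    print(q0, pvalue_half_chi2(q0), pvalue_normal_tail(q0))
```

With these made-up numbers q0 = 9, so both forms give p ≈ 1.35 × 10^-3, i.e. a significance of 3.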
I am inviting @moneta to this thread; he will be able to give a more precise answer to your first question. Regarding the p-value computation, see this source file, line 390.
Yes. If you want the significance in a problem with only one parameter of interest (e.g. the signal yield), the asymptotic value from the profile likelihood is simply s = sqrt(2 * DeltaLogLikelihood), where `DeltaLogLikelihood` is the difference between the negative log-likelihood at the null value and the NLL minimum.
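A toy check of s = sqrt(2 * DeltaLogLikelihood) on a model where the answer is known in closed form: a single-bin counting experiment with n observed events and a known background b. This model is an illustrative assumption (not the original poster's), the function names are hypothetical, and the golden-section search merely stands in for MINUIT. The fit range μ ≥ 0 mimics the one-sided discovery setup.

```python
import math

def nll(mu: float, n: int, b: float) -> float:
    """Poisson negative log-likelihood for n observed events with mean mu + b.

    The constant ln(n!) is dropped; it cancels in any NLL difference.
    """
    lam = mu + b
    return lam - n * math.log(lam)

def minimize_1d(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, c = lo, hi
    while c - a > tol:
        x1 = c - g * (c - a)
        x2 = a + g * (c - a)
        if f(x1) < f(x2):
            c = x2  # minimum lies in [a, x2]
        else:
            a = x1  # minimum lies in [x1, c]
    return 0.5 * (a + c)

def asymptotic_significance(n: int, b: float) -> float:
    """s = sqrt(2 * DeltaNLL) with DeltaNLL = NLL(mu = 0) - NLL(mu_hat), mu_hat >= 0."""
    mu_hat = minimize_1d(lambda mu: nll(mu, n, b), 0.0, 10.0 * (n + b))
    delta = nll(0.0, n, b) - nll(mu_hat, n, b)
    return math.sqrt(2.0 * max(delta, 0.0))

def closed_form(n: int, b: float) -> float:
    """Analytic value for this model, valid for n > b: sqrt(2 * (n ln(n/b) - (n - b)))."""
    return math.sqrt(2.0 * (n * math.log(n / b) - (n - b)))

if __name__ == "__main__":
    print(asymptotic_significance(20, 10.0), closed_form(20, 10.0))
```

For n = 20 and b = 10 both functions give s ≈ 2.78; when n ≤ b the fitted yield sits at 0 and s is 0.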
Also, as a check, I tried to calculate s = sqrt(2 * DeltaLogLikelihood) manually, but I'm not getting the expected value. I simply have a model where I do
Then I use nll and zero_sig_nll to calculate s = sqrt(2 * DeltaLogLikelihood). But I keep getting incorrect results (around 1) for various datasets with different signal yields. Is my understanding of s = sqrt(2 * DeltaLogLikelihood) wrong, or my understanding of RooFit, or both?
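Since the code before "where I do" isn't shown, I can't say what goes wrong there. One thing worth checking in any manual version is that zero_sig_nll is the minimum of the NLL with the signal yield fixed to zero and all other parameters re-fitted, and that nll is the global minimum: DeltaLogLikelihood is only meaningful as a difference between two minima. The sketch below uses a hypothetical on/off counting model (a signal region plus a background control region scaled by tau), where both minima are known in closed form, to show how much re-fitting the background under the null changes the result. The model and function names are illustrative assumptions, not RooFit code.

```python
import math

def nll(mu: float, beta: float, n_on: int, n_off: int, tau: float) -> float:
    """On/off NLL: n_on ~ Pois(mu + beta), n_off ~ Pois(tau * beta); constants dropped."""
    lam_on = mu + beta
    lam_off = tau * beta
    return lam_on - n_on * math.log(lam_on) + lam_off - n_off * math.log(lam_off)

def significances(n_on: int, n_off: int, tau: float) -> tuple[float, float]:
    """Return (profiled, not_profiled) values of sqrt(2 * DeltaNLL).

    Assumes n_on > n_off / tau so that the unconditional fit has mu_hat > 0.
    """
    # Unconditional fit (mu floating): closed-form MLEs for this model.
    beta_hat = n_off / tau
    mu_hat = n_on - beta_hat
    nll_min = nll(mu_hat, beta_hat, n_on, n_off, tau)

    # Null fit done properly: mu fixed to 0, beta re-minimized (conditional MLE).
    beta_null = (n_on + n_off) / (1.0 + tau)
    nll_null_profiled = nll(0.0, beta_null, n_on, n_off, tau)

    # Null value taken without re-fitting: mu set to 0, beta left at beta_hat.
    nll_null_fixed = nll(0.0, beta_hat, n_on, n_off, tau)

    z_profiled = math.sqrt(2.0 * (nll_null_profiled - nll_min))
    z_fixed = math.sqrt(2.0 * (nll_null_fixed - nll_min))
    return z_profiled, z_fixed

if __name__ == "__main__":
    print(significances(n_on=30, n_off=40, tau=4.0))
```

For n_on = 30, n_off = 40 and tau = 4, the properly profiled value is about 4.34, while leaving the background at its unconditional value gives about 5.09; the choice of null fit alone shifts the answer noticeably.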