RooStats Significance() function operation

Lepton86 · December 29, 2020, 11:25am

Dear Experts,

I’m doing an analysis in RooFit where I have a model that describes the signal and background, and I fit for the signal yield.

To calculate the significance of my results I’ve basically adapted this example for my own analysis.

My question relates to the Significance() function. When you call Significance(), how does it actually perform the calculation? Does it calculate the significance using equation 39.13 on page 4 of this PDG statistics summary, or something equivalent to it?

Thank you.

jalopezg · December 29, 2020, 2:30pm

Hello @Lepton86,

The Significance() and RooStats::PValueToSignificance() functions are implemented as follows:

virtual Double_t Significance() const {return RooStats::PValueToSignificance( NullPValue() ); }

inline Double_t PValueToSignificance(Double_t pvalue){
   return ::ROOT::Math::normal_quantile_c(pvalue,1); 
}

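Therefore, the significance is the upper-tail quantile of a normal distribution with mean 0 and standard deviation 1 (the second argument), evaluated at the p-value of the null hypothesis. In other words, it is the value Z for which the probability of a standard normal variable exceeding Z equals the p-value.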
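As a cross-check outside ROOT, the same conversion can be sketched with the Python standard library. This is only an illustration: the function name `pvalue_to_significance` is made up here, and `statistics.NormalDist` stands in for `ROOT::Math::normal_quantile_c`.

```python
from statistics import NormalDist


def pvalue_to_significance(pvalue: float) -> float:
    """Upper-tail quantile of a standard normal (mean 0, sigma 1).

    Hypothetical stand-in for ROOT::Math::normal_quantile_c(pvalue, 1).
    By symmetry, the upper-tail quantile at p is minus the
    lower-tail quantile at p.
    """
    # inv_cdf raises StatisticsError unless 0 < pvalue < 1
    return -NormalDist(mu=0.0, sigma=1.0).inv_cdf(pvalue)


if __name__ == "__main__":
    # A p-value of 0.5 corresponds to zero significance;
    # smaller p-values give larger significances.
    for p in (0.5, 0.05, 1e-3):
        print(f"p = {p:g} -> Z = {pvalue_to_significance(p):.3f}")
```

Using the symmetry of the normal distribution instead of `inv_cdf(1 - p)` avoids losing precision when the p-value is very small.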

Cheers,
J.

Lepton86 · December 29, 2020, 8:44pm

Hi @jalopezg,

Thank you for the reply.

The example I linked to uses the ProfileLikelihoodCalculator. It uses the profile likelihood ratio that is determined from fitting with and without the signal yield fixed to zero. The p-value is then obtained from the likelihood ratio and converted to a significance. Is this correct? If yes, wouldn’t the calculation \ln L(0) - \ln L_{max} = -s^{2}/2 give the same result?

I also can’t find where in the code the p-value is actually calculated from the likelihood ratio. The Significance() function basically takes NullPValue() as an input to PValueToSignificance but it’s not immediately clear what calculation is being done.

Thanks.

jalopezg · December 30, 2020, 4:15pm

Hi @Lepton86,

I am inviting @moneta to this thread. He will be able to provide a more concise answer to your first question. In regards to the p-value computation, refer to this source file, line 390.

Cheers,
J.

Lepton86 · December 30, 2020, 8:21pm

Thank you for the link. It answers part of the question.

I will wait for @moneta for the rest, like you suggest.

moneta · January 5, 2021, 2:16pm

Hi,

Yes, if you want the significance in a problem with only one parameter of interest (e.g. the signal yield), then the asymptotic value using the profile likelihood is just s = sqrt( 2 * DeltaLogLikelihood), where `DeltaLogLikelihood` is the difference in the negative log-likelihood values between the null value and the NLL minimum.
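Written out, and assuming the usual asymptotic regime with a single parameter of interest, the relation reads as follows. The symbols q_0, Z and \hat{\mu} are notation introduced here for illustration and are not names from the RooStats code; NLL stands for -\ln L with the nuisance parameters profiled.

```latex
q_0 = 2\left[\mathrm{NLL}(\mu = 0) - \mathrm{NLL}(\hat{\mu})\right],
\qquad
Z = \sqrt{q_0},
\qquad
p = 1 - \Phi(Z)
```

Here \Phi is the standard normal cumulative distribution function, so the last relation is the inverse of the p-value-to-significance conversion. The first two relations are the same statement as \ln L(0) - \ln L_{max} = -Z^{2}/2.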

Cheers

Lorenzo

Lepton86 · January 5, 2021, 7:37pm

Hi @moneta,

Thank you. But does the example given here https://root.cern.ch/doc/v606/rs102__hypotestwithshapes_8C_source.html#l00175 give the same value? If not, is there a tool in RooStats that does?

Also, I tried to calculate s = sqrt( 2 * DeltaLogLikelihood) manually, as a check, but I’m not getting the correct value. I simply have some model where I do

import ROOT

sig_yield = ROOT.RooRealVar("sig_yield", "signal yield", start, LB, UB)
bkg_yield = ROOT.RooRealVar("bkg_yield", "background yield", start, LB, UB)

model = ROOT.RooAddPdf("model", "total model", ROOT.RooArgList(sig_model, bkg_model), ROOT.RooArgList(sig_yield, bkg_yield))

# Save() is needed so that fitTo returns a RooFitResult
result = model.fitTo(data, ROOT.RooFit.Minos(), ROOT.RooFit.Extended(), ROOT.RooFit.Save())

nll = result.minNll()

# Null hypothesis: signal yield fixed to zero (constant RooRealVar)
zero_sig_yield = ROOT.RooRealVar("sig_yield", "signal yield", 0.0)

model = ROOT.RooAddPdf("model", "total model", ROOT.RooArgList(sig_model, bkg_model), ROOT.RooArgList(zero_sig_yield, bkg_yield))

result_with_zero_sig = model.fitTo(data, ROOT.RooFit.Minos(), ROOT.RooFit.Extended(), ROOT.RooFit.Save())

zero_sig_nll = result_with_zero_sig.minNll()

Then I simply use nll and zero_sig_nll to calculate s = sqrt( 2 * DeltaLogLikelihood). But I keep getting incorrect results, ~1, for various data sets with different signal yields. Is my understanding of s = sqrt( 2 * DeltaLogLikelihood) or RooFit incorrect, or both?

Thank you.

moneta · January 6, 2021, 10:25am

Hello,
Yes, the example https://root.cern.ch/doc/v606/rs102__hypotestwithshapes_8C_source.html#l00175 gives the same result as s = sqrt( 2 * DeltaLogLikelihood). You can check this manually by looking at the likelihood in the two fits:

unconditional fit : nll = 717.039
conditional fit (fixed mu=0) : nll = 723.97

therefore : s = sqrt( 2 * (723.97 - 717.039) ) = 3.723, which is the reported result.
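The arithmetic can be reproduced with a few lines of plain Python, using the two NLL values quoted above and s = sqrt( 2 * DeltaLogLikelihood). This is a minimal sketch: the helper name `asymptotic_significance` is hypothetical, and the one-sided p-value is obtained from the standard library rather than from RooStats.

```python
import math
from statistics import NormalDist


def asymptotic_significance(nll_null: float, nll_min: float) -> float:
    """Return sqrt(2 * (nll_null - nll_min)) for one parameter of interest."""
    delta = nll_null - nll_min
    if delta < 0:
        # The unconditional fit should never be worse than the conditional
        # one; a negative difference hints at a fit that did not converge.
        raise ValueError("nll_null is below nll_min; check fit convergence")
    return math.sqrt(2.0 * delta)


nll_min = 717.039   # unconditional fit, value quoted in the thread
nll_null = 723.97   # conditional fit with mu fixed to 0, value quoted in the thread

z = asymptotic_significance(nll_null, nll_min)
p = NormalDist().cdf(-z)  # one-sided upper-tail p-value for this Z

print(f"Z = {z:.3f}")
print(f"p = {p:.3g}")
```

The explicit check on the sign of the difference is a design choice for this sketch: it turns a silently wrong significance into a visible error when the two fits are inconsistent.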

Your code seems correct, but you should make sure that both fits converge.

Cheers
Lorenzo
