# CLs upper limits without the RooFit/RooStats packages

Hi everyone,

I need some help with the CLs method, which is used for setting upper limits. I have background histograms, signal histograms (beyond-SM models) and data histograms, and I am trying to set upper limits on my signal in the context of my MSc thesis.

However, my supervising professor has advised me to write my own code, and not use the RooFit/RooStats packages. I am struggling with understanding the whole statistical procedure of setting observed and expected limits, as well as with writing code that implements it.

Any help will be much appreciated!

Hi,

You can set up the whole procedure without RooFit/RooStats, but it will cost you a lot of time, especially if your model is not a simple one (e.g. a counting experiment).
I think it is better to use the tools, but at the same time to try to understand them and not use them as a black box.

Lorenzo

Hi Lorenzo,

First of all, thanks for your reply!

Actually, in my previous post I didn't mention that I have already managed to build some things from scratch without using any specific tool (e.g. a binned likelihood function, CRs, SRs, the profile likelihood function, the q_mu test statistic), so I have some code already.

I have read many papers and tutorials on the CLs method; however, I still have some practical questions about the procedure. For example, how does one calculate the observed value of the test statistic (q_obs) for the signal + background and background-only hypotheses, and how exactly does one use the Asimov dataset to extract the expected limits?

Any clarification will be more than welcome!

Hi Panagiotis,

If you have already researched papers and tutorials on CLs, you might have encountered the asymptotic-formulae paper by Cowan, Cranmer, Gross and Vitells (arXiv:1007.1727, also cited in the code below):

Following this paper, you can find the p-value for any candidate limit \mu from formulas (59) or (67).
To find an upper limit at a given p-value, you can then calculate this p-value for "every" \mu and scan for the one that meets your threshold.
Alternatively, the upper limits can be computed from (61) and (69), depending on your test statistic.
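
For a simple single-bin counting experiment this scan can be sketched end to end without RooFit at all. The following is only a minimal illustration with made-up numbers (the yields s, b and the observed count n_obs are hypothetical), using the \tilde{q}_{\mu} statistic (16), the Asimov \sigma from (30) and the cumulative distribution (65):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

s, b, n_obs = 10.0, 50.0, 55  # hypothetical signal, background and observed counts

def nll(mu, n):
    """Poisson negative log-likelihood, up to a constant."""
    lam = mu * s + b
    return lam - n * np.log(lam)

def q_tilde(mu, n):
    """Test statistic (16): mu_hat is clipped at 0, and q = 0 for mu_hat > mu."""
    mu_hat = max((n - b) / s, 0.0)
    if mu_hat > mu:
        return 0.0
    return 2.0 * (nll(mu, n) - nll(mu_hat, n))

def F(q, mu, sigma, mu_prime):
    """Cumulative distribution of q_tilde, formula (65)."""
    if q <= (mu / sigma) ** 2:
        return norm.cdf(np.sqrt(q) - (mu - mu_prime) / sigma)
    return norm.cdf((q - (mu**2 - 2.0 * mu * mu_prime) / sigma**2) / (2.0 * mu / sigma))

def cls(mu, n):
    q_asimov = q_tilde(mu, b)       # background-only Asimov dataset: n_A = b
    sigma = mu / np.sqrt(q_asimov)  # equation (30)
    q_obs = q_tilde(mu, n)
    clsb = 1.0 - F(q_obs, mu, sigma, mu)   # mu' = mu: s+b hypothesis assumed true
    clb = 1.0 - F(q_obs, mu, sigma, 0.0)   # mu' = 0: background-only assumed true
    return clsb / clb

# Scan mu for the point where CLs drops to 0.05 (the 95% CL upper limit)
mu_up = brentq(lambda mu: cls(mu, n_obs) - 0.05, 0.1, 10.0)
print(f"mu_up = {mu_up:.3f}")
```

Here brentq just scans \mu for the point where CLs crosses 0.05; with a real workspace the NLL would of course come from the full likelihood instead of this closed-form Poisson expression.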

I have written the following code, which implements the test statistic \tilde{q}_{\mu} to exclude the signal (the hypothesis \mu = 1).
By changing the test_mu variable in lines 106 and 107 you can vary the \mu to be excluded.
The code is not written entirely without RooFit and RooStats, but it computes many quantities without relying on them.

The workspace containing my data and pdf is read from a JSON file into ROOT (as of now this only works on the master branch, not yet in the stable releases).
If you already have your likelihoods and test statistic without RooFit, you should be able to reuse the parts of my code that don't require RooFit.

I hope my explanation and the code answer your questions. Otherwise, feel free to ask further questions.

Cheers
Daniel

The PyROOT Code
def read_workspace(config):
    import ROOT

    # start by creating an empty workspace
    ws = ROOT.RooWorkspace("workspace")

    # the RooJSONFactoryWSTool is responsible for importing and exporting things to and from your workspace
    tool = ROOT.RooJSONFactoryWSTool(ws)

    # use it to import the information from your JSON file
    tool.importJSON(config)
    # ws.Print()

    return ws

def get_nll_mu(pdf, data, poi, poi_value):
    """
    Fit the pdf to data to get the NLL value and the best-fit value of the parameter of interest.
    The parameter of interest (poi) is either floating and will be optimized to the best estimate,
    or if poi_value is not None, it will be fixed to this value.
    """
    import ROOT

    params = pdf.getParameters(data)
    init_params = params.snapshot()

    was_constant = poi.isConstant()
    old_value = poi.getVal()

    if poi_value is not None:
        poi.setConstant(True)
        poi.setVal(poi_value)

    nll = pdf.createNLL(data)  # Get the negative log likelihood of finding
    #  the given data from the pdf

    # Check for floating parameters
    has_floating_params = False
    for param in params:
        if not param.isConstant():
            has_floating_params = True

    # No fit is necessary if all parameters are constant
    if not has_floating_params:
        val = nll.getVal()    # The likelihood found for poi_value
        mu_hat = poi_value    # The resulting strength parameter mu is identical as no fit was performed
        poi.setConstant(was_constant)  # Reset all properties of the parameters
        poi.setVal(old_value)          #  to not affect future fits if this
        return val, mu_hat             #  function is called multiple times

    minim = ROOT.RooMinimizer(nll)  # Perform a fit of the nll
    minim.setPrintLevel(-1)

    minim.minimize("Minuit", "")  # The minimizer "Minuit" is used here

    val = nll.getVal()     # Export the nll value of the best-fit parameters
    mu_hat = poi.getVal()  # Export the best-fit value of mu

    poi.setConstant(was_constant)
    poi.setVal(old_value)

    params.assign(init_params)

    return val, mu_hat

def get_q_tilde(pdf, data, mu):
    """
    Formula (16) from the paper https://arxiv.org/pdf/1007.1727.pdf is used as the
    definition of our test statistic q_mu_tilde.
    """
    nll_val_float, mu_hat = get_nll_mu(pdf, data, mu, None)    # Get the likelihood from an unconstrained fit
    nll_val_fixed, _ = get_nll_mu(pdf, data, mu, mu.getVal())  # Get the likelihood with a fixed poi

    q_tilde = 2 * (nll_val_fixed - nll_val_float)  # Calculate q_tilde

    if mu_hat <= mu.getVal():  # Following (16), q_tilde is used only for mu_hat <= mu;
        return q_tilde         #  for mu_hat > mu the test statistic is set to 0
    return 0

def get_F(q_tilde, test_mu, sigma, mu_prime):
    """
    Formula (65) from the paper https://arxiv.org/pdf/1007.1727.pdf defines the cumulative
    distribution function of q_tilde. This in turn will be needed to compute the exclusion limits.
    The quantity sigma describes the width of the distribution of q_tilde and is found from equation (29).
    """
    from scipy.stats import norm

    phi = norm.cdf  # Phi is the cumulative distribution function of the normal distribution

    if q_tilde <= test_mu**2 / sigma**2:
        return phi(q_tilde ** (0.5) - (test_mu - mu_prime) / sigma)
    else:
        return phi((q_tilde - (test_mu**2 - 2 * test_mu * mu_prime) / sigma**2) / (2 * test_mu / sigma))

def get_CLs(pdf, data, mu, sigma):
    """
    Get the CLs, which is p_{s+b}/(1 - p_b).
    """
    q_tilde = get_q_tilde(pdf, data, mu)

    # Calculate the CL values for the signal-plus-background and background-only hypotheses.
    # In both tests the null hypothesis to be rejected is mu = 1
    clsb = 1.0 - get_F(q_tilde, 1.0, sigma, 1.0)  # In the s+b case the real strength is 1
    clb = 1.0 - get_F(q_tilde, 1.0, sigma, 0.0)   # In the b-only case the real strength is 0

    return clsb / clb

def post_fit_asimov(observables, pdf, data, poi, poi_value):
    from ROOT.RooStats import AsymptoticCalculator

    params = pdf.getParameters(data)
    init_params = params.snapshot()

    was_constant = poi.isConstant()
    old_value = poi.getVal()

    poi.setConstant(True)
    poi.setVal(poi_value)

    pdf.fitTo(data, PrintLevel=-1)
    asimov_data = AsymptoticCalculator.GenerateAsimovData(pdf, observables)

    poi.setConstant(was_constant)
    poi.setVal(old_value)
    params.assign(init_params)

    return asimov_data

def hypotest(obs_data, pdf, poi, observables):
    """
    Find the Asimov data and perform the hypothesis test itself to find the CLs values
    for the observed data and the expected values.
    """
    # Generate the Asimov data from the given pdf. A fit is performed with fixed mu to find best-fit
    #  values for the other parameters. Without this fit, one speaks of nominal Asimov data.
    asimov_data = post_fit_asimov(observables, pdf, obs_data, poi, 0)

    # Compute the test statistic for the Asimov dataset under the null hypothesis to find the width
    #  of the distributions according to (29).
    q_tilde_asimov = get_q_tilde(pdf, asimov_data, poi)
    sigma = (poi.getVal()**2 / q_tilde_asimov) ** 0.5

    # Compute the confidence levels of excluding mu = 1 based on the observed or Asimov data
    CLs_obs = get_CLs(pdf, obs_data, poi, sigma)
    CLs_exp = get_CLs(pdf, asimov_data, poi, sigma)

    return CLs_obs, CLs_exp

def main():
    # Read the workspace from the JSON file (the file name here is a placeholder,
    #  adjust it to wherever your workspace JSON is stored)
    ws = read_workspace("workspace.json")

    model = ws["ModelConfig"]
    pdf = model.GetPdf()
    mu = model.GetParametersOfInterest()[0]
    mu.setRange(0, 10)
    observables = model.GetObservables()
    obs_data = ws.data("obsData")

    CLs_obs, CLs_exp = hypotest(obs_data, pdf, mu, observables)
    print(f"Observed: {CLs_obs:.8f}, Expected: {CLs_exp:.8f}")

if __name__ == "__main__":
    import ROOT

    # As long as this object lives, RooFit's output below WARNING level is silenced
    changeMsgLvl = ROOT.RooHelpers.LocalChangeMsgLevel(ROOT.RooFit.WARNING)

    main()

The JSON file containing my workspace
{
  "distributions": [
    {
      "name": "channel_1",
      "type": "histfactory_dist",
      "axes": [
        {
          "name": "obs_x_channel_1",
          "max": 3,
          "min": 0,
          "nbins": 3
        }
      ],
      "samples": [
        {
          "name": "signal",
          "data": {
            "contents": [6.369616873214543, 2.697867137638703, 0.4097352393619469]
          },
          "modifiers": [
            {
              "name": "mu",
              "type": "normfactor",
              "parameter": "mu"
            }
          ]
        },
        {
          "name": "background",
          "data": {
            "contents": [0.16527635528529094, 8.132702392002724, 9.127555772777217],
            "errors": [0.5125680914002991, 3.3919644089992107, 2.5959617012357628]
          },
          "modifiers": [
            {
              "name": "bkg_uncert",
              "type": "staterror",
              "constraint": "Poisson"
            }
          ]
        }
      ]
    }
  ],
  "data": [
    {
      "name": "obsData_channel_1",
      "type": "binned",
      "axes": [
        {
          "name": "obs_x_channel_1",
          "max": 3,
          "min": 0,
          "nbins": 3
        }
      ],
      "contents": [1, 2, 7]
    }
  ],
  "likelihoods": [
    {
      "name": "simPdf_obsData",
      "distributions": ["channel_1"],
      "data": ["obsData_channel_1"]
    }
  ],
  "analyses": [
    {
      "name": "simPdf_obsData",
      "likelihood": "simPdf_obsData",
      "parameters_of_interest": ["mu"]
    }
  ],
  "misc": {
    "ROOT_internal": {
      "ModelConfigs": {
        "simPdf_obsData": {
          "mcName": "ModelConfig",
          "pdfName": "simPdf"
        }
      },
      "combined_datas": {
        "obsData": {
          "index_cat": "channelCat",
          "indices": [0],
          "labels": ["channel_1"]
        }
      },
      "combined_distributions": {
        "simPdf": {
          "distributions": ["channel_1"],
          "index_cat": "channelCat",
          "indices": [0],
          "labels": ["channel_1"]
        }
      }
    }
  }
}


Hi Daniel,

Actually, I do have some questions I want to ask regarding the code you uploaded.

If I am not mistaken, your function named get_nll_mu is used to extract the maximum likelihood mu, as well as the value of the likelihood function for the maximum likelihood mu, based on the observed data.

Also, the function get_q_tilde computes the value of the test statistic by subtracting the negative log-likelihood at the best-fit (maximum-likelihood) mu from the negative log-likelihood at a specific, tested value of mu, which is indeed the definition of q_tilde according to (16).

Then, get_F defines the cumulative distribution function that will be used for the upper-limit calculation. Here is my first question. In the paper you posted, it is mentioned that mu_prime is the mean value around which the distribution of the maximum-likelihood mu is centered, and sigma is the standard deviation of this distribution. To evaluate these quantities, I run pseudoexperiments, construct the pdf of the maximum-likelihood estimator, and define mu_prime as the mean of this distribution and sigma as its standard deviation. Is this the correct way? I am not sure I understand how to calculate sigma from (29).
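
To make my question concrete, here is a minimal sketch of what I mean by pseudoexperiments, for a hypothetical single-bin counting model where \hat{\mu} = (n - b)/s can be written in closed form (all numbers are made up, this is not the workspace from this thread):

```python
import numpy as np

rng = np.random.default_rng(12345)
s, b = 10.0, 50.0                      # hypothetical signal and background yields
n_toys = rng.poisson(b, size=100_000)  # background-only pseudoexperiments

# Unclipped best-fit mu for each toy; in the Wald approximation this
#  estimator is Gaussian with mean mu' and width sigma
mu_hat = (n_toys - b) / s
mu_prime = mu_hat.mean()  # close to 0, the generating value
sigma = mu_hat.std()      # close to sqrt(b)/s for this model
print(mu_prime, sigma)
```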

Next, the code proceeds with defining get_cls. My second question is: when calculating clsb and clb, why is test_mu fixed to 1 for both the signal + background and the background-only hypothesis? Wasn't it supposed to be 1 for the signal + background hypothesis and 0 for the background-only hypothesis?

I really apologize for the long text. I feel really puzzled.

Cheers,
Panagiotis

Hi Panagiotis,

First of all your understanding of get_nll_mu and get_q_tilde is correct.

Concerning \mu' and \sigma: in my approach I use my knowledge about the Asimov dataset to find \sigma, and I assume \mu' to be either 0 or 1 according to which model I am assuming (see the next paragraph for more details). The Asimov dataset is constructed so that the expectation value of \hat{\mu} is the strength parameter of your null hypothesis. Therefore, for the Asimov dataset, \hat{\mu} = \mu' must hold. In the Wald approximation (17) one can therefore replace \hat{\mu} with \mu', which yields (29). The left-hand side of this equation is just \tilde{q}_A, which can easily be evaluated. This is used in (30) to compute \sigma. In my code this is done in the hypotest function, where I first compute \tilde{q}_A and then \sigma.
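
As a minimal illustration of this (29) → (30) chain, for a hypothetical single-bin counting model n ~ Poisson(\mu s + b) with made-up numbers (not your workspace), the whole computation reduces to a few lines:

```python
import numpy as np

s, b = 10.0, 50.0  # hypothetical counting model: n ~ Poisson(mu*s + b)
test_mu = 1.0

def nll(mu, n):
    lam = mu * s + b
    return lam - n * np.log(lam)  # Poisson NLL up to a constant

# Background-only Asimov dataset: n_A = b, for which mu_hat = 0 exactly,
#  so the left-hand side of (29) is directly computable
q_asimov = 2.0 * (nll(test_mu, b) - nll(0.0, b))

# Equation (30): sigma^2 = mu^2 / q_{mu,A}
sigma = test_mu / np.sqrt(q_asimov)
print(sigma)  # comparable to the simple Gaussian estimate sqrt(b)/s
```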
Unfortunately I haven't worked on hypothesis testing with toys yet, so I cannot tell you whether your approach is correct. You can, however, try both ways of finding \mu' and \sigma and compare them; they should yield the same (or very similar) results if your approach is correct. The \mu' you find should then depend on the model you assumed when generating your pseudoexperiments.

What we do when calculating \text{CL}_{sb} and \text{CL}_{b} is the following:

• \text{CL}_{sb} is the p-value of finding our \tilde{q}_{\mu} if the signal + background hypothesis were true. Therefore we assume \mu' = 1 (s+b model), and we want to exclude the hypothesis \mu = 1 with the corresponding \tilde{q}_{1}.
• \text{CL}_{b} is, similarly, the p-value of finding \tilde{q}_{\mu} assuming the background-only model. Thus we change \mu' to 0 but keep the tested hypothesis \mu = 1.
I hope this helps you understand why we don't change test_mu (our hypothesis) between \text{CL}_{sb} and \text{CL}_{b}. It should also make clear that in my approach I don't determine \mu' myself but assume the distributions to be centered around 0 or 1.
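
With made-up numbers for \sigma and the observed \tilde{q}, the two p-values can be written out explicitly; only \mu' changes between them (first branch of (65), which is valid here because \tilde{q} \le \mu^2/\sigma^2):

```python
from scipy.stats import norm

test_mu = 1.0  # the hypothesis to exclude, the same for both p-values
sigma = 0.75   # hypothetical width from the Asimov dataset
q_obs = 1.2    # hypothetical observed q_tilde (below mu^2/sigma^2 = 1.78)

def one_minus_F(q, mu_prime):
    # First branch of (65): F = Phi(sqrt(q) - (mu - mu')/sigma)
    return 1.0 - norm.cdf(q**0.5 - (test_mu - mu_prime) / sigma)

clsb = one_minus_F(q_obs, 1.0)  # mu' = 1: s+b hypothesis assumed true
clb = one_minus_F(q_obs, 0.0)   # mu' = 0: background-only assumed true
cls = clsb / clb
print(clsb, clb, cls)
```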

Again, I hope my answer helps you. If any questions remain, feel free to ask.

Cheers
Daniel

Hi Daniel,

Thanks for all the useful information you have shared with me! I really apologize for the delayed response. Lately I was trying to figure out how to obtain limits using toys, but that approach is very computationally intensive, so I tried the asymptotic formulae instead.

Actually, I have some more questions to ask, if you may.

• My first question has to do with the calculation of \sigma, using equation (30).

As you said, the Asimov dataset is built under the null hypothesis, which in this case is the background-only hypothesis; therefore the Asimov dataset corresponds to the background-only prediction. So, in order to compute \sigma from (30), I have to compute q_{\mu,A}, which is minus two times the logarithm of the likelihood ratio, i.e. the test statistic evaluated on the Asimov dataset.

The denominator of q_{\mu,A} is the value of the likelihood function evaluated at \hat{\mu}, obtained by fitting the pdf to the Asimov dataset. In the numerator, do I have to use the value of the likelihood function evaluated at test_mu in order to obtain q_{\mu,A}? Also, will \mu^{\prime} be equal to 0, because we are referring to the background-only hypothesis?

• My second question has to do with the definition of the cumulative.

In your code, you are trying to exclude the signal hypothesis (\mu = 1). Let's suppose I want to obtain CLs for a different \mu, say 0.1. Of course, test_mu in the code will have to be replaced by 0.1. Will \mu^{\prime} be equal to 0.1 as well for CL_{s+b}, assuming that \hat{\mu} will be centered around this value?

• My third and final question has again to do with some definitions in your code.

When evaluating the test statistic, you pass mu as an argument to the get_q_tilde function. Also, when defining the get_F function, you pass test_mu as an argument. Aren't those supposed to be equal, given that the test statistic needs to be evaluated at test_mu?

Cheers,

Panagiotis

Hi Panagiotis,

• Yes, you use the test_mu value in the numerator of \tilde{q}_{A, \mu}.
• Yes, exactly as you said: for the Asimov dataset we know \mu' = 0 in this case, as we use the background-only model for the Asimov dataset in an exclusion.

• That is a very good point, and yes, test_mu should change accordingly when calculating CL_{s}. This is not solved nicely in my code, as test_mu is effectively hard-coded in that place.

• You are in principle right to spot that test_mu could be derived from \mu, as it is in other parts of the code.
• There is an important detail to notice here, though:
• The variable mu in my code is not just a number (e.g. a float or double) but a RooRealVar that has an initial value and bounds.
• This initial value is equal to test_mu and is extracted as mu.getVal() in the get_q_tilde() function.

This last part is just meant to help you understand my code.

For completeness, I have one additional piece of information:
In my example code I used the most basic kind of Asimov dataset, built from the background-only hypothesis alone. In some of the literature this is referred to as the "nominal Asimov" or "pre-fit Asimov", in contrast to the "post-fit Asimov", where the best-fit values of the uncertainty parameters are used to construct the Asimov dataset.
Which of these possibilities to use is a discussion of its own and beyond the scope of this forum; I just want anyone who reads this thread to be aware that there are multiple options.
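
As a rough illustration of the difference, consider a hypothetical single-bin model with one constrained nuisance parameter (all numbers made up, not the workspace above): the pre-fit Asimov uses the nominal nuisance value, while the post-fit Asimov uses its best-fit value from a \mu = 0 fit to the data.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical single-bin model with one nuisance parameter theta:
#  n ~ Poisson(mu*s + b*theta), theta has a Gaussian constraint of width delta
s, b, delta, n_obs = 10.0, 50.0, 0.1, 60

def nll_bonly(theta):
    """Background-only (mu = 0) NLL including the constraint term."""
    lam = b * theta
    return lam - n_obs * np.log(lam) + 0.5 * ((theta - 1.0) / delta) ** 2

theta_hat = minimize_scalar(nll_bonly, bounds=(0.5, 1.5), method="bounded").x

asimov_prefit = b               # nominal / pre-fit Asimov: nuisance at its nominal value
asimov_postfit = b * theta_hat  # post-fit Asimov: nuisance at its mu = 0 best fit
print(asimov_prefit, asimov_postfit)
```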

Cheers,
Daniel


Hi Daniel,

First of all, everything you have shared with me has been extremely helpful! Really, I can't thank you enough!

I have some final questions (hopefully), this time concerning the expected limits. My questions are the following:

From my understanding, and also based on your code, the procedure for setting expected limits is exactly the same as for the observed limits, with the difference that the expected limits are obtained by fitting the pdf to the Asimov dataset in order to obtain \tilde{q}_{\mu}, which is then used in the cumulative distribution.

1. Are the expected limits different for different signal types (different mass candidates in Brazilian plots), or is it a single value that applies to all cases?

2. Are the error bands of the expected limits (\pm 1\sigma, \pm 2\sigma) determined from the \sigma that is calculated using equation (30)?

Thank you so much for your time and help!

Cheers,
Panagiotis

Hi Panagiotis,

First of all, you are correct that working with expected limits is very similar to working with observed limits.

• With different signal types we get a different pdf for each signal type, each fitted to the same dataset. Therefore, the limits for each signal will generally be different.

• The standard deviation of \hat{\mu} is in fact the \sigma from equation (30). Therefore this \sigma is also used to determine the error bands of the obtained limits. For the asymptotic formulae you can take a look at Section 4.3 of the paper linked above, equations (88) and (89).
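
With a made-up \sigma, equations (88) and (89) are only a couple of lines of scipy; (89) is the median expected limit, i.e. the N = 0 entry below:

```python
from scipy.stats import norm

alpha = 0.05
sigma = 0.75  # hypothetical width of the mu_hat distribution, from eq. (30)

# Equation (88): position of the upper limit N sigma away from the median;
#  equation (89) is the N = 0 (median) case, sigma * PhiInverse(1 - alpha)
bands = {N: sigma * (norm.ppf(1.0 - alpha) + N) for N in (-2, -1, 0, 1, 2)}
print(bands)  # note that the N = -2 edge comes out negative with these formulas
```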

Cheers,
Daniel

Hi Daniel,

I am really sorry to disturb you again. I have followed all the steps we have discussed, but I think I am missing something, and here is why:

When I evaluate the expected limit for a specific signal type, I get a value for \mu and a value for \sigma, which corresponds to the standard deviation of \mu and is in general a function of \mu. The relation from which I calculate \sigma is equation (30), as we have already discussed.

However, \sigma turns out to be large enough that when I evaluate \mu - 2\sigma to obtain the -2\sigma edge of the expected-limit band, the result becomes negative.

In the following script, I implement the test statistic q_{\mu} and use the asymptotic formulae from the paper you posted to calculate the 95% CL limits.

asymptotics.C (4.3 KB)

In the above script, the test_mu value is the value of \mu that satisfies the condition CLs_{exp} = 0.05 for that specific signal. The script prints the values of CLs, \sigma and \mu_{exp} - 2\sigma.

Also, I will post the root file from where I get my histograms, in case it is needed.

drell_yan_sr_histos.root (48.2 KB)

As I said, I must be missing something because a negative result seems odd. If you spot any error in the procedure or in the code, please let me know. Again, thank you so much for the time you have spent answering my questions!

Cheers,
Panagiotis

Hi Panagiotis,
The following are my findings on the topic:

• In chapter 4 the paper states that these standard deviations apply to the expected exclusion limits found from the Asimov dataset.
As a sanity check, we can compare the result of equation (69) to the values you found by trying different test_mu.
• In the text after equation (61) it is stated that \Phi^{-1}(1-\alpha) \approx 1.64 for \alpha=0.05.
Therefore we find the upper limit \hat{\mu} + 1.64\sigma.
For the asimov dataset we have \hat{\mu} = \mu' = 0, so we find \mu_{up} = 1.64\sigma.
From this it is the expected behavior that \mu_{up} - 2\sigma < 0.
• The paper also gives a few examples of applying the formulae. In the figures 11 and 12 we see the same behavior as you experienced.
• On the other hand, in published analyses the error bands are usually symmetric around \mu_{up} in a log plot.
Following this observation, I found the following paper from the Higgs search.
• In this paper a slightly different formula is given for the error bands (see equation (23)), which leads to different bands.

To summarize, I believe your code is not the problem here. The question of where these differences come from and what they mean is a very technical statistics question, beyond my knowledge and beyond the scope of the ROOT Forum. Any further questions on this you should probably discuss with your supervisors or other statistics experts in your group.
I hope you find a solution for your problem.

Cheers
Daniel


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.