# 95% Limits with the CLs method

Hi everyone,

In the context of my MSc thesis, I am trying to set 95 %CL limits, without incorporating any systematic effect. I have some background histograms, signal histograms and data histograms. I am trying to set limits without the help of the RooFit/RooStats packages, so everything concerning the statistical part of the procedure, is written from scratch (i.e. likelihoods, test statistic etc.).

When trying to set limits, I follow the steps that are described in this paper:

higgs_limits.pdf (771.6 KB)

However, when I try to implement the steps in order to get the observed limits, I get results that are far from reasonable. In the following lines, I post the code that I am using, as well as the root file I am getting my histograms from:

limits.C (4.2 KB)
drell_yan_sr_histos.root (39.0 KB)

In this script, I use a binned likelihood function, assuming that content of each bin is Poisson distributed. Then I proceed building the test statistic q_mu, as minus 2 times the logarithm of the likelihood function evaluated for the the test_mu value, over the likelihood function evaluated at the maximum likelihood mu.

I donâ€™t know what is wrong with the whole procedure. I feel very puzzled. Any help will be deeply appreciated.

Cheers,

Panagiotis

1 Like

Hi @Panagiotis,

Having a quick look at your script, this part in the toy generation looks like an error to me:

...
bin_content[i]=h->GetBinContent(i+1);
...
double x=r->Poisson(bin_content[j]);


Here, h is your data histogram. It doesnâ€™t make sense to sample the toy datasets according to the observed distribution: you need to sample two different population of toy datasets, one according to signal + background, and one according to background only. This is clearly stated in the caption of figure 1 in the paper you linked:

Figure 1: Test statistic distributions for ensembles of pseudo-data generated for sig-
nal+background and background-only hypotheses.

And then, why do you evaluate the test statistic for the toy datasets once for test_mu=0.01, and once for test_mu=0? Like you can also see in the legend in figure 1, the test mu should be the same. The difference between the red and the blue histogram is just the hypothesis that was used to sample the toy dataset.

Just my 2 cents, I hope that helps you to finish your homework, but donâ€™t expect that we do everything for you

Hi @jonas ,

Of course, I am not expecting that anybody does everything for me, I just wanted to clarify some parts of the procedure that I think I have misunderstood.

I thought that by evaluating the test statistic some value of test_mu (in this particular case, the value 0.01), what I would obtain would be the corresponding distribution of the test statistic, under the hypothesis of this particular mu. So, according to my mistaken logic, by evaluating the test statistic for mu=0, I would get the distribution of the test statistic, for the background only hypothesis.

Also, the reason I am sampling the toy datasets according to the observed distribution, is because I thought that the term â€śpseudoexperimentsâ€ť, stands for a variation in the observed data, which would yield different values of the test statistic and thus, its distribution.

So, basically, what you are saying is that I have to generate signal+background and background only toys, and compare it with the data, in order to extract the maximum likelihood values, for a specific test_mu, and then proceed and calculate the corresponding p-values, according to the observed value of the test statistic?

Sorry for the long text, but as it is obvious, I have misunderstood some parts of the procedure.

Thanks again for your time and remarks!

Cheers,

Panagiotis

So, basically, what you are saying is that I have to generate signal+background and background only toys, and compare it with the data, in order to extract the maximum likelihood values, for a specific test_mu, and then proceed and calculate the corresponding p-values, according to the observed value of the test statistic?

Almost!

You have to generate signal+background where the signal is scaled by a given test \mu and background only toys, to obtain test statistic distributions for the background hypothesis and signal hypothesis for a given \mu. You then compute the test statistic for the data and then proceed and calculate the corresponding p-values, according to the observed value of the test statistic. From the p-values, you get the CLs.

But here is the kicker: to get the 95 % CL upper limit, you have to iteratively repeat this procedure for many values of the test \mu to find the value for which the CLs is 0.95, just as it says at the end of section 2.1 in the paper! That means generating tons of toys in total, which can take a very long time. Thatâ€™s why people usually use the asymptotic approximations, like explained by @wernerd in this thread:
https://root-forum.cern.ch/t/cls-upper-limits-without-roofit-roostats-packages

1 Like

Hi again @jonas ,

Thanks for your very helpful comments! I have a final question regarding the process, if you may.

Letâ€™s suppose I am generating toys for signal+background and background only hypotheses, where as you stated above, the signal is scaled by a test mu factor. This test mu factor is the one that will determine the limit, if the value of the CLs statistic is equal to 0.05.

When computing the test statistic for the toys, the enumerator is the value of the likelihood function evaluated at mu, which mu is the given mu you mentioned, and the denominator is the value of the likelihood function evaluated at the best fit mu, which is obtained from comparing the data with the toys. My question is the following:

Does this mu play any role in the determination of the limit? How do I choose its value? Should it be equal to the test mu?

Also, when generating background only toys, you have stated that the value of mu doesnâ€™t change, so the value of the enumerator of the test statistic will be the same. In the denominator, the maximum likelihood value will be equal to zero because we are referring to the background hypothesis, or it should be equal to the maximum likelihood mu obtained from comparing the data with the background? (Something like a background strength parameter?).

Cheers,

Panagiotis

When computing the test statistic for the toys, the enumerator is the value of the likelihood function evaluated at mu, which mu is the given mu you mentioned, and the denominator is the value of the likelihood function evaluated at the best fit mu, which is obtained from comparing the data with the toys.

This is the place where you misunderstood things. The mu in the numerator is the test mu like you said, but the mu â€śhatâ€ť in the denominator is the best-fit mu for that given toy! So you have to fit each toy, which is part of what makes this procedure so expensive. The observed data is not used when filling the q-tilde histograms for the toys. You only use the data once to evaluate q-tilde from data and get the p values.

Does this mu play any role in the determination of the limit? How do I choose its value? Should it be equal to the test mu?

Of course it canâ€™t be equal to the test mu, otherwise numerator and denominator would be the same Actually you donâ€™t care about itâ€™s value. The point is that in the denominator, you have the best-fit NLL value. It also says it after equation 5 in the text: â€śThe pair of parameter estimators
ÎĽ-hat and Î¸-hat correspond to the global maximum of the likelihood.â€ť So when evaluating q-tilde for a toy, that means the global likelihood maximum for that toy. Maybe that was not explicitly clear in the text and the reason for why you always thought you need to â€ścompare data with the toysâ€ť.

Also, when generating background only toys, you have stated that the value of mu doesnâ€™t change, so the value of the enumerator of the test statistic will be the same. In the denominator, the maximum likelihood value will be equal to zero because we are referring to the background hypothesis, or it should be equal to the maximum likelihood mu obtained from comparing the data with the background? (Something like a background strength parameter?).

I think this is again the same confusion, and I hope things are clearer now!

Cheers,
Jonas

1 Like

Hi @jonas ,

Thank you very much for the clarification! Iâ€™ve modified my code a little and I think itâ€™s working properly now! Thanks a lot! Iâ€™ve also changed my likelihood minimizer from Minuit to Minuit2, which seems to be faster and more efficient!

I have a final question regarding limits. What we have discussed so far refers to the case of the observed 95% CL limits.

Letâ€™s suppose one wanted to obtain the expected 95% CL limits. Would the procedure be the same, with the difference that they have to generate background only toys, where the background only toys will be scaled by a given test mu, and then fitted? Is this test mu the one that will determine the expected limit?

Also, is the expected limit a single value that applies for all the different signals (different mass candidates), or should it be evaluated for every single one of them?