I’ve been using TFeldmanCousin class to calculate trusted intervals for the number of signal events. This appeared to be simple, and the following lines handle this task:

f = ROOT.TFeldmanCousins()
ul = f.CalculateUpperLimit(int_val_scaled_all, int_val_scaled_bkg)
ll = f.GetLowerLimit()

Although, now I’m trying to extract a likelihood curve from it to convolute it with a Gaussian, which will account the systematic uncertainties. And I cannot find anything useful within the implemented methods for v6.26: https://root.cern.ch/doc/v626/classTFeldmanCousins.html

Could you please advise how it can be done? How may I extract the likelihood curve from a TFeldmanCousin?

I’ve found this discussion, where a similar question is reviewed. But I still cannot come up with the method to implement it technically.

You can use either RooStats and see how it can be done using for example the post linked above and some tutorials, for example this notebook to create the model and then this other one to run the interval calculation. For Feldman-Cousins you should use a FrequentistCalculator and a 2-sided profile likelihood ratio test statistics and pass false to the useCLs flag.

Otherwise the likelihood in TFeldmanCousins it is very simple, it is simple the Poisson pdf, see here.
You can implement it your self with simple C++ or Python code

Thank you for your reply. I’ve tried to apply the approach you suggested.

The result I’m getting with RooStats and StandardHypoTestInvDemo doesn’t match TFeldmanCousin method result. Namely, I’m getting the following output for TFeldmanCousin:

But the result I could get by defining the method as following: StandardHypoTestInvDemo("CountingModel.root","w","ModelConfig","","dtst",0,2,false,10,0,800,1000);
is: The computed upper limit is: 0.976223 +/- 0.0476669

I’ve assigned the parameters you recommended, but the output is still very different. To say more, it also appears to be very unstable (see the CL scan attached below). CLscan.pdf (15.3 KB)

Could you please advise me on what might be the reason for such a significant discrepancy?

I would like to continue this thread since it is still relevant for me.

To give more context, I am trying to put UL on the number of signal events in an IM distribution. The integral of a direct fit gives the following numbers of events under the fit in 3 sigma window:

nbkg = 0.602+-0.05
nsig = 0.000+-0.40

I then need to put systematic uncertainties on top (that’s why I cannot use TFeldmanCousins class directly). For that, I’ve been trying to use StandardHypoTestInvDemo macro, as advised. Although, the result I am getting with the suggested “as FCs” parameters for StandardHypoTestInvDemo give different result, which in addition appears to be very unstable. If I change the ranges for parameters, the result fluctuates significantly.

I might be mistaken in the way I implement the model. Could you please take a look into the following lines to give me advice on what I might be doing wrong? CountingModel.C (2.4 KB)

What is the model you are using to get the result for UL=0.976223 +/- 0.0476669 ?
It is possible that if you add a small systematics you break the discreteness of the Poisson likelihood and you get a better UL, (0.97 vs 1.78).

I will look tomorrow at your problem, but in the macro why you have nobs = 0.6 ? nobs should be an inter number

I am using the model, which is described in the attached file, with the FC-like parameters. But the result is very unstable.

I also estimate nobs as an integral of a fit curve in a particular interval. That’s how I get nobs=0.602. It works fine for TFeldmanCousinns even though it is weird for Poisson.

Hi,
TFeldmanCousins works for non integer, but basically it rounds the value to the lower integer, like doing int(0.602).

If you have nobs not an integer, but estimating from an integral, I think the distribution is not Poisson anymore but you should use the normal distribution. You can still use FC in RooStats, by modelling your signal as a Gaussian.

Hi,
The integer rounding is happening only for the number of observed events not for the expected one from background. You can try:
Nobs = 0., Nbkg = 0.02 - > 90% upper limit = 2.42
Nobs=0.99 Nbkg = 0.02 same upper limit = 2.42

Blockquote
After I corrected nobs and b in the counting model, the result is still unstable …

Thank you for clarification, Lorenzo. With the following setup:

nobs = 0
b = 0.602
sigmab = 0.00001 // t set systematics to 0, just like FCs

I am getting:

The computed upper limit is: 1.75223 +/- 0.174647
Expected upper limits, using the B (alternate) model :
expected limit (median) 1.84414
expected limit (-1 sig) 1.62917
expected limit (+1 sig) 3.75
expected limit (-2 sig) 0.745919
expected limit (+2 sig) 15

Here the computed upper is still not identical to FCs, but the expected limit (median) actually is. So which one should I take as a reference?

Although, there is still another problem. If I change sigmab into a reasonable value (0.088), the algorithm crashes:

The computed upper limit is: 1.15371 +/- 0.0726914
Expected upper limits, using the B (alternate) model :
expected limit (median) Info in ROOT::Math::BrentMethods::MinimStep: Grid search failed to find a root in the interval
Info in ROOT::Math::BrentMethods::MinimStep: xmin = 0 xmax = 5 npts = 100
Error in ROOT::Math::BrentRootFinder: Interval does not contain a root
HypoTestInverterResult - interpolation failed for interval [0,5 ] g(xmin,xmax) =0.002,0 target=0.1 return inf
One may try to clean up invalid points using HypoTestInverterResult::ExclusionCleanup().
Info in ROOT::Math::BrentMethods::MinimStep: Grid search failed to find a root in the interval
Info in ROOT::Math::BrentMethods::MinimStep: xmin = 0 xmax = 5 npts = 100
…

Could you please advise me on what can be the reason?

It is also curious that if (just for comparison) I change test statistics type to 0, then everything becomes much smoother and more stable. And these to values appear two be identical. How crucial is the test statistics type? May it also be used for UL studies? scan.pdf (17.3 KB)

I have tried your example using this attached code to generate the model, and it is true I am getting with your values (nobs=0, b = 0.60, sigmab = 0.088) something of the order of 1.2 for the 95% upper limit.
The value is a bit unstable and it is due to the discreetness of the Poisson combined by the systematics, which can cause high fluctuations in the toys. Try maybe to run more toys and in a restricted intervals, no need to scan for larger s value than 3.

For the compatibility with TFeldmanCousins you should use in RooStats a model without systematics. You can simply do this by adding this line ,

optHTInv.noSystematics = true

before calling the StandardHypoTestInvDemo macro.

Using the test statistics=0, it is a different type, where only likelihood values are used instead of fitting and find the best one. It can be a sensible procedure, but it is not exactly the one proposed by Feldman-Cousins.

I’ve tried using a flag that disables systematics, and it works perfectly. I get results, which are 100% consistent with FCs:

The computed upper limit is: 1.84414 +/- 0
Expected upper limits, using the B (alternate) model :
expected limit (median) 1.84414
expected limit (-1 sig) 1.84414
expected limit (+1 sig) 3.75
expected limit (-2 sig) 1.84414
expected limit (+2 sig) 15

But I still don’t like what I get when I enable systematics. For exactly the same setup, like you tried on your side, but for 10x more toys and in a narrower range, I keep getting the same:

The computed upper limit is: 1.15321 +/- 0.0247277
Expected upper limits, using the B (alternate) model :
expected limit (median) 1.82587
expected limit (-1 sig) 1.14564
expected limit (+1 sig) 15
expected limit (-2 sig) 0.7536
expected limit (+2 sig) 15

This result is significantly less than the one with systematics disabled (using primary model), which should not happen. It seems like with systematics enabled, the algorithm goes nuts (look at the plot: black scatter should not behave like this, and general fluctuation is crazy even for 10k toys): scan.pdf (16.2 KB)

It also could help me a lot if you could advise me how I can access UL result in the code. I need to perform a scan for different cases, and store UL values in an array. How can I access it?

The result I think are correct. However I will check it deeper to be sure of this. The improvement I think is caused by breaking the discreteness of the test statistics by adding systematics. However, I don’t understand why such large deviations. For small systematics, I would advise maybe to drop them and neglect, since the limit without is more conservative.
The macro writes an HypoTestInverterResult objection a file, and from it you can get the upper limit. Example (after running the scan):

Thanks to your help and clarification, I now seem to know the minimum to perform this study.

The only thing that doesn’t allow me to converge it, is the behavior, when systematics are enabled. I have 10% uncertainty, which cannot be disregarded. And if I present the results, where UL with systematics is factor 2 less than w/o, the reviewers are gonna eat me . So, I vitally need a way to fix it.

Thank you for looking into it and looking forward to hearing from you.

I have understood the issue, as I said, it is a technical artefact due to the introduction of systematics.
To compute the p-value (CL(s+b) ) used to get the limit (the red points in your scan plot) you count the number of toy events the test statistics for the given s is larger or equal than the data value. Now when you don’t have systematics, if you observe 0 events, you will have a large fraction of toys also with zero events. In that case the test statistics in the data is exactly the same as the test statistics in the toys when generated with N=0. If you have a very small systematics, around 50% of the toys with N=0 will have a test statistics smaller than the data, decreasing then the obtained p-value.
I am not sure what is the correct value to quote, since this is just a procedure to compute limit.
We could maybe add an option to consider all the toys within an interval of the given test statistics value in the data, to avoid this effect. In this case we would get basically the same value without systematics.
However, if the systematic is small, (10% for 0.6 is small) , I think it is better to quote the value without systematics.
Note that Feldman-Cousins has also the weird property to have a better (lower) upper limit when you increase the background.

To explain better this above, I attach the test statistic distributions used to compute the p-value at s=2 for the model with Nobs=0 and Nexp=0.602. The first plot is with systematics, the second without.
The difference is due to include (or not include all) the red and black points closed to the black line (the test statistics value in the data). ts_dist_s2.pdf (17.4 KB)

Thank you very much for your explanation and examples. I took time to gather all the results I have, and it seems to be clear with what I have, and why the results are those they are.