Likelihood scan with TFeldmanCousins

Dear experts,

I’ve been using the TFeldmanCousins class to calculate confidence intervals for the number of signal events. This appeared to be simple, and the following lines handle the task:

import ROOT

f = ROOT.TFeldmanCousins()  # default 90% CL
ul = f.CalculateUpperLimit(int_val_scaled_all, int_val_scaled_bkg)  # (observed, expected background)
ll = f.GetLowerLimit()

However, now I’m trying to extract a likelihood curve from it so I can convolve it with a Gaussian to account for the systematic uncertainties, and I cannot find anything useful among the methods implemented in v6.26:
https://root.cern.ch/doc/v626/classTFeldmanCousins.html

Could you please advise how this can be done? How can I extract the likelihood curve from TFeldmanCousins?

I’ve found this discussion, where a similar question is addressed, but I still cannot work out how to implement it technically.

D.

Dear @mc.dmytro ,

Let me ask @moneta to help you on this one.
Cheers,
Vincenzo

Hi,

You can use RooStats: see how it can be done in the post linked above and in the tutorials, for example this notebook to create the model and then this other one to run the interval calculation. For Feldman-Cousins you should use a FrequentistCalculator with a 2-sided profile likelihood ratio test statistic and pass false to the useCLs flag.

Otherwise, the likelihood in TFeldmanCousins is very simple: it is just the Poisson pdf, see here.
You can implement it yourself with a few lines of C++ or Python code.
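
For example, a minimal sketch (my own illustration of the Poisson likelihood, not code taken from TFeldmanCousins itself; the scan range and example values are arbitrary):

#include <cstdio>
#include "TMath.h"

// Likelihood of the signal strength s for n observed events and expected background b:
// just the Poisson probability P(n | s + b).
double likelihood(double s, int n, double b)
{
   return TMath::Poisson(n, s + b);   // exp(-(s+b)) * (s+b)^n / n!
}

// Scan the likelihood in s, e.g. to convolve it afterwards with a Gaussian for systematics.
void scanLikelihood(int n = 0, double b = 0.602, double smax = 5.0, int npoints = 100)
{
   for (int i = 0; i <= npoints; ++i) {
      double s = smax * i / npoints;
      std::printf("s = %5.2f  L = %g\n", s, likelihood(s, n, b));
   }
}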

Lorenzo


Hi!

Thank you for your reply. I’ve tried to apply the approach you suggested.

The result I’m getting with RooStats and StandardHypoTestInvDemo doesn’t match the TFeldmanCousins result. Namely, I’m getting the following output from TFeldmanCousins:

→ Confidence limits (at the 90% CL):
Upper Limit = 1.76
Lower Limit = 0

But the result I get by calling the macro as follows:
StandardHypoTestInvDemo("CountingModel.root","w","ModelConfig","","dtst",0,2,false,10,0,800,1000);
is:
The computed upper limit is: 0.976223 +/- 0.0476669

I’ve set the parameters you recommended, but the output is still very different. Moreover, it also appears to be very unstable (see the CL scan attached below).
CLscan.pdf (15.3 KB)

Could you please advise me on what might be the reason for such a significant discrepancy?

Thank you!

D.

Hi again!

I would like to continue this thread since it is still relevant for me.

To give more context, I am trying to set an upper limit (UL) on the number of signal events in an invariant-mass (IM) distribution. Integrating a direct fit gives the following numbers of events under the fit in a 3 sigma window:

 nbkg = 0.602+-0.05 
 nsig = 0.000+-0.40

I then need to add systematic uncertainties on top (that’s why I cannot use the TFeldmanCousins class directly). For that, I’ve been trying to use the StandardHypoTestInvDemo macro, as advised. However, the result I am getting with the suggested “as FC” parameters for StandardHypoTestInvDemo is different, and in addition it appears to be very unstable: if I change the parameter ranges, the result fluctuates significantly.

I might be mistaken in the way I implement the model. Could you please take a look at the attached macro (sketched below) and advise me on what I might be doing wrong?
CountingModel.C (2.4 KB)
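
For reference, the model is built roughly as follows (a simplified sketch in the spirit of the standard RooStats counting-model tutorial; the attached CountingModel.C is the authoritative version, and the variable names and values here are only illustrative):

#include "RooWorkspace.h"
#include "RooArgSet.h"
#include "RooRealVar.h"
#include "RooDataSet.h"
#include "RooStats/ModelConfig.h"
#include "TFile.h"

void CountingModelSketch(double nobsVal = 0.602, double bVal = 0.602, double sigmabVal = 0.088)
{
   RooWorkspace w("w");

   // Poisson counting model with a Gaussian constraint on the background
   w.factory("sum::nexp(s[3,0,15], b[0.6,0,10])");
   w.factory("Poisson::pdf(nobs[0,50], nexp)");
   w.factory("Gaussian::constraint(b0[0.6,0,10], b, sigmab[0.1])");
   w.factory("PROD::model(pdf, constraint)");

   w.var("b")->setVal(bVal);
   w.var("b0")->setVal(bVal);
   w.var("b0")->setConstant(true);      // global observable of the constraint
   w.var("sigmab")->setVal(sigmabVal);  // absolute uncertainty on b

   // observed data (this is where the fit integral enters, nobs = 0.602)
   w.var("nobs")->setVal(nobsVal);
   RooDataSet data("dtst", "dtst", RooArgSet(*w.var("nobs")));
   data.add(RooArgSet(*w.var("nobs")));
   w.import(data);

   // ModelConfig for the s+b hypothesis
   RooStats::ModelConfig mc("ModelConfig", &w);
   mc.SetPdf(*w.pdf("model"));
   mc.SetParametersOfInterest(RooArgSet(*w.var("s")));
   mc.SetObservables(RooArgSet(*w.var("nobs")));
   mc.SetNuisanceParameters(RooArgSet(*w.var("b")));
   mc.SetGlobalObservables(RooArgSet(*w.var("b0")));
   mc.SetSnapshot(RooArgSet(*w.var("s")));
   w.import(mc);

   TFile out("CountingModel.root", "RECREATE");
   w.Write();
}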

Thank you beforehand.

Dmytro.

Hi,

What is the model you are using to get the result UL = 0.976223 +/- 0.0476669?
It is possible that by adding a small systematic uncertainty you break the discreteness of the Poisson likelihood and obtain a better UL (0.97 vs 1.78).

I will look at your problem tomorrow, but in the macro, why do you have nobs = 0.6? nobs should be an integer number.

Lorenzo

Hi!

Thanks for following this thread.

I am using the model described in the attached file, with the FC-like parameters. But the result is very unstable.

I also estimate nobs as the integral of the fit curve in a particular interval; that’s how I get nobs = 0.602. It works fine for TFeldmanCousins even though it is odd for a Poisson.

Looking forward to hearing from you.
Dmytro.

Hi,
TFeldmanCousins accepts non-integer values, but it basically rounds the value down to the lower integer, i.e. it effectively does int(0.602).

If nobs is not an integer but is estimated from an integral, I think the distribution is no longer Poisson and you should use a normal distribution instead. You can still use FC in RooStats by modelling your signal as a Gaussian.
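
For example, a small sketch of what such a Gaussian model could look like (purely illustrative names and values, loosely based on the yields quoted in this thread):

#include "RooWorkspace.h"

void gaussianYieldModel()
{
   RooWorkspace w("w");
   // expected yield s + b, with the observed (non-integer) yield modelled as Gaussian
   w.factory("sum::nexp(s[0,0,15], b[0.602,0,10])");
   w.factory("Gaussian::pdf(nobs[0.602,-5,20], nexp, sigma_nobs[0.41,0.01,5])");
   w.Print();
}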

Lorenzo

Good evening!

Thank you for your reply.

I am not sure if FC treats non-integer numbers by simply rounding them. Here are some of the results I have with FCs:

//>>>>> X(4274) <<<<<

→ Integral range: [4.207, 4.381]
Sgn integral value: 0.00 ± 0.41
Bkg integral value: 0.02 ± 0.00
All integral value: 0.02 ± 0.00

→ Confidence limits (at the 90% CL):
Upper Limit = 2.42
Lower Limit = 0.09

//>>>>> X(4685) <<<<<

→ Integral range: [4.477, 4.891]
Sgn integral value: 0.00 ± 0.42
Bkg integral value: 0.60 ± 0.05
All integral value: 0.60 ± 0.06

→ Confidence limits (at the 90% CL):
Upper Limit = 1.84
Lower Limit = 0.00

As you can see, int(Nbkg) = int(Nobs) = 0 for both cases, yet the confidence intervals are different.

After I corrected nobs and b in the counting model, the result is still unstable and differs from FCs by almost a factor of 2:

The computed upper limit is: 2.93056 +/- 0.118934
(for X(4685)).

Thank you.

D.

Hi,
The integer rounding happens only for the number of observed events, not for the expected number from background. You can try:
Nobs = 0, Nbkg = 0.02 -> 90% upper limit = 2.42
Nobs = 0.99, Nbkg = 0.02 -> same upper limit = 2.42
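
A quick way to verify this directly (a sketch; the numbers in the comments are the values quoted above):

#include <iostream>
#include "TFeldmanCousins.h"

void checkRounding()
{
   TFeldmanCousins fc(0.90);   // 90% CL
   std::cout << fc.CalculateUpperLimit(0.0,  0.02) << std::endl;   // ~2.42
   std::cout << fc.CalculateUpperLimit(0.99, 0.02) << std::endl;   // same 2.42: Nobserved is rounded down to 0
}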

> After I corrected nobs and b in the counting model, the result is still unstable …

What is the correction you have applied ?

Lorenzo

Hi!

Thank you for the clarification, Lorenzo. With the following setup:

nobs = 0
b = 0.602
sigmab = 0.00001 // i.e. set systematics to ~0, just like FCs

I am getting:

The computed upper limit is: 1.75223 +/- 0.174647
Expected upper limits, using the B (alternate) model :
expected limit (median) 1.84414
expected limit (-1 sig) 1.62917
expected limit (+1 sig) 3.75
expected limit (-2 sig) 0.745919
expected limit (+2 sig) 15

Here the computed upper limit is still not identical to FCs, but the expected limit (median) actually is. So which one should I take as a reference?

However, there is still another problem. If I change sigmab to a realistic value (0.088), the algorithm breaks down:

The computed upper limit is: 1.15371 +/- 0.0726914
Expected upper limits, using the B (alternate) model :
expected limit (median) Info in ROOT::Math::BrentMethods::MinimStep: Grid search failed to find a root in the interval
Info in ROOT::Math::BrentMethods::MinimStep: xmin = 0 xmax = 5 npts = 100
Error in ROOT::Math::BrentRootFinder: Interval does not contain a root
HypoTestInverterResult - interpolation failed for interval [0,5 ] g(xmin,xmax) =0.002,0 target=0.1 return inf
One may try to clean up invalid points using HypoTestInverterResult::ExclusionCleanup().
Info in ROOT::Math::BrentMethods::MinimStep: Grid search failed to find a root in the interval
Info in ROOT::Math::BrentMethods::MinimStep: xmin = 0 xmax = 5 npts = 100

scan.pdf (17.2 KB)

Could you please advise me on what can be the reason?

It is also curious that if (just for comparison) I change the test statistics type to 0, then everything becomes much smoother and more stable, and these two values appear to be identical. How crucial is the test statistics type? Can it also be used for UL studies?
scan.pdf (17.3 KB)

D.

Hi,

I have tried your example using the attached code to generate the model, and indeed with your values (nobs = 0, b = 0.60, sigmab = 0.088) I get something of the order of 1.2 for the 95% upper limit.
The value is a bit unstable, and this is due to the discreteness of the Poisson combined with the systematics, which can cause large fluctuations in the toys. Try running more toys and in a restricted interval; there is no need to scan s values larger than 3.

For compatibility with TFeldmanCousins you should use a model without systematics in RooStats. You can do this simply by adding this line,

optHTInv.noSystematics = true

before calling the StandardHypoTestInvDemo macro.
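
For example, at the ROOT prompt (reusing the same arguments as in the call shown earlier in this thread):

root [0] .L StandardHypoTestInvDemo.C
root [1] optHTInv.noSystematics = true;
root [2] StandardHypoTestInvDemo("CountingModel.root","w","ModelConfig","","dtst",0,2,false,10,0,800,1000);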

Test statistic type 0 is a different kind of test statistic, where only likelihood values are used instead of fitting to find the best one. It can be a sensible procedure, but it is not exactly the one proposed by Feldman-Cousins.

Lorenzo

Hello Lorenzo

Thank you very much for following this thread.

I’ve tried the flag that disables systematics, and it works perfectly. I get results that are 100% consistent with FCs:

The computed upper limit is: 1.84414 +/- 0
Expected upper limits, using the B (alternate) model :
expected limit (median) 1.84414
expected limit (-1 sig) 1.84414
expected limit (+1 sig) 3.75
expected limit (-2 sig) 1.84414
expected limit (+2 sig) 15

But I still don’t like what I get when I enable systematics. For exactly the same setup as you tried on your side, but with 10x more toys and a narrower range, I keep getting the same:

The computed upper limit is: 1.15321 +/- 0.0247277
Expected upper limits, using the B (alternate) model :
expected limit (median) 1.82587
expected limit (-1 sig) 1.14564
expected limit (+1 sig) 15
expected limit (-2 sig) 0.7536
expected limit (+2 sig) 15

This result is significantly lower than the one with systematics disabled (using the primary model), which should not happen. It seems that with systematics enabled, the algorithm goes nuts (look at the plot: the black scatter should not behave like this, and the overall fluctuation is huge even for 10k toys):
scan.pdf (16.2 KB)

It would also help me a lot if you could advise me on how to access the UL result in code. I need to perform a scan over different cases and store the UL values in an array. How can I access it?

D.

Hi,

I think the results are correct; however, I will check more deeply to be sure. The improvement is, I think, caused by the systematics breaking the discreteness of the test statistic. However, I don’t understand why the deviations are so large. For small systematics, I would advise perhaps dropping and neglecting them, since the limit without them is more conservative.
The macro writes a HypoTestInverterResult object to a file, and from it you can get the upper limit. Example (after running the scan):

TFile f("Freq_Cls+b_grid_ts2_CountingModel.root");
RooStats::HypoTestInverterResult *r = nullptr;
f.GetObject("result_s",r);
cout << r->UpperLimit() << endl;
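
If you need this for several cases, you could for example loop over the result files and collect the limits in a vector (a sketch, assuming the files follow the same naming pattern and object name as above):

#include <iostream>
#include <string>
#include <vector>
#include "TFile.h"
#include "RooStats/HypoTestInverterResult.h"

std::vector<double> collectUpperLimits(const std::vector<std::string> &files)
{
   std::vector<double> limits;
   for (const auto &name : files) {
      TFile f(name.c_str());
      RooStats::HypoTestInverterResult *r = nullptr;
      f.GetObject("result_s", r);                 // same object name as above
      if (r) limits.push_back(r->UpperLimit());
      else   std::cerr << "No result_s found in " << name << std::endl;
   }
   return limits;
}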

Lorenzo

Hi!

Thanks to your help and clarifications, I now seem to know the minimum needed to perform this study.

The only thing that doesn’t let me finalize it is the behavior when systematics are enabled. I have a 10% uncertainty, which cannot be disregarded, and if I present results where the UL with systematics is a factor of 2 lower than without, the reviewers are going to eat me :joy:. So I really need a way to fix it.

Thank you for looking into it and looking forward to hearing from you.

D.

Hi,

I have understood the issue: as I said, it is a technical artefact due to the introduction of systematics.
To compute the p-value (CL(s+b)) used to get the limit (the red points in your scan plot), you count the fraction of toys in which the test statistic for the given s is larger than or equal to the value in the data. When you have no systematics and you observe 0 events, a large fraction of the toys will also have zero events; in that case the test statistic in the data is exactly the same as the test statistic in the toys generated with N = 0. If you add a very small systematic, around 50% of the toys with N = 0 will have a test statistic smaller than the data value, which decreases the obtained p-value.
I am not sure which is the correct value to quote, since this is just a procedure to compute a limit.
We could maybe add an option to count all the toys within an interval around the test-statistic value observed in the data, to avoid this effect; in that case we would get basically the same value as without systematics.
However, if the systematic is small (10% of 0.6 is small), I think it is better to quote the value without systematics.
Note that Feldman-Cousins also has the odd property of giving a better (lower) upper limit when you increase the background.
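
To make the counting explicit, the p-value described above amounts to something like this (a sketch for illustration, not the actual RooStats implementation):

#include <vector>

// CL(s+b) p-value: fraction of toys whose test statistic is larger than
// or equal to the value observed in the data.
double pValueSPlusB(const std::vector<double> &tsToys, double tsData)
{
   int nAbove = 0;
   for (double t : tsToys)
      if (t >= tsData) ++nAbove;
   return static_cast<double>(nAbove) / tsToys.size();
}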

Lorenzo

To explain the above better, I attach the test-statistic distributions used to compute the p-value at s = 2 for the model with Nobs = 0 and Nexp = 0.602. The first plot is with systematics, the second without.
The difference comes from including (or not including) all the red and black points close to the black line (the test-statistic value in the data).
ts_dist_s2.pdf (17.4 KB)

ts_dist_s2_nosys.pdf (14.2 KB)


Hi!

Thank you very much for your explanation and examples. I took the time to gather all the results I have, and it is now clear what I have and why the results are the way they are.

Many thanks for helping!

Best regards
Dmytro.