RooStats NumberCounting with very low backgrounds

Dear all,

I have followed the NumberCounting tutorials, but I am not happy with the result yet. The problem is as follows:

I have an experiment with a very low number of observed events (1-5) and very low expected backgrounds (around 0.1). Both the background and the signal have (known and partially correlated) uncertainties (determined not from sidebands but from other methods). I need to know a) the p-value of the background-only hypothesis and b) a confidence interval (e.g. 90%). Is there a recommended way of doing this in RooStats?

Best regards,
Torben

Hi,

The first thing to do is to build a RooFit workspace with your number counting model. One example of doing this is at the beginning of this tutorial:
root.cern.ch/root/html/tutorials … tor.C.html
Alternatively, you can make the workspace as in the simple macro I am attaching.

Once you have the workspace with the ModelConfig you can use the StandardXXXX macro tutorials.
For example, to find the interval you can use StandardHypoTestInvDemo.C for a frequentist limit
(with CLs or Feldman-Cousins).
For the background p-value you can use the FrequentistCalculator directly, or a macro (not yet in the tutorials) which I attach, which computes it using the various possible HypoTestCalculators.

Best Regards

Lorenzo
StandardHypoTestDemo.C (11.9 KB)
makeCountingModel.C (1.16 KB)

Thank you Lorenzo!

Using these two macros, I am able to build the workspace:

[code]
root [0] .x makeCountingModel.C(40,20,3)

RooFit v3.50 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2011 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt

[#1] INFO:ObjectHandling -- RooWorkspace::exportToCint(w) INFO: references to all objects in this workspace will be created in CINT in 'namespace w'
[#1] INFO:ObjectHandling -- RooWorkspace::import(w) importing dataset data
model written to file[/code]

But the StandardXXXX tutorials don't run; I am using ROOT 5.32/00 (tags/v5-32-00@42375, Dec 02 2011, 12:42:25 on linuxx8664gcc) (DESY central installation, 64-bit).

I get this error/printout:

root [0] .x StandardHypoTestDemo.C("countingmodel_n40_b20.root")

RooFit v3.50 -- Developed by Wouter Verkerke and David Kirkby
                Copyright (C) 2000-2011 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt


RooWorkspace(w) w contents

variables
---------
(b,b_glob,nobs,s,sigmab)

p.d.f.s
-------
RooGaussian::constraint[ x=b mean=b_glob sigma=sigmab ] = 1.92875e-22
RooProdPdf::model[ pdf * constraint ] = 3.36837e-24
RooPoisson::pdf[ x=nobs mean=nexp ] = 0.017464

functions
--------
RooAddition::nexp[ s + b ] = 51

datasets
--------
RooDataSet::data(nobs)

parameter snapshots
-------------------
ModelConfig__snapshot = (s=1)

named sets
----------
ModelConfig_GlobalObservables:(b_glob)
ModelConfig_NuisParams:(b)
ModelConfig_Observables:(nobs)
ModelConfig_POI:(s)
ModelConfig__snapshot:(s)

generic objects
---------------
RooStats::ModelConfig::ModelConfig

Info in <StandardHypoTestInvDemo>: The background model  does not exist
Info in <StandardHypoTestInvDemo>: Copy it from ModelConfig ModelConfig and set POI to zero
Info in <StandardHypoTestDemo>: Model ModelConfig has no snapshot  - make one using model poi
[#1] INFO:ObjectHandling -- RooWorkspace::saveSnaphot(w) replacing previous snapshot with name ModelConfig__snapshot
Error: Can't call ProfileLikelihoodTestStat::SetOneSidedDiscovery(1) in current scope StandardHypoTestDemo.C:197:
Possible candidates are...
(in ProfileLikelihoodTestStat)
(in TestStatistic)
*** Interpreter error recovered ***

Setting the TestStatType to 0 (LEP), I get a segmentation violation:

root [0] .x StandardHypoTestDemo.C("countingmodel_n40_b20.root")

RooFit v3.50 -- Developed by Wouter Verkerke and David Kirkby
                Copyright (C) 2000-2011 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt


RooWorkspace(w) w contents

variables
---------
(b,b_glob,nobs,s,sigmab)

p.d.f.s
-------
RooGaussian::constraint[ x=b mean=b_glob sigma=sigmab ] = 1.92875e-22
RooProdPdf::model[ pdf * constraint ] = 3.36837e-24
RooPoisson::pdf[ x=nobs mean=nexp ] = 0.017464

functions
--------
RooAddition::nexp[ s + b ] = 51

datasets
--------
RooDataSet::data(nobs)

parameter snapshots
-------------------
ModelConfig__snapshot = (s=1)

named sets
----------
ModelConfig_GlobalObservables:(b_glob)
ModelConfig_NuisParams:(b)
ModelConfig_Observables:(nobs)
ModelConfig_POI:(s)
ModelConfig__snapshot:(s)

generic objects
---------------
RooStats::ModelConfig::ModelConfig

Info in <StandardHypoTestInvDemo>: The background model  does not exist
Info in <StandardHypoTestInvDemo>: Copy it from ModelConfig ModelConfig and set POI to zero
Info in <StandardHypoTestDemo>: Model ModelConfig has no snapshot  - make one using model poi
[#1] INFO:ObjectHandling -- RooWorkspace::saveSnaphot(w) replacing previous snapshot with name ModelConfig__snapshot

=== Using the following for ModelConfigB_only ===
Observables:             RooArgSet:: = (nobs)
Parameters of Interest:  RooArgSet:: = (s)
Nuisance Parameters:     RooArgSet:: = (b)
Global Observables:      RooArgSet:: = (b_glob)
PDF:                     RooProdPdf::model[ pdf * constraint ] = 3.36837e-24
Snapshot:
  1) 0x14a4a0c0 RooRealVar:: s = 0  L(0 - 100)  "s"


=== Using the following for ModelConfig ===
Observables:             RooArgSet:: = (nobs)
Parameters of Interest:  RooArgSet:: = (s)
Nuisance Parameters:     RooArgSet:: = (b)
Global Observables:      RooArgSet:: = (b_glob)
PDF:                     RooProdPdf::model[ pdf * constraint ] = 4.14674e-24
Snapshot:
  1) 0x14a4a090 RooRealVar:: s = 1  L(0 - 100)  "s"

[#0] PROGRESS:Generation -- Test Statistic on data: -0.207895
[#1] INFO:InputArguments -- Using a ToyMCSampler. Now configuring for Null.
[#0] ERROR:InputArguments -- ToyMCSampler: Error : pdf is not extended and number of events per toy is zero

 *** Break *** segmentation violation

Hi,

The line profll->SetOneSidedDiscovery works only in 5.33 or in the HEAD of the 5.32 patches. It is a fix in the test statistic to make it work for estimating significance in the case of bounded parameters.
You could eventually use the two-sided test statistic (although it is less optimal); you will then get a slightly lower significance (test statistic type 2).
For the other crash, you need to specify that the model is a number counting one, by setting the appropriate flag in the arguments passed to the macro.
So you could run with:

root> .x StandardHypoTestDemo.C("countingmodel_n40_b20.root", "w","ModelConfig","data",0,2,ntoys, true)

Lorenzo

Ahhh, I have the feeling I am closing in, perfect! OK, the macros are now running, thanks.

However, now I am puzzled by the results. Starting with a simple example where I know that the background-only p-value is given by 1-P(0), where P(0) is the Poisson probability of observing zero events: the uncertainty of the background is very small, so the p-value should be about 0.1… but the HypoTestCalculator returns 0.047 (making the number of toys larger does not change that much).

The macro you attached produces

[code]Results HypoTestCalculator_result:

 - Null p-value = 0.0468 +/- 0.00121942
 - Significance = 1.67671 sigma
 - Number of Alt toys: 6000
 - Number of Null toys: 30000
 - Test statistic evaluated on data: 1.40237
 - CL_b: 0.0468 +/- 0.00121942
 - CL_s+b: 0.1985 +/- 0.0051494
 - CL_s: 4.24145 +/- 0.155949[/code]

Hi,

This is a known effect due to the discreteness of the Poisson. By adding a systematic, although a very small one, you remove the discreteness and immediately get better values (tighter limits or larger significances).
You can see this in the plot, which the macro I gave you produces, of the test statistic for the background-only toys with respect to the data value.
If you change the variable noSystematics in the macro to true, you should be able to get what you expect.

Best Regards

Lorenzo

Hi,

I agree concerning the discreteness of the Poisson, but this effect is much smaller, at least for this example; or I am doing something terribly wrong that I am not aware of.

My approach:
Sample 30k events from this distribution:

TF1 *poisson = new TF1("poisson","TMath::PoissonI(x,[0])",0.,100.);

where [0]=0.10 (no systematics) or [0]=TRandom3->Gaus(0.10,0.01) (randomly sampled for each event)

The results are:
0.10 +/- 0.01 (TRandom3->Gaus)
prob for bkgd into >=1 event: 9.49667%

0.10 +/- 0.00 (no systematics)
prob for bkgd into >=1 event: 9.67667%

However, 30k events are too few to judge the difference between these two.

If I use the macro you gave me, I get for bool noSystematics = true;
Results HypoTestCalculator_result:

 - Null p-value = 0.392867 +/- 0.00281971
 - Significance = 0.271855 sigma
 - Number of Alt toys: 6000
 - Number of Null toys: 30000
 - Test statistic evaluated on data: 0.196767
 - CL_b: 0.392867 +/- 0.00281971
 - CL_s+b: 0.629833 +/- 0.00623356
 - CL_s: 1.60317 +/- 0.0195998

for bool noSystematics = false;
Results HypoTestCalculator_result:

 - Null p-value = 0.0468 +/- 0.00121942
 - Significance = 1.67671 sigma
 - Number of Alt toys: 6000
 - Number of Null toys: 30000
 - Test statistic evaluated on data: 1.40237
 - CL_b: 0.0468 +/- 0.00121942
 - CL_s+b: 0.1985 +/- 0.0051494
 - CL_s: 4.24145 +/- 0.155949

The countmodel file is here:
countingmodel_n1_b0.root (11.9 KB)

If I reduce the background even further (e.g. from 0.1 to 0.05), the HypoTestCalculator does not give a result at all!?

I am pretty sure I am making some mistake… thanks for your patience, Lorenzo!

Hi,

In the code example you provide, you use the number of observed events as the test statistic, which is still discrete,
so you still suffer from the discreteness effect. When using the profile likelihood test statistic, instead, the discreteness is removed once the systematics on the background are included. This is due to the particular way the model is built and sampled in the case of nuisance parameters: one does not first generate values of b and then values of n according to b; rather, one fixes the nuisance parameter b to its MLE estimate and then generates values of n and of the global observable b_glob, which is Gaussian distributed.

I don't understand your result with RooStats when you set noSystematics=true. I get this by running:

StandardHypoTestDemo("countingmodel_n1_b0.root", "w", "ModelConfig", "", "data", 0, 3, 10000, true)

Results HypoTestCalculator_result: 
 - Null p-value = 0.0982 +/- 0.00297585
 - Significance = 1.29188 sigma
 - Number of Alt toys: 2000
 - Number of Null toys: 10000
 - Test statistic evaluated on data: 1.40259
 - CL_b: 0.0982 +/- 0.00297585
 - CL_s+b: 0.6585 +/- 0.0106037
 - CL_s: 6.7057 +/- 0.230117

The problems you are having when reducing b are due to numerical problems in the fit. I would suggest,
when creating the workspace, restricting the range of b (for example to [0,1]) and adding these two lines in the macro that creates the workspace:

   // set input variables
   w->var("b")->setVal(b);
   w->var("b")->setError(berr);

Cheers

Lorenzo

Ciao Lorenzo,

alright, I can reproduce your values (slightly different since I cannot use option "3" here, but you explained that already). Sorry for the confusion, I don't know what went wrong. :confused:

I guess the next step for me is to understand the different test statistic types (LEP, TEV or LHC) and the StandardXXXX macros, since LEP produces p=0.05x and LHC (type 2) p=0.02x, whereas NumberCountingUtils::BinomialObsP produces p=0.05x too.

Do you have any recommendation for literature, or even advice on what the "best" approach is when dealing with very low numbers (1 or 2 signal events, much lower background)? All I can find are experiments with 5-10 signal events over usually quite high (>5) background… I fear that I am missing something here if I move to much lower signal and background rates.

Hi,

What you have been doing with RooStats, using toys to estimate an interval, or using Bayesian methods, are the recommended tools for finding an interval or estimating a significance, also in the case of very few observed events.
When dealing with nuisance parameters there is no universal method in statistics. When using the inversion method to find the interval, you have in RooStats the choice between the full frequentist approach and the hybrid one. It is up to you to choose. The results are in general in good agreement, but in your particular case you get different ones due to the discreteness of the Poisson distribution.

If you want to know more, see the paper:
R. D. Cousins et al., "Evaluation of three methods for calculating statistical significance when incorporating a systematic uncertainty into a test of the background-only hypothesis for a Poisson process"

Lorenzo