Dear all,
I am trying to solve the following problem with RooFit/RooStats. I have a counting experiment made of independent runs. In each run there is a "beam-ON" time, with a measured number of events n_ON^i, and a "beam-OFF" time, with a measured number of events n_OFF^i. The overall likelihood reads:

L(S, B_i) = prod_i Pois(n_ON^i | f_i*S + B_i) * Pois(n_OFF^i | tau_i*B_i)
where f_i is the luminosity of run i over the total luminosity, and tau_i is the beam-OFF/beam-ON time ratio for run i. I want to set a limit on S; the B_i are the nuisance parameters, and n_ON^i and n_OFF^i are the observables.

I tried to implement this model in RooFit. The code I am using is attached (the first part reads the data from a txt file). In particular, the overall PDF is a RooProdPdf of many terms, one per run, each term being the product of two Poisson PDFs.
I noticed that, when using this code, if the number of runs I am considering is more than 15, the code I use to find the upper limit (HypoTestInverter) crashes. Is there perhaps a limit on the number of terms I can use in this product?

I attach to this message the 3 files necessary to reproduce the problem: the root macro to create the model, the root macro for Hypothesis inversion, and the file with data.

Dear @jalopezg,
thanks! Maybe the issue here is the following: I am considering a "single" experiment made of many runs, i.e. my dataset, viewed as a table where each row is one "event" and each column is one observable, is at the moment a single-row, multi-column dataset (the number of columns is 2*N, where N is the number of runs).
Maybe I should change approach and treat the independent runs as independent rows in the dataset? The full PDF would still be the product over all terms.

To be honest, I have almost no experience with RooFit and I cannot personally help you with this, but I am sure @moneta can point you in the right direction.

Sorry for my late reply. I have downloaded the code and it seems to run fine. How can I reproduce your problem?
I am using ROOT master; which ROOT version are you using?

Hi Lorenzo,
I am using ROOT version 6.18/04 (apologies for not having reported this before). To reproduce the issue, change the "14" in the data.txt file to "20" or more. This is the number of runs considered, i.e. the number of lines read from the data.txt file.

Actually, I cannot reproduce the crash with ROOT >= 6.22 and 6.24 on Linux, but I can on macOS.
I suspect there is an issue with the sorting of a large vector. Which operating system are you using?

Hi @moneta,
I confirm this fix is working properly on macOS.

I still have an issue with this code: if I increase the number of "runs" that I am considering (the first number in the data.txt file) too much, I start to get nonsensical results.
I am currently comparing the result of the likelihood I wrote before with that of a simpler likelihood, obtained by summing all the "beam-on" periods and, separately, all the "beam-off" periods; this is the standard Non/Noff problem. For the simpler likelihood I am using the AsymptoticCalculator, while for the full model I am using both the FrequentistCalculator and the AsymptoticCalculator.

If nRuns <= 4, the results I obtain for the full model are reasonable, with both calculators.

If nRuns == 5, instead, I start to see some problems. For example, the MINUIT output reports errors (see below). Does this suggest that there may be some issue with my model?

MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
MINUIT WARNING IN HESSE
============== Second derivative zero for parameter1
MNHESS FAILS AND WILL RETURN DIAGONAL MATRIX.
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
MINUIT WARNING IN HESSE
============== Second derivative zero for parameter1
MNHESS FAILS AND WILL RETURN DIAGONAL MATRIX.

For completeness, I attach again my code to read the data and produce the model. By the way, should the "nOFF" data be declared as "global observables"? My guess is that this is not appropriate if I want the FrequentistCalculator to also throw toys for them. I also attach the full error log for the case nRuns = 5.

Dear all,
I think I found the cause of this problem. It was related to the initialization of the B_i nuisance parameters. I write down the solution in case it is useful for other people.

The likelihood for this problem reads:

L(S, B_i) = prod_i Pois(n_ON^i | f_i*S + B_i) * Pois(n_OFF^i | tau_i*B_i)
where n_ON^i and n_OFF^i are the measured values of the ON and OFF counts, f_i is the luminosity fraction, S is the signal (POI) and B_i the background. tau_i is the ratio T_OFF/T_ON.

The B_i are the nuisance parameters, while tau_i is known without any uncertainty.

To find the values of B_i that maximize the likelihood, one can take the logarithm of the latter and compute the derivatives. The result is the set of equations:

n_ON^i/(f_i*S + B_i) - 1 + n_OFF^i/B_i - tau_i = 0
In the limit of runs all having the same luminosity fraction, i.e. f_i = 1/N_runs, the solution is given by:

B_i = n_OFF^i / tau_i

The equation for S is more complicated:

sum_i f_i * [ n_ON^i/(f_i*S + B_i) - 1 ] = 0
In my code, I was performing a wrong initialization of B_i: instead of using n_OFF^i/tau_i, I was writing n_OFF^i * tau_i. This created issues in the search for the minimum.