Q: Exclusion limit search with non-trivial pdfs

Dear RooStats experts/enthusiasts,
I have a statistics problem to solve.
As it is rather standard, you can probably point me to an already available,
simple and effective solution.

I am studying an unbinned 1-D distribution, and I have background and signal
pdfs for this study. I'd like to set a 95% C.L. exclusion limit on the signal
presence in my original distribution.
The brute-force solution is to get the minNll value from an s+b fit to
the original data sample. Then fix some signal contribution, run toy experiments,
fit the s+b model to each of them, and get the fraction of results with minNll
less than the original minNll.
Iterate this procedure to find the signal contribution for which the
obtained fraction of minNll values below the original minNll is 5%.
That signal contribution is the desired 95% C.L. exclusion limit
for my data and model.
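
In code, the toy loop I have in mind would look roughly like the sketch below
(illustrative only: the helper name toyFraction, the tested yield sTest, the
best-fit background yield bHat and the toy count nToys are my own names, and
model, x, fsignal, fbkg are the objects from the model quoted below):

#include "RooAbsPdf.h"
#include "RooRealVar.h"
#include "RooArgSet.h"
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooGlobalFunc.h"

// Fraction of toys whose fitted minNll falls below the data minNll,
// for one tested signal yield sTest.
double toyFraction(RooAbsPdf& model, RooRealVar& x, RooRealVar& fsignal,
                   RooRealVar& fbkg, double sTest, double bHat,
                   double nllData, int nToys)
{
  int nBelow = 0;
  for (int i = 0; i < nToys; ++i) {
    fsignal.setVal(sTest);  // inject the tested signal yield
    fbkg.setVal(bHat);      // reset background yield to its data best fit
    // Poisson-fluctuated toy around the expected total yield:
    RooDataSet* toy = model.generate(RooArgSet(x), RooFit::Extended());
    // Refit the s+b model with both yields floating:
    RooFitResult* r = model.fitTo(*toy, RooFit::Extended(), RooFit::Save(),
                                  RooFit::PrintLevel(-1));
    if (r->minNll() < nllData) ++nBelow;
    delete r;
    delete toy;
  }
  return double(nBelow) / nToys;  // tune sTest until this comes out at 5%
}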

I have two technical problems with this procedure:

  • my model is defined as
    RooAddPdf model("model", "Lumi + Flat Bkg", RooArgList(sigModel, flatBkg), RooArgList(fsignal, fbkg));
    fsignal and fbkg then represent the numbers of signal and background events
    in the distribution. However, it looks like the fit also constrains the
    total contribution fsignal+fbkg to the total number of events in the
    original distribution. How can I set up the model and/or fit to take into
    account the data points only, without any explicit constraint on the
    normalization?
  • my pdfs have very fine structure, so their "generate" functions are rather
    time-consuming. Is there any trick, based on the "Science of Statistics",
    that would let me replace some of the MC iterations above with analytical
    calculations and speed up the entire procedure?

Many thanks!
-Fedor

Hi Fedor,

Here are some answers:

  1. If you only want to model the fraction, you should write your
    model as follows

RooAddPdf model("model", "Lumi + Flat Bkg", RooArgList(sigModel, flatBkg), fsignal);

i.e. omit fbkg, which will be implicitly defined as 1-fsignal (a short
sketch follows after point 2).

  2. RooFit pdfs implement a variety of techniques to speed up generation,
    but it depends on the pdf. If you provide some more detail on which
    pdfs you use, I can perhaps be more specific.
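
For concreteness, the fraction-based setup could look like this (illustrative
only; x is your observable and data your unbinned dataset):

RooRealVar fsignal("fsignal", "signal fraction", 0.1, 0., 1.);
RooAddPdf model("model", "Lumi + Flat Bkg",
                RooArgList(sigModel, flatBkg), fsignal);
// fbkg is implicitly 1 - fsignal. With a single coefficient the pdf is
// not extendable, so a plain fit uses only the shape information and
// puts no Poisson term on the total number of events:
model.fitTo(data);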

Wouter

Hi Wouter,
(1) I do want to model the number of events rather than the fraction, but I
want my fit to minimize -ln(L) by varying both fsignal and fbkg independently,
rather than with the constraint fsignal+fbkg=Ntotal. Can I set up the model/fit
to do this?
(2) On the pdf side I can hardly optimize my "generate" function any further.
My question (and hope) was whether there are smart techniques, for example
extracting the analytical dependence of the NLL distribution tail on parameter
variations, that would let me minimize the number of required toy experiments…

Thanks!
-Fedor

Hi,

I’m still not sure I understand what you’d like to do.

The model does not impose a hard constraint that fsig+fbkg=Ntot;
both parameters can vary.

However, in an extended ML fit a term -log(Poisson(Ntot, fsig+fbkg)) is added
to the likelihood, which effectively makes it come out that way.
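
Explicitly, writing S(x) and B(x) for the normalized signal and background
pdfs and x_i for the data points, the extended NLL reads (up to a constant):

-\ln L = (f_{\rm sig}+f_{\rm bkg}) - N_{\rm tot}\ln(f_{\rm sig}+f_{\rm bkg})
         - \sum_{i=1}^{N_{\rm tot}} \ln\frac{f_{\rm sig}\,S(x_i)+f_{\rm bkg}\,B(x_i)}{f_{\rm sig}+f_{\rm bkg}}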

Wouter

OK, I think I understand what is going on now.
In an unbinned fit to an extended pdf, the shape fit and
the normalization fit are effectively independent,
so the latter always returns the total number of events.
A different normalization can appear only if the
normalization parameter (fsignal+fbkg) is correlated with the pdf shape
parameters; only then could the total normalization be pulled away from
the observed number of events by the shape fit. I guess this is a rather
exotic use case, so for most cases the fit result should indeed be expected
to normalize to the total number of events.
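
A quick way to see it, writing N = fsignal+fbkg: when the shape term does not
depend on N, only the Poisson part of the extended NLL varies with N, and

\frac{d}{dN}\,(N - N_{\rm tot}\ln N) = 1 - \frac{N_{\rm tot}}{N} = 0
\quad\Longrightarrow\quad \hat N = N_{\rm tot},

i.e. the fitted total normalization equals the observed number of events.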
-Fedor