Dear RooStat experts/enthusiasts,

I have a statistics problem to solve.

As it is rather standard, you can probably point me to an already available, simple and effective solution.

I am studying an unbinned 1-D distribution. I have background and signal pdf’s for this study. I’d like to set a 95% C.L. exclusion limit on the signal presence in my original distribution.

The brute-force solution is to get the minNll value for an s+b fit to the original data sample. Then fix some signal contribution, run toy experiments, again fit the s+b model to them, and get the fraction of results with minNll less than the original minNll.

Iterate this procedure to find a signal contribution such that the obtained fraction of minNll values less than the original minNll is 5%. The corresponding signal contribution is the desired 95% C.L. exclusion limit for my data and model.
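To make the procedure above concrete, here is a minimal stand-alone sketch (plain Python rather than RooFit, with a hypothetical toy model: a narrow Gaussian signal on a flat background on [0, 1]; all names are made up). It fits the signal fraction by a brute-force grid scan and counts toys whose best-fit minNll falls below the observed one:

```python
import math
import random

def sig_pdf(x, mu=0.5, sigma=0.05):
    # Hypothetical signal shape: narrow Gaussian on [0, 1]
    # (tail leakage outside [0, 1] is negligible for this sigma).
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def nll(data, fsig):
    # Unbinned -ln L for signal fraction fsig; the background pdf is flat (= 1 on [0, 1]).
    return -sum(math.log(fsig * sig_pdf(x) + (1.0 - fsig)) for x in data)

def fit_fraction(data, steps=100):
    # Stand-in for the s+b fit: grid scan of the signal fraction.
    grid = [0.999 * i / steps for i in range(steps + 1)]
    return min((nll(data, f), f) for f in grid)   # (minNll, best-fit fraction)

def generate_toy(rng, n, fsig):
    # Toy sample with a fixed injected signal fraction.
    return [min(max(rng.gauss(0.5, 0.05), 0.0), 1.0) if rng.random() < fsig
            else rng.random() for _ in range(n)]

rng = random.Random(42)
observed = generate_toy(rng, 200, 0.0)        # pretend this is the original data
min_nll_obs, _ = fit_fraction(observed)

# One step of the iteration: for a trial signal contribution, find the
# fraction of toys whose minNll comes out below the observed minNll.
# Scanning trial_fsig until this fraction reaches 5% would give the limit.
trial_fsig, n_toys = 0.05, 30
n_below = sum(1 for _ in range(n_toys)
              if fit_fraction(generate_toy(rng, 200, trial_fsig))[0] < min_nll_obs)
p_value = n_below / n_toys
print(p_value)
```

This is only a sketch of the toy loop, not of the real analysis: in practice the fit would be done with MINUIT over all model parameters rather than a 1-D grid scan.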

I have two technical problems with this procedure:

- my model is defined as

RooAddPdf model("model", "Lumi + Flat Bkg", RooArgList(sigModel, flatBkg), RooArgList(fsignal, fbkg));

fsignal and fbkg then represent the numbers of signal and background events in the distribution. However, it looks like the fit also constrains the total contribution fsignal+fbkg to the total number of events in the original distribution. How can I set up the model and/or fit to take into account the data points only, without any explicit constraint on the normalization?
- my pdf’s have very fine structure, so their “generate” functions are rather time consuming. Is there any trick, based on the "Science of Statistics", to substitute analytical calculations for some of the MC iterations above and speed up the entire procedure?

Many thanks!

-Fedor

Hi Fedor,

Here are some answers:

- If you only want to model the fraction, you should write your model as follows:

RooAddPdf model("model", "Lumi + Flat Bkg", RooArgList(sigModel, flatBkg), fsignal);

i.e. omit fbkg; the background fraction will be implicitly defined as 1 - fsignal.

- RooFit pdfs implement a variety of techniques to speed up generation, but it depends on the pdf. If you provide some more detail on which pdfs you use, I can perhaps be more specific.

Wouter

Hi Wouter,

(1) I do want to model the number of events rather than the fraction, but I want my fit to minimize -ln(L) by varying both fsignal and fbkg independently, rather than with the constraint fsignal+fbkg = ntotal. Can I set up the model/fit to do this?

(2) On the pdf side, I can hardly optimize my “generate” function any further. My question (and hope) was whether I can use smart techniques, for example by extracting the analytical dependence of the NLL distribution tail on parameter variations, so that I could minimize the number of required toy experiments…

Thanks!

-Fedor

Hi,

I’m still not sure I understand what you’d like to do.

The model does not impose a hard constraint that fsig+fbkg = Ntot; both parameters can vary.

However, in an extended ML fit a term -log(Poisson(Ntot | fsig+fbkg)) is added to the likelihood, which effectively makes it come out that way.
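Written out explicitly (this is the standard extended-likelihood form, with ν ≡ fsig + fbkg the total expected yield and N the observed number of events):

```latex
-\ln L_{\mathrm{ext}}
  = \underbrace{\nu - N\ln\nu + \ln N!}_{-\ln \mathrm{Poisson}(N \mid \nu)}
  \;-\; \sum_{i=1}^{N} \ln\!\left[\frac{f_{\mathrm{sig}}\, p_{\mathrm{sig}}(x_i) + f_{\mathrm{bkg}}\, p_{\mathrm{bkg}}(x_i)}{\nu}\right]
```

For fixed shapes, only the Poisson piece depends on the total yield, which is why the fit pulls fsig + fbkg toward Ntot.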

Wouter

OK, I guess I understand what’s going on.

When doing an unbinned fit to an extended pdf, the shape fit and the normalization fit are actually independent, so the latter always fits the total number of events.

A different normalization can appear only if the normalization parameter (fsig+fbkg) is correlated with the pdf shape parameters; only then could the total normalization be pulled away from the distribution integral by the shape fit. I guess that is a rather exotic use case, so normalization of the fit result to the total number of events should indeed be expected in most cases.
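This conclusion can also be checked numerically: with the shape parameters held fixed, the only part of the extended NLL that depends on the total yield ν is ν − N·ln ν, whose minimum sits exactly at ν = N. A minimal stand-alone sketch (plain Python rather than RooFit; the names are hypothetical):

```python
import math

n_obs = 137  # observed number of events (arbitrary example value)

def poisson_nll(nu, n):
    # Yield-dependent part of the extended NLL: -ln Poisson(n | nu),
    # dropping the constant ln n! term. The shape sum drops out once
    # the shape parameters are held fixed.
    return nu - n * math.log(nu)

# Grid-scan the total expected yield nu = fsig + fbkg around n_obs.
grid = [n_obs + 0.01 * d for d in range(-1000, 1001)]
best_nu = min(grid, key=lambda nu: poisson_nll(nu, n_obs))
print(best_nu)  # the minimum lands at nu = n_obs
```

Setting the derivative 1 − N/ν to zero gives ν = N directly, confirming that the fitted total yield reproduces the observed event count whenever the yield is uncorrelated with the shape parameters.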

-Fedor