Roofit Binomial term in pdf

will_cern · December 15, 2012, 1:08pm

Hi,

I have been trying to implement a pdf model that includes a binomial term to account for the unknown (but constrained) selection efficiency of an analysis that has been estimated with a finite size MC dataset…

Lets suppose I have an MC sample of 1million events, where my selection cuts give 900,000 events passing (90% efficiency). If this was part of a cross-section measurement, I might construct a likelihood as:

L = Pois(Nobs| Lsigmaeff)Gaus(eff|eff0,eff_err)

Where eff0 = 0.9, and eff_err could be my estimate for the uncertainty of the efficiency, which could just be the binomial error or something like that in this case. In this my “space point” is simply the number of observed events, Nobs.

So what I wanted to do was to do something more sophisticated: Make my “space point” Nobs and the number of selected events from the MC dataset (lets call it “n”). Then my likelihood would look like:

L = Pois(Nobs| Lsigmaeff)Binomial(n|N,eff)

Where N is a constant 1million (the number of trials) and n is 900,000 in my dataset.

So my problem is there doesn’t seem to be a way with the Roofit workspace factory to create this binomial term for my pdf. Technically I would also want the “n” variable to be a discrete, integer variable. But RooRealVar nor RooCategory seem appropriate for this sort of thing.

So is this at all possible with Roofit?

Thanks,
Will

moneta · January 8, 2013, 5:08pm

Hi,

You can use probably the RooEfficiency class which implements practically a Bernoulli pdf. You can then use it on a dataset containing N events with a category specifying if the event is accepted or rejected.
Alternatively, (probably easier and more efficient) you can create yourself your own Binomial pdf in a similar way as it is done with RooPoisson. You can ue RooRealVar to express the parameter, but it is important you define also the normalization integral, to avoid that RooFit computes a numerical integral using the discrete observable as continuos.

Best Regards

Lorenzo

bomben · November 26, 2020, 3:22pm

Dear @moneta,
I would like to understand better how to 'define (also) the normalization integral. Could you please provide an example?

Here is the pdf I have defined:

  // build pdf for Ip
  RooGenericPdf IpPdf("IpPdf","Binomial Pdf for incidence","TMath::Binomial(Np,Ip)*TMath::Power(inc,Ip)*TMath::Power((1-inc),Np-Ip)",RooArgList(inc,Np,Ip));
  RooAbsReal* intOfIpPdf = IpPdf.createIntegral(Ip);

What should I do to define the normalization integral, please?

Many thanks in advance and kind regards,
Marco Bomben

PS If you prefer I can create a new topic

moneta · November 26, 2020, 3:53pm

Hi,

Yes this should be fine. An example is here

If you have problems using it please let me know by posting your code

Best regards

Lorenzo

bomben · November 26, 2020, 4:38pm

Hello,
I have tried but it is not working as I get a lot of error messages like these:

message : p.d.f value is Not-a-Number (nan), forcing value to zero
server values: actualVars=(eps = 0.547386 +/- 0.192749,Ip = 6330.93,Iv = 351.563)
[#0] ERROR:Eval – RooAbsReal::logEvalError(IvPdf) evaluation error,
origin : RooGenericPdf::IvPdf[ actualVars=(eps,Ip,Iv) formula=“TMath::Binomial(x[1],(x[1]-x[2]))*TMath::Power(x[0],(x[1]-x[2]))*TMath::Power((1-x[0]),x[2])” ]

I join here the macro vaccine_PL.C; you can run it like this:

root [0] .L vaccine_PL.C 
root [1] vaccine_PL()

What I am trying to do is the following: I am trying to do a profile likelihood analysis of a model composed of the product of two binomials. There is one parameter of interest (“eps”) and one nuisance parameter (“inc”).
The two binomials are “linked” in the following way:
for Iv trials I get (Ip-Iv) success and hence the estimator for “eps” is (Ip-Iv)/Ip;
for Np trials I get Iv success and hence the the estimator for “inc” is Ip/Np.
I join a formulation of the model: vaccine_PL.pdf; epsilon is “eps” and p is “inc”.

It is like if you apply subsequent selection: on Np you select Ip with efficiency “inc” and then on Ip you select “Ip-Iv” with efficiency “eps”.

Many thanks in advance and kind regards,
Marco Bomben