PDF for a discrete variable

Yakko-San · March 2, 2010, 2:03pm

Hi all,

I’d like to write a pdf for a discrete random variable X.
As a concrete example, I’d like to write the pdf of X (outcomes are 0, 1, 2, 3) in the form:
f(x) = p_0(a,b,c) if x = 0,
       p_1(a,b,c) if x = 1,
       p_2(a,b,c) if x = 2,
       p_3(a,b,c) if x = 3,
where a, b, c are parameters and the p_i are functions that I know analytically. (In fact, each p_i is the sum of two pdfs, say one for signal and one for background, weighted by f and (1-f) respectively, where f is the signal fraction.)

The main aim is to estimate a, b, c and f with a maximum likelihood estimation (MLE).
To perform the MLE, I guess I can use the tutorials/macros provided with RooFit, unless there are some subtleties I should be aware of. If so, could you let me know?

Many thanks in advance for your help.

Yakko-San · March 3, 2010, 2:28pm

Any help?

I guess this is not so complicated from a technical point of view, but conceptually I didn’t manage to find any example of how to do it. And as I’m a RooFit expert…

Many thanks in advance for any comment/help/suggestion.

Yakko-San · March 4, 2010, 12:14pm

Hi again,

I’d like at least to know whether it is possible to implement such a pdf. If this is described somewhere in the RooFit manual, please let me know.

Thanks.

gschott · March 4, 2010, 9:42pm

Hello,

From the description it seems to me that what you need is to implement a RooSimultaneous PDF, but I’m not too sure. Check the manual and the RooFit tutorials for many examples of RooSimultaneous. Is X the only observable you have in your problem? Are p0(a,b,c) and the other such functions just constant terms? I also had the impression that your model could be implemented with a RooGenericPdf, but maybe Wouter knows better and can comment.

Cheers,

– Gregory

Yakko-San · March 4, 2010, 10:37pm

Hi,

X is indeed the only variable, but the p functions (whatever the name) are in fact quite complex (polynomial in a, b, c and linear in d and e; I have five parameters in my model plus an additional one: the signal fraction in the context of a signal+background MLE).

I will have a look at RooSimultaneous… What about RooStepFunction? I haven’t been able to plug the p functions into it to set the “height” of the bins, though.

Many thanks for your help.

Yakko-San · March 4, 2010, 11:15pm

Just to be sure I get it right: can I use RooSimultaneous even if the parameters I am trying to estimate are common to all the pdfs?
Can I also use RooAddPdf with two RooSimultaneous pdfs?

Many thanks in advance.

Yakko-San · March 5, 2010, 10:29am

Hi all,

In every RooSimultaneous example that I’ve found so far, a RooCategory is defined to index the different pdfs. In my case, the discrete variable (the RooCategory) is itself the variable that I’d like to generate with these pdfs.

For example, I’d like to do something like this:
```
RooCategory cat("cat","cat");
cat.defineType("cat0", 0);
cat.defineType("cat1", 1);
cat.defineType("cat2", 2);
// …

// C o n s t r u c t   p . d . f 's
// -------------------------------------------------------------------------------

RooSimultaneous mypdf("mypdf", "", cat);

// Construct pdfs for p_i
RooGenericPdf p0("p0","somefunction0",RooArgList(a,b,c,d));
RooGenericPdf p1("p1","somefunction1",RooArgList(a,b,c,d));
RooGenericPdf p2("p2","somefunction2",RooArgList(a,b,c,d));

// Merge these pdfs
mypdf.addPdf(p0,"cat0");
mypdf.addPdf(p1,"cat1");
mypdf.addPdf(p2,"cat2");

// C r e a t e   d a t a s e t
// -------------------------------------------------------------------------------

// Sample a dataset
RooDataSet* data = NbjetsPdf_tt.generate(nbjets,1000);
```

and plot the generated nbjets…

Is this possible?

Many thanks in advance.

Wouter_Verkerke · March 9, 2010, 5:27pm

Hi Yakko-San,

What you want is mostly possible: you can generate distributions in discrete observables, also in RooSimultaneous pdfs, but there are a couple of important details.

A RooSimultaneous pdf does not predict the relative abundance of its components (this is a feature), unless all component pdfs are extended pdfs.

Thus, to call generate() on a RooSimultaneous you either need to provide a prototype dataset that represents the desired relative abundance of each state (through a ProtoData(…) argument in the generate call), or make all the inputs extended pdfs. The latter can be trivially done by wrapping each pdf in a RooExtendPdf, which associates a variable (or function) with the pdf as its yield parameter.

You cannot plot discrete observable distributions, but you can tabulate them, e.g.
```
 RooTable* table = data->table(myCategory);
```
and then call table->Print().

Plotting of discrete observables (along with the introduction of a new
data type RooInteger) is foreseen for the next major ROOT release
(summer)

Wouter

Yakko-San · March 15, 2010, 4:16pm

Hi again,

In order to build my pdf for a discrete random variable, I did the following:

```
// (definition of eb,euds,euds,Ntt,Nv…)

RooCategory nbjets("nbjets","Number of b-jets");
nbjets.defineType("N0bjet", 0);
nbjets.defineType("N1bjet", 1);
nbjets.defineType("N2bjets",2);
nbjets.defineType("N3bjets",3);

// C o n s t r u c t   f o r m u l a s
// -------------------------------------------------------------------------------

RooFormulaVar frac1("frac1", "frac1", "Ntt/(Ntt+Nv)", RooArgList(Ntt,Nv));
RooFormulaVar frac2("frac2", "frac2", "Nv/(Ntt+Nv)", RooArgList(Ntt,Nv));

// C o n s t r u c t   p . d . f 's
// -------------------------------------------------------------------------------

RooGenericPdf pbjets_tt("pbjets_tt","pbjets_tt","(nbjets==0)*p0bjets_tt+(nbjets==1)*p1bjets_tt+(nbjets==2)*p2bjets_tt+(nbjets==3)*p3bjets_tt",RooArgList(nbjets,p0bjets_tt,p1bjets_tt,p2bjets_tt,p3bjets_tt));
RooExtendPdf pbjets_tt_ext("pbjets_tt_ext","pbjets_tt_xt",pbjets_tt,Ntt);

RooGenericPdf pbjets_v("pbjets_v","pbjets_v","(nbjets==0)*pow(1-euds,n)+(nbjets==1)*n*euds*pow(1-euds,n-1)+(nbjets==2)*(n*(n-1)/2)*euds*euds*pow(1-euds,n-2)+(nbjets==3)*(n*(n-1)*(n-2)/6)*pow(euds,3)*pow(1-euds,n-3)",RooArgList(nbjets,n,euds));
RooExtendPdf pbjets_v_ext("pbjets_v_ext","pbjets_v_xt",pbjets_v,Nv);

RooAddPdf model("model","model",RooArgList(pbjets_tt_ext,pbjets_v_ext));
```

I’d like to know if there could be any potential problem in the normalisation of these generic pdf’s.

Many thanks in advance.

Wouter_Verkerke · March 16, 2010, 8:57pm

Hi,

RooGenericPdfs are normalized by dividing the function by its numeric integral. As long as you don’t have more than one continuous observable, that integration should always be unproblematic.

Wouter

Yakko-San · March 16, 2010, 11:02pm

Hi Wouter,

thanks for the reply. The problem is that the first pdf is a function of 7 parameters (1 is a RooCategory, 4 are RooConstVar and 2 RooRealVar). Is there a way to be sure that the integration has been performed correctly?

Many thanks in advance.

Yakko-San · March 17, 2010, 2:44pm

Hi again,

I have a related question :
how should I interpret the following line, printed by ws->Print():
RooAddPdf::model[ Nsig * pdf_sig + Nbck * pdf_bck ] = 0.369415
in the context of a composite model?

Thanks.

Wouter_Verkerke · March 17, 2010, 8:37pm

Hi,

The number of parameters is not important, as the pdf is not integrated over them. Whenever a parameter changes, the normalization integral is re-evaluated, but the complexity of that task doesn’t scale with the number of parameters.

Concerning your second question, can you be more specific?
The print line indicates that your object is a RooAddPdf that represents the sum of two input pdfs, each multiplied by a yield parameter that determines their relative weighting. This information is independent of how the pdf is used.

Wouter

Yakko-San · March 17, 2010, 10:48pm

Hi Wouter,

thanks for your help and comments.

So, if I understand correctly, there is nothing to worry about for the normalisation of the pdf’s.

Concerning the line I copy-pasted: I just wanted to know whether, once the parameters of the system have been estimated with a likelihood fit and the model has been saved into the workspace ws, this line is supposed to return a fixed value (like 1). To be clearer, I am trying to check whether my model is properly implemented, because the pull distributions of the different parameters have widths that are far too narrow (0.03, for instance!). The errors returned by MINOS and by a profile likelihood are much bigger than expected. Hence my previous questions about this line and about the normalisation…