# Pdf for discrete variable

Hi all,

I’d like to write a pdf for a discrete random variable X.
As a concrete example, I’d like to write the pdf of X (outcomes are 0, 1, 2, 3) in the form:
pdf f(x)
= p_0(a,b,c) if x=0,
= p_1(a,b,c) if x=1,
= p_2(a,b,c) if x=2,
= p_3(a,b,c) if x=3.
where a, b, c are parameters and the p_i are functions that I know analytically. (In fact, each p_i is the sum of two pdfs, one for signal and one for background, weighted by f and (1-f) respectively, where f is the signal fraction.)

The main aim is to estimate a,b,c and f using a maximum likelihood estimation.
To perform the MLE, I guess I can use the tutorials/macros provided with RooFit, unless there are subtleties I should be aware of. If so, could you let me know?

Any help?

I guess this is not so complicated from the technical point of view, but conceptually I didn’t manage to find any example of how to do it. And as I’m not a RooFit expert…

Many thanks in advance for any comment/help/suggestion.
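
For reference, the model described above can be written down explicitly. This is only a sketch of the notation: the superscripts sig/bkg and the observed counts n_k (number of events with X = k) are labels introduced here, not names from the post.

$$
P(X = k \mid a,b,c,f) = f\,p_k^{\mathrm{sig}}(a,b,c) + (1-f)\,p_k^{\mathrm{bkg}}(a,b,c), \qquad k = 0,1,2,3,
$$

$$
-\ln L(a,b,c,f) = -\sum_{k=0}^{3} n_k \,\ln P(X = k \mid a,b,c,f), \qquad \sum_{k=0}^{3} P(X = k \mid a,b,c,f) = 1 .
$$

The maximum likelihood estimate is the set (a, b, c, f) that minimises the first expression subject to the normalisation condition on the right.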

Hi again,

I’d like, at least, to know whether it is possible to implement such a pdf. If this is described somewhere in the RooFit manual, please let me know.

Thanks.

Hello,

From the description, it seems to me that what you need is a RooSimultaneous PDF, but I’m not too sure. Check the manual and the RooFit tutorials for many examples of RooSimultaneous. Is X the only observable in your problem? Are p0(a,b,c) and the other such functions just constant terms? I also had the impression that your model could be implemented with a RooGenericPdf, but maybe Wouter knows better and can comment.

Cheers,

– Gregory

Hi,

X is indeed the only variable, but the p (whatever the name) are in fact quite complex functions (polynomial in a, b, c and linear in d and e; I have five parameters in my model plus an additional one, the signal fraction, in the context of a signal+background MLE).

I will have a look at RooSimultaneous… What about RooStepFunction? I haven’t been able to plug the p functions into it to set the “height” of the bins, though.

Just to be sure I get it right : can I use RooSimultaneous even if the different parameters (that I try to estimate) are common to all the pdf’s ?
Can I also use RooAddPdf with two RooSimultaneous?

Hi all,

in every example that I’ve found so far on RooSimultaneous, a RooCategory is defined to index the different pdf’s. In my case, the discrete variable (the RooCategory) is the variable that I’d like to generate with these different pdf’s.

For example, I’d like to do something like this:
RooCategory cat("cat","cat");
cat.defineType("cat0", 0);
cat.defineType("cat1", 1);
cat.defineType("cat2", 2);

// C o n s t r u c t p . d . f 's
// -------------------------------------------------------------------------------

RooSimultaneous mypdf("mypdf", "",cat);

// Construct pdf’s for p_i
RooGenericPdf p0("p0","somefunction0",RooArgList(a,b,c,d));
RooGenericPdf p1("p1","somefunction1",RooArgList(a,b,c,d));
RooGenericPdf p2("p2","somefunction2",RooArgList(a,b,c,d));

//Merge these pdf’s

// C r e a t e d a t a s e t
// -------------------------------------------------------------------------------

// Sample a dataset
RooDataSet* data = NbjetsPdf_tt.generate(nbjets,1000) ;

and plot the generated nbjets…

Is this possible?

Hi Yakko-San,

What you want is mostly possible: you can generate distributions
in discrete observables, also in RooSimultaneous pdfs, but there are a couple of details that are important:

• A RooSimultaneous pdf does not predict the relative abundance
of its components (this is a feature), unless all component
pdfs are extended pdfs.

Thus to call generate() on a RooSimultaneous
you either need to provide a prototype dataset that represents
the desired relative abundance of each state in the generation call
(through a ProtoData(…) argument in the generate call), or make
all the inputs extended pdfs. The latter can be trivially done by
wrapping a pdf in a RooExtendPdf that associates a variable
(or function) as yield parameter with any given pdf.

• You cannot plot discrete observable distributions, but you can tabulate
them, e.g.

```
RooTable* table = data.table(myCategory)
```

and then do table->Print()

Plotting of discrete observables (along with the introduction of a new
data type RooInteger) is foreseen for the next major ROOT release
(summer)

Wouter

Hi again,

in order to build my pdf for a discrete random variable, I did the following :

(definition of eb,euds,euds,Ntt,Nv…)

RooCategory nbjets("nbjets","Number of b-jets");
nbjets.defineType("N0bjet", 0);
nbjets.defineType("N1bjet", 1);
nbjets.defineType("N2bjets",2);
nbjets.defineType("N3bjets",3);

// C o n s t r u c t f o r m u l a s
// -------------------------------------------------------------------------------

RooFormulaVar frac1 ("frac1", "frac1", "Ntt/(Ntt+Nv)", RooArgList(Ntt,Nv));
RooFormulaVar frac2 ("frac2", "frac2", "Nv/(Ntt+Nv)", RooArgList(Ntt,Nv));

// C o n s t r u c t p . d . f 's
// -------------------------------------------------------------------------------

RooGenericPdf pbjets_tt("pbjets_tt","pbjets_tt","(nbjets==0)*p0bjets_tt+(nbjets==1)*p1bjets_tt+(nbjets==2)*p2bjets_tt+(nbjets==3)*p3bjets_tt",RooArgList(nbjets,p0bjets_tt,p1bjets_tt,p2bjets_tt,p3bjets_tt));
RooExtendPdf pbjets_tt_ext("pbjets_tt_ext","pbjets_tt_xt",pbjets_tt,Ntt);

RooGenericPdf pbjets_v("pbjets_v","pbjets_v","(nbjets==0)*pow(1-euds,n)+(nbjets==1)*n*euds*pow(1-euds,n-1)+(nbjets==2)*(n*(n-1)/2)*euds*euds*pow(1-euds,n-2)+(nbjets==3)*((n)*(n-1)*(n-2)/6)*pow(euds,3)*pow(1-euds,n-3)",RooArgList(nbjets,n,euds));
RooExtendPdf pbjets_v_ext("pbjets_v_ext","pbjets_v_xt",pbjets_v,Nv);

I’d like to know if there could be any potential problem in the normalisation of these generic pdf’s.

Hi,

RooGenericPdfs are normalized by dividing the function by its numeric integral. As long as you don’t have more than one continuous observable that integration should always be unproblematic.

Wouter
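
To make the normalisation point concrete for the binomial-style formula in pbjets_v, here is a small standalone C++ sketch (no RooFit involved). It assumes that normalising over a category observable amounts to dividing each state's weight by the sum over the states 0..3. The function names and the values n = 5, euds = 0.2 are purely illustrative and not taken from the posts.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Binomial probability of k successes in n trials with per-trial probability eps.
double binomial(int n, int k, double eps) {
    assert(n >= 0 && k >= 0 && eps >= 0.0 && eps <= 1.0);
    if (k > n) return 0.0;
    double coeff = 1.0;  // builds the binomial coefficient C(n, k) step by step
    for (int i = 1; i <= k; ++i) coeff *= static_cast<double>(n - k + i) / i;
    return coeff * std::pow(eps, k) * std::pow(1.0 - eps, n - k);
}

// Sum of the raw (unnormalised) weights over the states 0..maxState.
double rawSum(int n, double eps, int maxState) {
    double sum = 0.0;
    for (int k = 0; k <= maxState; ++k) sum += binomial(n, k, eps);
    return sum;
}

// Weights for the states 0..maxState, each divided by the sum over all states.
std::vector<double> normalizedOverStates(int n, double eps, int maxState) {
    const double sum = rawSum(n, eps, maxState);
    assert(sum > 0.0);
    std::vector<double> w(maxState + 1);
    for (int k = 0; k <= maxState; ++k) w[k] = binomial(n, k, eps) / sum;
    return w;
}
```

With n = 3 the four raw terms already add up to 1, but with n = 5 and euds = 0.2, `rawSum(5, 0.2, 3)` is 0.99328, because the k = 4 and k = 5 terms are missing. After normalisation each state is rescaled by that sum, so the fitted probabilities are those of a binomial truncated at 3, which is worth keeping in mind when interpreting the formula.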

Hi Wouter,

thanks for the reply. The problem is that the first pdf is a function of 7 parameters (1 is a RooCategory, 4 are RooConstVar and 2 RooRealVar). Is there a way to be sure that the integration has been performed correctly?

Hi again,

I have a related question :
how should I interpret the following line returned by a “ws->Print()” :
RooAddPdf::model[ Nsig * pdf_sig + Nbck * pdf_bck ] = 0.369415
in the context of a composite model?

Thanks.

Hi,

The number of parameters is not important, as the pdf is not integrated over them. Whenever a parameter changes, the normalization integral is reevaluated, but the complexity of that task doesn’t scale with the number of parameters.

Concerning your second question, can you be more specific?
The print line indicates that your object is a RooAddPdf that represents the sum of two input pdfs, each multiplied by a yield parameter that determines the relative weighting of the two. This information is independent of how the pdf is used.

Wouter
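
As a rough illustration of the yield weighting described above, the following standalone C++ sketch (not RooFit code) computes the relative fractions from two yields and the resulting weighted sum. It assumes the fractions are Nsig/(Nsig+Nbck) and Nbck/(Nsig+Nbck), the same ratios as frac1 and frac2 earlier in the thread. The struct and function names are made up for this example, and the number after the "=" in the printout is presumably the current value of the pdf, which this sketch does not try to reproduce.

```cpp
#include <cassert>
#include <cmath>

// Relative fractions of the two components, derived from their yields.
struct Fractions {
    double sig;
    double bck;
};

Fractions fractionsFromYields(double nSig, double nBck) {
    assert(nSig >= 0.0 && nBck >= 0.0 && nSig + nBck > 0.0);
    const double total = nSig + nBck;
    return {nSig / total, nBck / total};
}

// Value of the summed model, given the values of the two (already normalised)
// component pdfs at the current point.
double modelValue(double nSig, double pdfSig, double nBck, double pdfBck) {
    const Fractions f = fractionsFromYields(nSig, nBck);
    return f.sig * pdfSig + f.bck * pdfBck;
}
```

For example, yields of 300 and 700 give fractions 0.3 and 0.7, so component values of 0.5 and 0.1 combine to 0.22.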

Hi Wouter,