Pdf for discrete variable

Hi all,

I’d like to write a pdf for a discrete random variable X.
As a concrete example, I’d like to write the pdf of X (outcomes are 0, 1, 2, 3) in the form:
pdf f(x)
= p_0(a,b,c) if x=0,
= p_1(a,b,c) if x=1,
= p_2(a,b,c) if x=2,
= p_3(a,b,c) if x=3.
where a, b, c are parameters and the p_i are functions that I know analytically. (In fact, each p_i is the sum of two pdfs, one for signal and one for background, weighted by f and (1-f) respectively, where f is the signal fraction.)

The main aim is to estimate a,b,c and f using a maximum likelihood estimation.
For the MLE itself, I guess I can use the tutorials/macros provided with RooFit, unless there are subtleties I should be aware of. If there are, could you let me know?
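
To make the setup above concrete, here is a minimal sketch in plain C++ (no RooFit) of the mixture probabilities and of the negative log-likelihood that an MLE would minimize. The function names `mixturePmf` and `negLogLikelihood` are hypothetical, and the signal and background tables are assumed to be already normalized over the outcomes 0..3; in the real problem they would be computed from a, b, c.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mixture probability for each outcome: p_k = f * sig_k + (1 - f) * bkg_k.
// Assumption: sig and bkg are normalized probability tables over the same outcomes.
std::vector<double> mixturePmf(const std::vector<double>& sig,
                               const std::vector<double>& bkg, double f) {
    assert(sig.size() == bkg.size());
    std::vector<double> p(sig.size());
    for (std::size_t k = 0; k < sig.size(); ++k)
        p[k] = f * sig[k] + (1.0 - f) * bkg[k];
    return p;
}

// Negative log-likelihood for observed counts n_k: -sum_k n_k * log(p_k).
// Outcomes with zero observed events contribute nothing to the sum.
double negLogLikelihood(const std::vector<double>& pmf,
                        const std::vector<int>& counts) {
    assert(pmf.size() == counts.size());
    double nll = 0.0;
    for (std::size_t k = 0; k < pmf.size(); ++k)
        if (counts[k] > 0) nll -= counts[k] * std::log(pmf[k]);
    return nll;
}
```

Minimizing `negLogLikelihood` over f (and over the parameters that enter the two tables) is the estimation task described here; RooFit would build and minimize the equivalent quantity internally.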

Many thanks in advance for your help.

Any help?

I guess this is not so complicated from the technical point of view, but conceptually I didn’t manage to find any example showing how to do it. And as I’m not a RooFit expert…

Many thanks in advance for any comment/help/suggestion.

Hi again,

I’d like, at least, to know whether it is possible to implement such a pdf. If this is described somewhere in the RooFit manual, please let me know.

Thanks.

Hello,

From the description, it seems to me that what you need is a RooSimultaneous pdf, but I’m not too sure. Check the manual and the RooFit tutorials for many examples of RooSimultaneous. Is X the only observable in your problem? Are p0(a,b,c) and the other such functions just constant terms? I also had the impression that your model could be implemented with a RooGenericPdf, but maybe Wouter knows better and can comment.

Cheers,

– Gregory

Hi,

X is indeed the only variable, but the p functions (whatever their name) are in fact quite complex (polynomial in a, b, c and linear in d and e; I have five parameters in my model plus an additional one, the signal fraction, in the context of a signal+background MLE).

I will have a look at RooSimultaneous… What about RooStepFunction? I haven’t been able to plug the p functions into it to set the “height” of the bins.

Many thanks for your help.

Just to be sure I get it right : can I use RooSimultaneous even if the different parameters (that I try to estimate) are common to all the pdf’s ?
Can I also use RooAddPdf with two RooSimultaneous?

Many thanks in advance.

Hi all,

in every example that I’ve found so far on RooSimultaneous, a RooCategory is defined to index the different pdf’s. In my case, the discrete variable (the RooCategory) is the variable that I’d like to generate with these different pdf’s.

For example, I’d like to do something like this:
RooCategory cat("cat","cat");
cat.defineType("cat0", 0);
cat.defineType("cat1", 1);
cat.defineType("cat2", 2);

// Construct p.d.f.s
// -------------------------------------------------------------------------------

RooSimultaneous mypdf("mypdf", "",cat);

// Construct pdfs for p_i
RooGenericPdf p0("p0","somefunction0",RooArgList(a,b,c,d));
RooGenericPdf p1("p1","somefunction1",RooArgList(a,b,c,d));
RooGenericPdf p2("p2","somefunction2",RooArgList(a,b,c,d));

// Merge these pdfs
mypdf.addPdf(p0,"cat0");
mypdf.addPdf(p1,"cat1");
mypdf.addPdf(p2,"cat2");

// Create dataset
// -------------------------------------------------------------------------------

// Sample a dataset
RooDataSet* data = NbjetsPdf_tt.generate(nbjets,1000) ;

and plot the generated nbjets…

Is this possible?

Many thanks in advance.

Hi Yakko-San,

What you want is mostly possible: you can generate distributions
in discrete observables, also in RooSimultaneous pdfs, but there are a couple of details that are important:

  • A RooSimultaneous pdf does not predict the relative abundance
    of its components (this is a feature), unless all component
    pdfs are extended pdfs.

    Thus to call generate() on a RooSimultaneous
    you either need to provide a prototype dataset that represents
    the desired relative abundance of each state
    (through a ProtoData(…) argument in the generate call), or make
    all the inputs extended pdfs. The latter can be trivially done by
    wrapping a pdf in a RooExtendPdf, which associates a variable
    (or function) as yield parameter with any given pdf.

  • You cannot plot discrete observable distributions, but you can tabulate
    them, e.g.

     RooTable* table = data->table(myCategory);
    

    and then do table->Print()

    Plotting of discrete observables (along with the introduction of a new
    data type RooInteger) is foreseen for the next major ROOT release
    (summer)
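
As an illustration of the first point, outside RooFit, the sketch below draws the discrete state of each event with relative abundances fixed by per-state yields, then tabulates the result much like the table printout mentioned above. It is plain C++ with the standard library only; `generateAndTabulate` and the yield values used with it are hypothetical and stand in for what extended pdfs or a prototype dataset would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw nEvents category indices with relative abundances proportional to
// the given yields, and return the number of generated events per category.
std::vector<int> generateAndTabulate(const std::vector<double>& yields,
                                     int nEvents, unsigned seed) {
    assert(!yields.empty());
    std::mt19937 rng(seed);
    // The yields act as weights; without them the state fractions are undefined.
    std::discrete_distribution<int> pick(yields.begin(), yields.end());
    std::vector<int> table(yields.size(), 0);
    for (int i = 0; i < nEvents; ++i) ++table[pick(rng)];
    return table;
}
```

For example, `generateAndTabulate({600.0, 300.0, 100.0}, 1000, 42u)` returns three counts that add up to 1000, with the first state the most populated on average.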

Wouter

Hi again,

in order to build my pdf for a discrete random variable, I did the following :

(definition of eb,euds,euds,Ntt,Nv…)

RooCategory nbjets("nbjets","Number of b-jets");
nbjets.defineType("N0bjet", 0);
nbjets.defineType("N1bjet", 1);
nbjets.defineType("N2bjets",2);
nbjets.defineType("N3bjets",3);

// Construct formulas
// -------------------------------------------------------------------------------

RooFormulaVar frac1 ("frac1", "frac1", "Ntt/(Ntt+Nv)", RooArgList(Ntt,Nv));
RooFormulaVar frac2 ("frac2", "frac2", "Nv/(Ntt+Nv)", RooArgList(Ntt,Nv));

// Construct p.d.f.s
// -------------------------------------------------------------------------------

RooGenericPdf pbjets_tt("pbjets_tt","pbjets_tt","(nbjets==0)*p0bjets_tt+(nbjets==1)*p1bjets_tt+(nbjets==2)*p2bjets_tt+(nbjets==3)*p3bjets_tt",RooArgList(nbjets,p0bjets_tt,p1bjets_tt,p2bjets_tt,p3bjets_tt));
RooExtendPdf pbjets_tt_ext("pbjets_tt_ext","pbjets_tt_xt",pbjets_tt,Ntt);

RooGenericPdf pbjets_v("pbjets_v","pbjets_v","(nbjets==0)*pow(1-euds,n)+(nbjets==1)*n*euds*pow(1-euds,n-1)+(nbjets==2)*(n*(n-1)/2)*euds*euds*pow(1-euds,n-2)+(nbjets==3)*((n)*(n-1)*(n-2)/6)*pow(euds,3)*pow(1-euds,n-3)",RooArgList(nbjets,n,euds));
RooExtendPdf pbjets_v_ext("pbjets_v_ext","pbjets_v_xt",pbjets_v,Nv);

RooAddPdf model("model","model",RooArgList(pbjets_tt_ext,pbjets_v_ext));

I’d like to know if there could be any potential problem in the normalisation of these generic pdf’s.

Many thanks in advance.

Hi,

RooGenericPdfs are normalized by dividing the function by its numeric integral. As long as you don’t have more than one continuous observable that integration should always be unproblematic.
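
To see what this normalization amounts to for a purely discrete observable, here is a small sketch in plain C++ (not RooFit code). It assumes that, for a category observable, the "integral" reduces to a sum over the allowed states, and it uses binomial terms of the same form as the pbjets_v formula above, truncated at k = 3. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Binomial coefficient C(n, k) for small non-negative integers.
double binomialCoeff(int n, int k) {
    double c = 1.0;
    for (int i = 1; i <= k; ++i) c *= static_cast<double>(n - k + i) / i;
    return c;
}

// Unnormalized values for k = 0..kMax: C(n,k) * eps^k * (1-eps)^(n-k).
std::vector<double> rawTerms(int n, double eps, int kMax) {
    assert(kMax >= 0 && n >= kMax);
    std::vector<double> t;
    for (int k = 0; k <= kMax; ++k)
        t.push_back(binomialCoeff(n, k) * std::pow(eps, k) * std::pow(1.0 - eps, n - k));
    return t;
}

// Divide by the sum over the allowed states so that the values add up to 1.
std::vector<double> normalize(std::vector<double> t) {
    double sum = 0.0;
    for (double v : t) sum += v;
    assert(sum > 0.0);
    for (double& v : t) v /= sum;
    return t;
}
```

With n = 4 and eps = 0.1 the raw terms for k = 0..3 sum to 1 - 0.1^4 = 0.9999 rather than 1, because the k = 4 term is missing; dividing by that sum restores unit normalization over the four states. The same reasoning suggests that a formula truncated at three b-jets is effectively rescaled whenever n exceeds 3.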

Wouter

Hi Wouter,

thanks for the reply. The problem is that the first pdf is a function of 7 parameters (1 is a RooCategory, 4 are RooConstVar and 2 RooRealVar). Is there a way to be sure that the integration has been performed correctly?

Many thanks in advance.

Hi again,

I have a related question :
how should I interpret the following line returned by a “ws->Print()” :
RooAddPdf::model[ Nsig * pdf_sig + Nbck * pdf_bck ] = 0.369415
in the context of a composite model?

Thanks.

Hi,

The number of parameters is not important, as the pdf is not integrated over them. Whenever a parameter changes, the normalization integral is reevaluated, but the complexity of that task doesn’t scale with the number of parameters.

Concerning your second question, can you be more specific?
The print line indicates that your object is a RooAddPdf that represents the sum of two input pdfs, each multiplied by a yield parameter that determines the relative weighting of the two. This information is independent of how the pdf is used.
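
As a rough illustration of that weighting, the plain C++ sketch below combines two component values using fractions derived from the yields. The names `fractionsFromYields` and `compositeValue` are hypothetical, and the sketch assumes both component values are already normalized; it is not a description of RooAddPdf internals.

```cpp
#include <cassert>

// Coefficients of the two components as implied by the yields.
struct Fractions { double sig; double bck; };

Fractions fractionsFromYields(double nSig, double nBck) {
    assert(nSig >= 0.0 && nBck >= 0.0 && nSig + nBck > 0.0);
    const double total = nSig + nBck;
    return {nSig / total, nBck / total};
}

// Yield-weighted sum of two normalized component pdf values.
double compositeValue(double nSig, double pSig, double nBck, double pBck) {
    const Fractions f = fractionsFromYields(nSig, nBck);
    return f.sig * pSig + f.bck * pBck;
}
```

Under this reading, a number printed next to the object would depend on the current observable and parameter values rather than being a fixed constant; that interpretation is an assumption and is not confirmed in this thread.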

Wouter

Hi Wouter,

thanks for your help and comments.

So, if I understand correctly, there is nothing to worry about for the normalisation of the pdf’s.

Concerning the printed line, which I copy-pasted: I just wanted to know whether, once the parameters have been estimated via a likelihood fit and the model has been saved into the workspace ws, this line is supposed to return a fixed value (like 1). To be clearer, I am trying to check that my model is properly implemented, because the pull distributions of the different parameters have widths that are far too narrow (0.03, for instance!). The errors returned by MINOS and by a profile LL are way bigger than expected. Hence the previous question and the one about the normalisation…
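
For reference, a pull is (fitted value - true value) / fitted error, and for a well-behaved fit its distribution over many toys should have a width close to 1; a width far below 1 means the reported errors are much larger than the actual scatter of the estimates. The sketch below is a toy check of that statement in plain C++, using a simple counting estimate of a probability rather than the model discussed here. The function `pullStudy` and all the numbers used with it are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct PullSummary { double mean; double width; };

// Run nToys pseudo-experiments of nEvents Bernoulli trials with true
// probability pTrue, estimate p and its binomial error in each toy, and
// summarize the pulls (pHat - pTrue) / sigmaHat.
PullSummary pullStudy(double pTrue, int nEvents, int nToys, unsigned seed) {
    assert(pTrue > 0.0 && pTrue < 1.0 && nEvents > 0 && nToys > 1);
    std::mt19937 rng(seed);
    const double threshold = pTrue * 4294967296.0;  // pTrue * 2^32
    std::vector<double> pulls;
    for (int t = 0; t < nToys; ++t) {
        int k = 0;
        for (int i = 0; i < nEvents; ++i)
            if (static_cast<double>(rng()) < threshold) ++k;
        const double pHat = static_cast<double>(k) / nEvents;
        const double sigma = std::sqrt(pHat * (1.0 - pHat) / nEvents);
        // Skip degenerate toys where the estimated error vanishes.
        if (sigma > 0.0) pulls.push_back((pHat - pTrue) / sigma);
    }
    assert(pulls.size() > 1);
    double mean = 0.0;
    for (double p : pulls) mean += p;
    mean /= pulls.size();
    double var = 0.0;
    for (double p : pulls) var += (p - mean) * (p - mean);
    return {mean, std::sqrt(var / (pulls.size() - 1))};
}
```

Calling `pullStudy(0.3, 1000, 500, 12345u)` should give a mean near 0 and a width near 1. If the error in the denominator were inflated by some factor, the width would shrink by the same factor, which is the pattern described above.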