Differential Unbinned Likelihood-fit

hym · August 28, 2020, 8:13am

Hello,

I want to do the Unbinned Maximum Likelihood fit. I have a problem in the following format -

There are 2 models, which estimates ‘f’ fraction. In other words, data will be described by 2 different models.

Model_A(x) = f . PDF_1(x) + (1-f) PDF_2(x)
Model_B(x) = f . PDF_3(x) + (1-f) PDF_4(x)

In each entry there is a column ‘y’. Based on the input (y), it is determined that which model A or B will be fitted to the data.

if y==1 then Model_A->fitTo(data);
if y!=1 then Model_B->fitTo(data);

PDF1 and PDF2 are a good description of data if input parameter y==1, else PDF3 or PDF4 are best.

So I can construct a Grand-model like this :

Grand_Model(x) = Turn_ON_A.Model_A(x) + Turn_ON_B.Model_B(x)

But It should be checked for each value of ‘y’ in each entry, and do the following :

if y==1 Turn_ON_A = 1; Turn_ON_B = 0;
if y!=1 Turn_ON_B = 1; Turn_ON_A = 0;
Grand_Model->fitTo(data);
// So this would select right model to fit.

In the end, I would get a single ‘f’-fraction value.

But I have doubt implementing this plan, Is it doable in RooFit with the entry-by-entry approach as I mentioned?

Thank you!

StephanH · August 28, 2020, 8:42am

Hi @hym,

I think that can work like this:

//define modelA
//define modelB
RooGenericPdf grandModel("grandModel", "(y == 1.)*modelA + (y != 1)*modelB", RooArgList(y, modelA, modelB));

The problem is the integral of that PDF. RooFit has to run a numeric integrator to integrate this, but it will only ever see modelB if y has to be integrated over. That’s because y will be scanned from it’s low to high limit, and the likelihood that it will hit the point where y == 1. is quite low.

So tell me:

What variables are in the dataset? Will RooFit have to integrate over y?
Is y needed for anything else than for selecting which model to fit?
- If no, you can make y a category variable, and run a simultaneous fit of the A and B categories.
- If yes, one might still be able to do it, but it’s a bit more complicated.

hym · August 28, 2020, 9:35am

Thank you @StephanH for the reply

So I am telling you all the details here (Reality! ).

First of all my real model has to check more than one variable and it has to fit in 2-dimensions.

My Dataset is like this :

RooDataSet [ m, x, y ] => Three-dimensional dataset

My model is 2-dimensional (which fits m and x) like this :

Model (m,x) = fsig . Msig(m) . Fsig(x) + (1-fsig) . Mbkg(m) . Fbkg(x)
where Fsig(x) = f . Fb(x) + (1-f) . R(x)
There are total 5-pdfs in the model. And you can see it depends only on either m or x.

I have already implemented my code until here, but now I have to modify it in the following way.

I want to put two-selections here -

Selection based on y –
I don’t have to fit over ‘y’. It is just used as a selection-flag for PDFs. It has value either 1 or 2. (Answer to your question)
y=1 or y=2 will decide, which R(x) to be used. I have two choices in R(x) { lets say R1(x) & R2(x) }
YES, RooFit has to check the value of ‘y’ in each entry before it fits ‘x’ and make a selection from R1(x) or R2(x).
Selection based on M –
(I didn’t mention it before, because I was only interested in the feasibility of this approach.)
I have to make a selection of Fbkg(x) on the basis of ‘m’, which is our fitting variable also.
There are 3-choices of Fbkg(x) = {Fbkg1(x), Fbkg2(x), Fbkg3(x)}

If (2.2 < m < 2.6) Fbkg(x) = Fbkg1(x)
If (2.6 < m < 3.2) Fbkg(x) = Fbkg2(x)
If (3.2 < m < 4.0) Fbkg(x) = Fbkg3(x)
// Range of m is [2.2 - 4.0] and fully covered in three choices.
// I have all PDFs ready.

Here RooFit has to check the m-value of the current entry and determine in which m-range it lies then it has to make one choice out of 3-Fbkg(x) and then fit it.

In the end, I would get two-parameters from this whole fitting procedure, which are f and fsig.

StephanH · August 28, 2020, 9:50am

This is a clear case of categories and a simultaneous PDF!

Create categories for the different cases. Make one category for the switch between model A and B, and a second one for the switch between Fbkg[123].
See a tutorial about categories here.
Add the category labels to the dataset. rf405 shows how to convert real-valued variables to category numbers. Use this to create the category states from y and from m.
Create a RooSuperCategory that gives each state of the combination {yState, mState} a unique label. rf406 shows how to do this.
Create a RooSimultaneous with the 6 different states. rf501 shows how to do this. The category that tells the simultaneous what to do is the supercategory that does the yState * mState combination.
For each of the 6 states, explicitly write down the model you want, i.e., use modelA with Fbkg1 or modelB with Fbkg2, … . RooFit will know how to properly integrate each one of them, and then they will be combined by the simultaneous PDF.
This would roughly look like:

RooSimultaneous fullModel(..., ..., theSuperCat);
yCat.setLabel("A");
mCat.setIndex(1);
fullModel.addPdf(theA1Pdf, theSuperCat.getLabel());
yCat.setLabel("B");
mCat.setIndex(1);
fullModel.addPdf(theB1Pdf, theSuperCat.getLabel());
yCat.setLabel("A");
mCat.setIndex(2);
fullModel.addPdf(theA2Pdf, theSuperCat.getLabel());

hym · September 10, 2020, 7:13pm

Thank you, @StephanH for this perfect solution. It is working well. I have done something like this :

// Components of Simultan. PDF
Model_FF_SB1 = new RooAddPdf(“Model_FF_SB1”, “Model_FF_SB1”, RooArgSet(Signal_FF, Bkg_FF_SB1), RooArgList(fsig));
Model_FF_SB2 = new RooAddPdf(“Model_FF_SB2”, “Model_FF_SB2”, RooArgSet(Signal_FF, Bkg_FF_SB2), RooArgList(fsig));
Model_FF_SB3 = new RooAddPdf(“Model_FF_SB3”, “Model_FF_SB3”, RooArgSet(Signal_FF, Bkg_FF_SB3), RooArgList(fsig));
Model_FF_SB4 = new RooAddPdf(“Model_FF_SB4”, “Model_FF_SB4”, RooArgSet(Signal_FF, Bkg_FF_SB4), RooArgList(fsig));
Model_FF_Sig = new RooAddPdf(“Model_FF_Sig”, “Model_FF_Sig”, RooArgSet(Signal_FF, Bkg_FF_Sig), RooArgList(fsig));
Model_FS_SB1 = new RooAddPdf(“Model_FS_SB1”, “Model_FS_SB1”, RooArgSet(Signal_FS, Bkg_FS_SB1), RooArgList(fsig));
Model_FS_SB2 = new RooAddPdf(“Model_FS_SB2”, “Model_FS_SB2”, RooArgSet(Signal_FS, Bkg_FS_SB2), RooArgList(fsig));
Model_FS_SB3 = new RooAddPdf(“Model_FS_SB3”, “Model_FS_SB3”, RooArgSet(Signal_FS, Bkg_FS_SB3), RooArgList(fsig));
Model_FS_SB4 = new RooAddPdf(“Model_FS_SB4”, “Model_FS_SB4”, RooArgSet(Signal_FS, Bkg_FS_SB4), RooArgList(fsig));
Model_FS_Sig = new RooAddPdf(“Model_FS_Sig”, “Model_FS_Sig”, RooArgSet(Signal_FS, Bkg_FS_Sig), RooArgList(fsig));

Full Simul. PDF model is here :

   RooSimultaneous fullModel("fullModel","fullModel", massXtypeCat);

    mCat->setLabel("SB1");   sCat->setLabel("FF");    fullModel.addPdf(*Model_FF_SB1, massXtypeCat.getLabel()); 
    mCat->setLabel("SB2");   sCat->setLabel("FF");    fullModel.addPdf(*Model_FF_SB2, massXtypeCat.getLabel());
    mCat->setLabel("SB3");   sCat->setLabel("FF");    fullModel.addPdf(*Model_FF_SB3, massXtypeCat.getLabel());
    mCat->setLabel("SB4");   sCat->setLabel("FF");    fullModel.addPdf(*Model_FF_SB4, massXtypeCat.getLabel());
    mCat->setLabel("Sig");   sCat->setLabel("FF");    fullModel.addPdf(*Model_FF_Sig, massXtypeCat.getLabel());

    mCat->setLabel("SB1");   sCat->setLabel("FS");    fullModel.addPdf(*Model_FS_SB1, massXtypeCat.getLabel());
    mCat->setLabel("SB2");   sCat->setLabel("FS");    fullModel.addPdf(*Model_FS_SB2, massXtypeCat.getLabel());
    mCat->setLabel("SB3");   sCat->setLabel("FS");    fullModel.addPdf(*Model_FS_SB3, massXtypeCat.getLabel());
    mCat->setLabel("SB4");   sCat->setLabel("FS");    fullModel.addPdf(*Model_FS_SB4, massXtypeCat.getLabel());
    mCat->setLabel("Sig");   sCat->setLabel("FS");    fullModel.addPdf(*Model_FS_Sig, massXtypeCat.getLabel());

Fitting it on the data :

fullModel.fitTo(*data);

It works well But could you please tell how can I project the PDF components (combined components) for ex. I have 10 {Bkg_F*_SB*} pdfs in my model so can I project a single component merging all these 10 pdf components. I want to do the same for {Signal_F*} and they are only two.

So if I manage to do this then finally I will end up with only two components : one for Signal and another for bkg.

Hope there is a way.!

Thanks !

StephanH · September 11, 2020, 9:13am

Hi @hym,

with categories, it maybe doesn’t make sense to project. You can do it technically, I guess, but how do you want to combine the categories? Do you just want to sum them? This only works if it’s the same observable in each category. If it’s the same observable, though, why are they in different categories?

Note also that signal and background should in general not be different categories. They should be components of a sum, i.e. RooAddPdf("name", "title", RooArgSet(sig, bkg), RooArgSet(f_sig, f_bkg)) or similar. You can have a sum like this in each category, though.
Check tutorial rf501 again. They have a signal+background model (gx + px), but both components are present in the category physics.

Let’s assume, now, that signal and background are in the right place everywhere, and that the different category states all share the same observable, so we can just sum them. You have two options:

1. Sum the plotted curves

You can project out each component of the simultaneous, and add it to an existing curve. That would roughly look like:

// We are projecting out the `physics` component, and give its curve a name so we can reference it later:
pdf.plotOn(frame, Slice(sample, "physics"), ProjWData(sample, combData), RooFit::Name("cat1Curve1"));
// Now get the second component, and add it to the existing curve:
pdf.plotOn(frame, Slice(sample, "control"), ProjWData(sample, combData), RooFit::AddTo("cat1Curve1"), RooFit::Name("cat1Curve2"));

If you don’t want to see each intermediate curve, make it Invisible().
The reason you need the ProjWData, the data-weighted projection, is that RooFit has to read off from the data how many events should be in each category. Otherwise, it wouldn’t be clear how “high” each component has to be.

2. Create a PDF that sums the components for you

Use e.g. a RooAddPdf for each category you want to sum over (again, this requires that they share the same observable).

RooAddPdf(..., ..., RooArgSet(modelCat1, modelCat2, modelCat3, modelCat4, ...),
                    RooArgSet(n_cat1, n_cat2, ...));

This PDF will share all the parameters etc. with the Simultaneous PDF that you use for fitting, so if you fit the simultaneous to data, this one will automagically follow each change of parameters. What you need, though, is to find out the proper scale factors n_cat1, ..., . Those you get from the data directly:

RooRealVar n_cat1("n_cat1", "Number of events in first category", data.sumEntries("sample==sample::cat1"));
RooRealVar n_cat2("n_cat2", "Number of events in second category", data.sumEntries("sample==sample::cat2"));
...

Replace sample with the proper name of your category object.
You have to set up those coefficients only once, and they can stay fixed after that, because the amount of data events in each category doesn’t change.

hym · September 23, 2020, 2:50pm

Hello @StephanH and thank you for the well-explained answer and apologies for late revert…!!!

I think ‘observable’ means you are referring to the fitting-observable. Yes! I use the same observable for all categories. There are two observables X and M. All the fitting PDF has the same structure (Sig, Bkg), but the value of M and another external parameter (y) decides which PDF should be used to fit this ‘entry’. To be more clear, all the selections and choices of M, are for X-observable fitting.

I hope I am clear.

Regarding your first proposal I am drawing slice for RooSuperCategory ‘massXtypeCat’ :

// Casting to RooCategory
RooCategory *SupCat = (RooCategory *) massXtypeCat ;
//+++++++++Projections of fullModel (RooSimultaneous)+++++++
fullModel.plotOn(f_resX, Slice(*SupCat,“{SB1;FF}”), ProjWData(*SupCat , *data ),Name(“SB1FF”), LineColor(kRed), LineStyle(kDashed));
fullModel.plotOn(f_resX, Slice(*SupCat,“{SB2;FF}”), ProjWData(*SupCat , *data ),Name(“SB2FF”), LineColor(kBlue), LineStyle(kDashed));
fullModel.plotOn(f_resX, Slice(*SupCat,“{SB3;FF}”), ProjWData(*SupCat , *data ),Name(“SB3FF”), LineColor(kGreen), LineStyle(kDashed));

This should give projections in corresponding category label. But it gives 3-copies of single category in 3-different colors with following messages :

[#1] INFO:Plotting – RooAbsReal::plotOn(fullModel) slice variable massXtypeCat was not projected anyway
[#1] INFO:Plotting – RooSimultaneous::plotOn(fullModel) plot on PsProperDL represents a slice in the index category (massXtypeCat)
[#1] INFO:Plotting – RooAbsReal::plotOn(Model_FF_SB3) slice variable massXtypeCat was not projected anyway
[#1] INFO:Caching – RooAbsCachedPdf::getCache(FBx_FF) creating new cache 0x59af960 with pdf NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL] for nset () with code 0 from preexisting content.
[#1] INFO:Caching – RooAbsCachedPdf::getCache(FBx_FF) creating new cache 0x59d8c40 with pdf NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL] for nset (PsProperDL) with code 1 from preexisting content.
[#1] INFO:NumericIntegration – RooRealIntegral::init(NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL]_Int[PsProperDL]) using numeric integrator RooIntegrator1D to calculate Int(PsProperDL)
[#1] INFO:NumericIntegration – RooRealIntegral::init(RxHistPdf_FF_Int[PsProperDL]) using numeric integrator RooIntegrator1D to calculate Int(PsProperDL)
[#1] INFO:Plotting – RooAbsReal::plotOn(fullModel) slice variable massXtypeCat was not projected anyway
[#1] INFO:Plotting – RooSimultaneous::plotOn(fullModel) plot on PsProperDL represents a slice in the index category (massXtypeCat)
[#1] INFO:Plotting – RooAbsReal::plotOn(Model_FF_SB3) slice variable massXtypeCat was not projected anyway
[#1] INFO:Caching – RooAbsCachedPdf::getCache(FBx_FF) creating new cache 0x597c980 with pdf NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL] for nset () with code 0 from preexisting content.
[#1] INFO:Caching – RooAbsCachedPdf::getCache(FBx_FF) creating new cache 0x59d8600 with pdf NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL] for nset (PsProperDL) with code 1 from preexisting content.
[#1] INFO:NumericIntegration – RooRealIntegral::init(NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL]_Int[PsProperDL]) using numeric integrator RooIntegrator1D to calculate Int(PsProperDL)
[#1] INFO:NumericIntegration – RooRealIntegral::init(RxHistPdf_FF_Int[PsProperDL]) using numeric integrator RooIntegrator1D to calculate Int(PsProperDL)
[#1] INFO:Plotting – RooAbsReal::plotOn(fullModel) slice variable massXtypeCat was not projected anyway
[#1] INFO:Plotting – RooSimultaneous::plotOn(fullModel) plot on PsProperDL represents a slice in the index category (massXtypeCat)
[#1] INFO:Plotting – RooAbsReal::plotOn(Model_FF_SB3) slice variable massXtypeCat was not projected anyway
[#1] INFO:Caching – RooAbsCachedPdf::getCache(FBx_FF) creating new cache 0x5a02670 with pdf NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL] for nset () with code 0 from preexisting content.
[#1] INFO:Caching – RooAbsCachedPdf::getCache(FBx_FF) creating new cache 0x59a6460 with pdf NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL] for nset (PsProperDL) with code 1 from preexisting content.
[#1] INFO:NumericIntegration – RooRealIntegral::init(NP_histPdf_CONV_RxHistPdf_FF_CACHE_Obs[PsProperDL]_Int[PsProperDL]) using numeric integrator RooIntegrator1D to calculate Int(PsProperDL)
[#1] INFO:NumericIntegration – RooRealIntegral::init(RxHistPdf_FF_Int[PsProperDL]) using numeric integrator RooIntegrator1D to calculate Int(PsProperDL)

StephanH · September 23, 2020, 4:16pm

Hi @hym,

You didn’t ask a question, did you?

I saw one thing, though:

A RooSuperCategory is not a RooCategory, see the class inheritance diagram on the page I linked. The cast you put there in general will not lead to correct results, because it interprets the object in memory as something that it isn’t.
I’m surprised that it doesn’t crash horribly.

I had a look in the code, and I think that you should slice using the fundamental categories, and not the SuperCategory as follows:

plotOn(..., Slice(SBCat, "SB3"), Slice(FCat, "FF")); // adjust the category names as needed.

The plotting routine looks like it compiles the correct SBCat x FCat state from the things you pass.

StephanH · September 23, 2020, 4:34pm

It’s actually documented here, but wasn’t easy to find …
https://root.cern.ch/doc/master/classRooAbsReal.html#a32073d896314ad08535328f556379780

hym · September 23, 2020, 6:43pm

Thanks, @StephanH for the prompt reply and the useful link!
Yes, that was my (hidden) question that you have already answered.
I tried your suggestion for individual slices on categories and it is working…

Still have a doubt!
Is RooSimultaneous fitting my whole data corresponding to the different pdfs (as I have 10 pdfs for 10 categories) in different categories. I feel it is doing it because in the end I get only one value for my free parameters (fB, fSig).

Otherwise, I might have got 10 different values of these parameters (one for each category).

I would be happy if you comment on it.

StephanH · September 24, 2020, 6:57am

If course it’s fitting the whole dataset with all PDFs!

The whole reason that the RooSimultaneous exists is that it measures a few parameters in N regions simultaneously! If you wanted one result per region, you could just take the data of region A, fit it, record the result. Take the data of region B, fit it, and record the result, etc.

hym · September 24, 2020, 4:53pm

Thanks @StephanH for clarification!

While I had already asked about this problem, but this is still unsolved, So again I am putting here…

The problem is overlapping components in plotting…

Here is snippet :

// Making the components

RooFFTConvPdf *FBx_FF = new RooFFTConvPdf(“FBx_FF”, “FBx_FF”, *X, *NP_histPdf,*RxHistPdf_FF);

RooAddPdf Fsig_FF(“Fsig_FF”,“Fsig_FF”,RooArgSet(*FBx_FF,*RxHistPdf_FF),RooArgList(fB));
RooProdPdf Signal_FF(“Signal_FF”,“Signal_FF”,RooArgList(Msig,Fsig_FF));
RooProdPdf Bkg_FF_SB1(“Bkg_FS_SB1”,“Bkg_FS_SB1”,RooArgList(Mbkg,*fBkgFF[0]));

// Implmenting Full Model with components
Model_FF_SB1 = new RooAddPdf(“Model_FF_SB1”, “Model_FF_SB1”, RooArgSet(Signal_FF, Bkg_FF_SB1), RooArgList(fsig));

    // setting RooSimultaneous and Fitting on data

    ....

    // Plotting (here is the problem)
    fullModel.plotOn(fResX[i], Slice(*mCat,mRange), Slice(*sCat,"FF") ,ProjWData(RooArgSet(*mCat,*sCat), *data ), Name(plotName), LineColor(kRed), LineStyle(kDashed));
	fullModel.plotOn(fResX[i], Slice(*mCat,mRange), Slice(*sCat,"FF") ,ProjWData(RooArgSet(*mCat,*sCat), *data ), Components(RooArgSet(*RxHistPdf_FF)), Name(plotName+"Rx"), LineColor(kBlue), LineStyle(kDashed));
	fullModel.plotOn(fResX[i], Slice(*mCat,mRange), Slice(*sCat,"FF") ,ProjWData(RooArgSet(*mCat,*sCat), *data ), Components(RooArgSet(*FBx_FF)), Name(plotName+"FBx"), LineColor(kViolet), LineStyle(kDashed));
	fullModel.plotOn(fResX[i], Slice(*mCat,mRange), Slice(*sCat,"FF") ,ProjWData(RooArgSet(*mCat,*sCat), *data ), Components(RooArgSet(Fsig_FF)), Name(plotName+"FSig"), LineColor(kBlack), LineStyle(kDashed));

As you see I am plotting three components here. They are plotted well but they are overlapped. I see three colored-dashed lines (as specified) but overlapped and it looks like I plotted the same component three times.

I don’t what is the reason here. Please share if you have any idea.

StephanH · September 25, 2020, 11:57am

It could be that a component selection while also slicing a specific category doesn’t work. Can you post example code or send me a ROOT file with those components?

Have you tried if selecting the components by name works?

Do you see a message similar to

RooAbsPdf::plotOn( <PDF name> ) directly selected PDF components: <components>

?

hym · September 30, 2020, 1:08am

Hello @StephanH,
Yes I see this message which you mentioned. It would be really good if you look inside the code itself. Please download the zip file from cernbox -

Open fB_extr.C macro

Goto Line # 500 where I am implementing the model to fit. You will see.

After Model implementation, you will see the plotting area and here you will see the description of problems I am facing and the ideal behavior of the program (how it should be in ideal case).

You can run it directly just to see its behavior.

https://cernbox.cern.ch/index.php/s/ecGxS4Y9xldBgrm

Thank you!

hym · September 30, 2020, 5:25pm

Hi @StephanH…
This macro is projecting the same PDF on all of the frames, that’s why it is giving very bad chiSquare.
I just saw it.
You might realize the same thing when you see the output frames.

Thank you for your concern!

hym · October 2, 2020, 1:52am

Hi @StephanH, (@Wouter_Verkerke)
I did not manage to get the desired results from my side in this issue.
So I would be really grateful if you share any idea or solution from your side.

Thanks Again and Again!!

system · October 16, 2020, 1:52am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.