Problems with 2D Simultaneous fit

sanjeeda · January 3, 2022, 6:57am

Dear all,
I am trying to perform 2-dimensional simultaneous fits. The macro that I am using is simultaneous_KK.C, and it is available in simfit_forum.tar.gz (2.5 MB).

The problem is that the fit runs for more than half an hour (and obviously does not converge), and I am getting the following messages repeatedly:

Would you please have a look and let me know what is wrong with it?

Regards,
Sanjeeda

etejedor · January 3, 2022, 4:32pm

Hello,

@moneta or @jonas should be able to answer soon, since the end of the Christmas break is close!

jonas · January 4, 2022, 3:05pm

Hi @sanjeeda!

There are 3 problems I see in your scripts. Two I can help you fix immediately, and one I can only give general guidance for because I don’t know exactly how you expect your model to be.

1st problem: you are ignoring the RooFit and Minuit warnings. If I run your script, I see some warnings about the sigma parameters for the Gaussians where you forgot to limit them in a positive range, and more warnings about limits that are too far apart for the yield variables. Fixing these usually makes the fit more stable. You should define a meaningful range for the sigma_ parameters, e.g.:

RooRealVar sigma_G1_sig("#sigma_{G1sig}", "sigma_gauss1",0.00385, 1e-5, 0.1);
// and not only:
// RooRealVar sigma_G1_sig("#sigma_{G1sig}", "sigma_gauss1",0.00385);

And for the yield parameters, I would narrow down the allowed orders of magnitude, e.g.:

RooRealVar sig_peak_yield("N_{sig_peak}","signal yield",3.18353e2,1e1,1e7);
// instead of
// RooRealVar sig_peak_yield("N_{sig_peak}","signal yield",3.18353e2,0,1e9);
// (note that allowing a zero yield is not a good idea anyway)
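
To illustrate why very wide limits can hurt, here is a minimal standalone sketch (plain C++, no ROOT needed) of the sine transformation that the Minuit documentation describes for parameters with two-sided limits. The helper names `toInternal`, `toExternal`, and `distanceFromLowerEdge` are made up for this illustration and are not ROOT or Minuit API.

```cpp
#include <cassert>
#include <cmath>

// Minuit maps a parameter with limits [lo, hi] onto an unbounded internal
// parameter through  ext = lo + (hi - lo)/2 * (sin(int) + 1).
// All helper names below are hypothetical, chosen for this sketch only.
double toInternal(double ext, double lo, double hi) {
    assert(lo < hi && ext >= lo && ext <= hi);
    return std::asin(2.0 * (ext - lo) / (hi - lo) - 1.0);
}

double toExternal(double in, double lo, double hi) {
    return lo + 0.5 * (hi - lo) * (std::sin(in) + 1.0);
}

// Distance (in radians) of the internal value from -pi/2, the point that
// corresponds to the lower limit. Small values mean the parameter sits in
// the strongly nonlinear region of the transformation.
double distanceFromLowerEdge(double ext, double lo, double hi) {
    const double halfPi = std::acos(-1.0) / 2.0;
    return toInternal(ext, lo, hi) + halfPi;
}
```

Calling `distanceFromLowerEdge(3.18353e2, 0.0, 1e9)` and `distanceFromLowerEdge(3.18353e2, 1e1, 1e7)` shows that the start value sits closer to the edge of the transformation with the wider limits, which is one plausible reason why narrowing the range tends to make the fit more stable.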

2nd problem: The RooFit variables in your combined dataset don’t match with the variables in the D0 and D0bar datasets.

At the beginning of the script, you defined six RooRealVar: mD0pi, mKK, mD0pi_d0, mKK_d0, mD0pi_d0bar, and mKK_d0bar. The variables with the _d0 and _d0bar suffixes are used to import the distributions_d0 and distributions_d0bar TTrees, which makes sense because if the variable names in the RooDataSet don’t match with the TTree branch names the branches don’t get imported.

But then you try to create a combined dataset with the variables without suffix:

RooDataSet combData("combData","combined data",RooArgSet(mKK, mD0pi),
                    Index(sample),
                    Import("D0",*dataxy_d0),Import("D0bar",*dataxy_d0bar)) ;

This does not work! The Import only works if the variable names match, otherwise the combined dataset will be just filled with the values that mKK and mD0pi are currently set to. If you comment out the line with simPdf.fitTo(combData) to skip the broken fit, you can indeed see that the data was not correctly imported.
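
The effect can be mimicked with a small toy model in plain C++. This is only a sketch of the name-matching behaviour described above, not RooFit's actual implementation; `Row` and `importByName` are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, double>;

// Toy stand-in for a name-matched import (NOT RooFit code): every output row
// starts from the target variables' current values, and only the entries
// whose name also exists in the source row are overwritten.
std::vector<Row> importByName(const Row& targetVars,
                              const std::vector<Row>& source) {
    assert(!targetVars.empty());
    std::vector<Row> out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        Row row = targetVars;  // current values act as defaults
        for (Row::iterator it = row.begin(); it != row.end(); ++it) {
            Row::const_iterator match = source[i].find(it->first);
            if (match != source[i].end()) {
                it->second = match->second;  // names match: value is copied
            }
        }
        out.push_back(row);
    }
    return out;
}
```

With target names `mkk` and `md0pi` and source rows keyed by `mkk_d0` and `md0pi_d0`, every output row just repeats the target's current values, which is the symptom seen in the combined dataset.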

You can work around this by replacing the RooDataSet constructor call before with some custom code that manually fills the dataset from the two existing ones even if the variable names don’t match:

  RooArgSet dataVars{sample, mKK, mD0pi};
  RooDataSet combData("combData","combined data",dataVars);

  auto copyVal = [](RooAbsArg& to, RooAbsArg const& from){
      static_cast<RooRealVar&>(to).setVal(
          static_cast<RooAbsReal const&>(from).getVal());
  };

  sample.setLabel("D0");
  for(std::size_t i = 0; i < dataxy_d0->numEntries(); ++i) {
    RooArgSet const& vars = *dataxy_d0->get(i);
    copyVal(dataVars["mkk"], vars["mkk_d0"]);
    copyVal(dataVars["md0pi"], vars["md0pi_d0"]);
    combData.add(dataVars);
  }

  sample.setLabel("D0bar");
  for(std::size_t i = 0; i < dataxy_d0bar->numEntries(); ++i) {
    RooArgSet const& vars = *dataxy_d0bar->get(i);
    copyVal(dataVars["mkk"], vars["mkk_d0bar"]);
    copyVal(dataVars["md0pi"], vars["md0pi_d0bar"]);
    combData.add(dataVars);
  }

I would also advise putting this code in a separate function to avoid spaghetti code. If you keep the fitTo call commented out and run your script again, you will see that the dataset is now filled correctly.

3rd problem (related to second one): you are using the wrong observable RooRealVars in your model.

As we discussed before, the only observables in the dataset are now mD0pi, mKK, plus a third variable sample that determines whether a given row is from a D0 or D0bar. This is the correct way to set up your dataset for the RooSimultaneous, but that also means that you can’t use the variables suffixed with _d0 or _d0bar in the model at all! They were only intermediate variables to import your trees separately! But you are using them in your model, and that’s where the evaluation errors come from.

So the 3rd change you need to make is to change your models to use only the mD0pi and mKK observables. However, I can’t tell you exactly how to do that, because from your script it’s not clear which parameters of the model should be shared between the D0 and D0bar components.

I see your final RooAddPdf models here:

  RooAddPdf model_d0("model_d0","",
          RooArgList(signal_peak,signal_rnd,
              mult_peak,mult_rnd,kpi_peak, kpi_rnd,ds_peak, comb_bkg),
          RooArgList(sig_peak_yield_d0, sig_randompi_yield_d0,
              mult_peak_yield_d0, mult_randompi_yield_d0,
              kpi_peak_yield_d0, kpi_randompi_yield_d0, ds_peak_yield_d0,
              comb_yield_d0));
  
  RooAddPdf model_d0bar("model_d0bar","",
          RooArgList(signal_peak,signal_rnd,
              mult_peak,mult_rnd,kpi_peak, kpi_rnd,ds_peak, comb_bkg),
          RooArgList(sig_peak_yield_d0bar, sig_randompi_yield_d0bar,
              mult_peak_yield_d0bar, mult_randompi_yield_d0bar,
              kpi_peak_yield_d0bar, kpi_randompi_yield_d0bar,
              ds_peak_yield_d0bar, comb_yield_d0bar));

I see that you have independent yield parameters for d0 and d0bar, which makes complete sense. But the RooAddPdf pdf components are exactly the same! Okay why not, but then I see comments in the pdf definitions like random pion (common for all), and sometimes you use the suffixed observables and sometimes not. That’s a bit fishy to me. Which pdfs/parameters should actually be shared then?

To conclude: I advise you to implement my suggestions to fix problems 1 and 2 first. Then rewrite your model to not use the suffixed observables, carefully thinking about which parameters and pdfs need to be shared between the D0 and D0bar components. Don’t hesitate to follow up here if you need help with this, but if you do, please tell me what you really want to achieve mathematically, as this is not clear to me.

Good luck!
Jonas

sanjeeda · January 7, 2022, 7:15am

Dear @jonas,

Thank you for the detailed explanation.

For problem 1: There are no ranges on the sigma parameters because I want them to be fixed. I have also made changes to the ranges.

For Problem 2: I have tried to fix it in a different way. In the input ROOT file that I am using, the different trees are still there, but the variable names are now the same in all 3 trees. I think this fixes the problem.

For Problem 3: But, I want to fit for the total yield with exactly the same PDF for D0 and D0bar samples.

I am sharing here the updated folder in which I am working: New_simultaneous_fit.tar.gz (2.6 MB). I think the fit is working correctly now. The fit converges when all Araw parameters are fixed to 0, but the pulls are not OK.
Eventually, I need to free all the Araw parameters, and I also need the pulls to be OK.
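
For reference, here is a minimal standalone sketch of one common way to express the per-sample yields through a shared total yield and a raw asymmetry. The macro's exact formula is not shown in this thread, so the convention A = (N_D0 - N_D0bar) / (N_D0 + N_D0bar) is an assumption, and `Yields`, `splitYield`, and `rawAsymmetry` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Yields {
    double d0;
    double d0bar;
};

// Assumed convention (not taken from the macro):
//   A = (N_D0 - N_D0bar) / (N_D0 + N_D0bar)
// which gives N_D0 = N/2 * (1 + A) and N_D0bar = N/2 * (1 - A).
Yields splitYield(double nTotal, double aRaw) {
    assert(nTotal >= 0.0 && std::fabs(aRaw) <= 1.0);
    Yields y;
    y.d0 = 0.5 * nTotal * (1.0 + aRaw);
    y.d0bar = 0.5 * nTotal * (1.0 - aRaw);
    return y;
}

// Inverse relation: recover the asymmetry from the two yields.
double rawAsymmetry(const Yields& y) {
    assert(y.d0 + y.d0bar > 0.0);
    return (y.d0 - y.d0bar) / (y.d0 + y.d0bar);
}
```

In a RooFit macro, expressions of this kind would typically be wrapped in RooFormulaVar objects and passed as the RooAddPdf coefficients, so that the total yield and the asymmetry are the fitted parameters.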

Please let me know what you think. Also, I wish you a great year ahead.

Regards,
Sanjeeda

jonas · January 13, 2022, 12:04am

Hi @sanjeeda,

thanks for following up, it looks good to me now!

By the way, thank you very much for posting your question in the forum. Your code example has unveiled a bug in the recent ROOT version that is about to get fixed:

github.com/root-project/root

[RF] Reset cached normalization sets if servers are redirected

root-project:master ← guitargeek:normSet_1 (opened by guitargeek, 11:21PM - 12 Jan 22 UTC, +23 -11)

If a server is redirected, the cached normalization sets in `RooAbsPdf` and `RooAddPdf` might not point to the right observables anymore. We need to reset them. This bug was discovered thanks to a [forum post](https://root-forum.cern.ch/t/problems-with-2d-simultaneous-fit/48249/4) that provided a code snippet that crashed in ROOT master and 6.24 because the cached normalization sets were used after the servers were redirected. Needs to be backported to 6.24.

sanjeeda · January 19, 2022, 1:02pm

Hi @jonas ,

I have actually changed how I am separating the sample into D0 and D0bar components. Now I am doing it in the fitting code itself, and I think this is a better way. This is what it looks like now (I am showing only one variable, D0pi):

There are still things which I don’t understand:

First: The fit converges when the asymmetry (A) for all components is fixed to 0. However, the fit doesn’t converge when these parameters are free. Irrespective of any change that I make, the Minuit output is always the same.

Second: I am not sure what is wrong with the pulls, although the fit and its log plots look fine. I am showing below the part of the macro that plots the fit and draws the pulls:

RooPlot *kk_d0 = mKK.frame(Title("D0"));
combData.plotOn(kk_d0,Cut("sample==sample::D0"));
kk_d0->SetYTitle("Candidates per 3 MeV/#it{c}^{2}");
kk_d0->GetYaxis()->SetTitleOffset(1.5);
simPdf.plotOn(kk_d0,Slice(sample,"D0"),ProjWData(sample,combData));
simPdf.plotOn(kk_d0,Slice(sample,"D0"),Components(RooArgSet(signal_peak,signal_rnd)),ProjWData(sample,combData),LineStyle(kDashed),LineColor(kRed)) ;
simPdf.plotOn(kk_d0,Slice(sample,"D0"),Components(RooArgSet(kpi_peak,kpi_rnd)),ProjWData(sample,combData),LineStyle(kDashed),LineColor(kGreen));
simPdf.plotOn(kk_d0,Slice(sample,"D0"),Components(RooArgSet(mult_peak,mult_rnd)),ProjWData(sample,combData),LineStyle(kDashed),LineColor(kBlack));
simPdf.plotOn(kk_d0,Slice(sample,"D0"),Components(RooArgSet(ds_peak)),ProjWData(sample,combData),LineStyle(kDashed),LineColor(kCyan));
simPdf.plotOn(kk_d0,Slice(sample,"D0"),Components(comb_bkg),ProjWData(sample,combData),LineStyle(kDashed),LineColor(kMagenta));

RooHist* hpullx_d0_mkk = kk_d0->pullHist();
for(int i=0;i<hpullx_d0_mkk->GetN();++i) hpullx_d0_mkk->SetPointError(i,0.0,0.0,0.0,0.0);
RooPlot* pullplotx_d0_mkk = mKK.frame(Title(" ")) ;
pullplotx_d0_mkk->addPlotable(hpullx_d0_mkk,"B");

TCanvas* c1 = new TCanvas("c1", "c1",700, 700);
TPad *pad1 = new TPad("pad1", "pad1", 0, 0.3, 1, 1.0);
pad1->Draw();
pad1->cd();
kk_d0->Draw();
c1->cd();
TPad *pad2 = new TPad("pad2", "pad2", 0, 0.05, 1, 0.3);
pad2->Draw();
pad2->cd();
pullplotx_d0_mkk->Draw();
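
For reference, the pull in each bin is (data - curve) / error. The standalone sketch below computes it for plain arrays, assuming symmetric errors; `computePulls` is a hypothetical helper and not ROOT API, and setting the pull to 0 for bins without a positive error is a choice made for this sketch only. As far as the RooPlot documentation describes, pullHist() called without arguments compares the last plotted dataset with the last plotted curve, so the order of the plotOn calls determines which curve enters the pull.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// pull_i = (data_i - model_i) / sigma_i
// Bins whose error is not positive are given a pull of 0 (sketch convention).
std::vector<double> computePulls(const std::vector<double>& data,
                                 const std::vector<double>& model,
                                 const std::vector<double>& sigma) {
    assert(data.size() == model.size() && data.size() == sigma.size());
    std::vector<double> pulls(data.size(), 0.0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (sigma[i] > 0.0) {
            pulls[i] = (data[i] - model[i]) / sigma[i];
        }
    }
    return pulls;
}
```

If the model values passed in belong to a single component instead of the full model, the pulls come out systematically large and positive wherever the other components contribute, even when the overall fit describes the data well.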

Third: I am not sure what the numbers sample=0.0… and sample = 12… mean. They also do not show up in the Minuit output.

Fourth: I also cannot see the pulls if I use ROOT 6 to compile the macro.

Here’s the directory that I am working in: simFit_KK.tar.gz (2.6 MB). The macro is simFit_KK.C. Would you please have a look?

Regards,
Sanjeeda

sanjeeda · January 26, 2022, 4:41am

Hi,

I have managed to solve the problems listed in the above post, except for the third one. This is what the plots look like now, but I still don’t know what the 5s=0.0000000…-s parameter in the statistics box is, and it keeps changing every time I run the fit.

It is not a parameter in the fit and does not show up in the Minuit output. This happens for the plots of both the D0 and D0bar samples.

Here’s the new working directory simfit_working.tar.gz (2.5 MB), and the macro is simFit_KK.C.

Please have a look.

system · February 9, 2022, 4:41am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.