RooMCStudy: range of sampled initial parameters from constraint

Antonio83 · September 22, 2020, 9:40am

Dear all,

I am implementing a toy MC study for a model with two signals and a background:

Nexp = f1 * sgn1 + f2 * sgn2 + (1 - f1 -f2)*bkg

where the background is affected by a shape systematic through a nuisance parameter alpha.

I have constraints on both alpha and on the signal fractions: the constraint on alpha is a simple gaussian and the one on the fractions is a two dimensional gaussian with correlation (this is implemented through a RooMultiVarGausssian). To perform the toy MC experiments through RooMCStudy, and taking into account the constraints, I am following the prescriptions of this tutorial:

https://root.cern/doc/master/rf804__mcstudy__constr_8C.html

The problem I have is appearing when some of the toys are produced with “unphysical” values of the fractions (i.e. f1 + f2 > 1): here is a printout of one of these toys

  1) RooRealVar::     alpha = 5 +/- 0.00609041
  2) RooRealVar::        f1 = 1 +/- 0.000510215
  3) RooRealVar::        f2 = 0.763207 +/- 0.00668681
  4) RooRealVar::       NLL = 6975.17
  5) RooRealVar::      ngen = 2000
  6) RooRealVar:: alpha_gen = 1.53497 +/- 0.0663869
  7) RooRealVar::    f1_gen = 0.79024 +/- 0.0061129
  8) RooRealVar::    f2_gen = 0.468616 +/- 0.00518298
  9) RooRealVar::  alphaerr = 0.00609041
 10) RooRealVar:: alphapull = 568.932
 11) RooRealVar::     f1err = 0.000510215
 12) RooRealVar::    f1pull = 411.121
 13) RooRealVar::     f2err = 0.00668681
 14) RooRealVar::    f2pull = 44.0556

Here are the distributions I get on the parameters:

The question I have is: is there a way to configure RooMCStudy in order to reject these unphysical toys? Should I hadd some more constraints on the model?

The only solution I can think off is to write custom code deriving from RooAbsMCStudyModule to reject the unphysical values for the fractions.

Here is the code I am using:

void test_toy()
{
	// signal 1 
	TH1F* hsgn1 = new TH1F("hsgn1","signal 1", 20, 0, 100);
	TF1* g = new TF1("g","[0]*exp(-(x-[1])^2/(2*[2]^2))", 0, 100);
	g -> SetParameters(100, 50, 10);
	hsgn1 -> FillRandom("g", 100000);

	// signal 2 
	TH1F* hsgn2 = new TH1F("hsgn2","signal 2", 20, 0, 100);
	g -> SetParameters(100, 70, 10);
	hsgn2 -> FillRandom("g", 100000);

	// background
	TH1F* hbkg_nom = new TH1F("hbkg_nom","nominal bkg", 20, 0, 100);
	TF1* fe = new TF1("fe","[0]*exp(-[1]*x)", 0, 100);
	fe -> SetParameters(50, 0.02);
	hbkg_nom -> FillRandom("fe", 100000);

	TH1F* hbkg_p = new TH1F("hbkg_p","+1 sigma bkg", 20, 0, 100);;
	fe -> SetParameters(55, 0.03);
	hbkg_p -> FillRandom("fe", 150000);

	TH1F* hbkg_m = new TH1F("hbkg_m","-1 sigma bkg", 20, 0, 100);;
	fe -> SetParameters(45, 0.01);
	hbkg_m -> FillRandom("fe", 500000);

	// define observables
	RooRealVar e{"e", "energy", 0, 100, "MeV"};
	e.setBins(Int_t(20));

	// signal pdf
	hsgn1 -> Scale(1./hsgn1 -> Integral());  
	RooDataHist dh_sgn1{"dh_sgn1", "dh_sgn1", RooArgSet{e}, hsgn1};
	RooHistPdf sgn1("sgn1", "sgn1", RooArgSet{e}, dh_sgn1);

	hsgn2 -> Scale(1./hsgn2 -> Integral()); 
	RooDataHist dh_sgn2{"dh_sgn2", "dh_sgn2", RooArgSet{e}, hsgn2};
	RooHistPdf sgn2("sgn2", "sgn2", RooArgSet{e}, dh_sgn2);

	// background pdf
	//
	// nominal value
	hbkg_nom -> Scale(1./hbkg_nom -> Integral()); 
	RooDataHist dh_bkg_nom{"dh_bkg_nom", "dh_bkg_nom", RooArgSet{e}, hbkg_nom};
	RooHistPdf bkg_0("bkg_0", "bkg_0", RooArgSet{e}, dh_bkg_nom);
	//
	// bkg +1 sigma
	hbkg_p -> Scale(1./hbkg_p -> Integral());
	RooDataHist dh_bkg_p{"dh_bkg_p", "dh_bkg_p", RooArgSet{e}, hbkg_p};
	RooHistPdf bkg_p("bkg_p", "bkg_p", RooArgSet{e}, dh_bkg_p);
	//RooHistFunc bkg_p("bkg_p", "bkg_p", RooArgSet{e}, dh_bkg_p);
	//
	// bkg -1 sigma
	hbkg_m -> Scale(1./hbkg_m -> Integral()); 
	RooDataHist dh_bkg_m{"dh_bkg_m", "dh_bkg_m", RooArgSet{e}, hbkg_m};
	RooHistPdf bkg_m("bkg_m", "bkg_m", RooArgSet{e}, dh_bkg_m);
	//
	// build interpolation between -1 -> +1 sigma
	RooRealVar alpha{"alpha", "alpha bkg sys", 0, -5, 5};
	PiecewiseInterpolation bkg_sys("bkg_sys", "bkg_sys", bkg_0, bkg_m, bkg_p, alpha);

	// build model
	double f1_val = 0.4;
	double f2_val = 0.2;
	RooRealVar f1{"f1", "f1", f1_val, 0, 1};
	RooRealVar f2{"f2", "f2", f2_val, 0, 1};

	RooRealSumPdf sb{"sb", "sb", RooArgSet(sgn1, sgn2, bkg_sys), RooArgList(f1, f2)};

	// introduce alpha constraint
	RooGaussian gaus_alpha("gaus_alpha", "gaus_alpha", alpha, RooFit::RooConst(0), RooFit::RooConst(1));

	// introduce f1, f2 constraint

	TMatrixDSym Vf(2);
	double sigma_f = 0.04;
	Vf(0,0) = sigma_f;
	Vf(1,1) = sigma_f/2.;
	Vf(0,1) = -0.02*sigma_f*sigma_f/2.;

	RooMultiVarGaussian mgaus_f("c_Vf", "c_Vf", RooArgList(f1, f2), RooArgList(RooFit::RooConst(f1_val), RooFit::RooConst(f2_val)), Vf);

	RooProdPdf model{"model", "model", RooArgList(sb, gaus_alpha, mgaus_f)};

	// Example of single experiment
	//
	alpha.setVal(2);
	RooDataHist* data = model.generateBinned(RooArgSet{e}, 15000, RooFit::Verbose(1));

	TCanvas* c = new TCanvas();
	c->cd();

	auto frame = e.frame(RooFit::Title("Before fit"));

	data->plotOn(frame);

	model.plotOn(frame, RooFit::LineColor(6));
	model.plotOn(frame, RooFit::LineColor(2), RooFit::Components("sgn1"));
	model.plotOn(frame, RooFit::LineColor(44), RooFit::Components("sgn2"));
	model.plotOn(frame, RooFit::LineColor(4), RooFit::Components("bkg_sys"));

	frame -> Draw();

	// fit
	f1.setVal(f1_val+0.05);
	f2.setVal(f2_val+0.05);
	alpha.setVal(0.0);
	//alpha.setConstant(kTRUE);

	auto fitres = model.fitTo(*data, RooFit::Constrain(RooArgSet(alpha, f1, f2)), RooFit::Save());

	fitres->Print();

	TCanvas* c1 = new TCanvas();
	c1->cd();

	auto frame1 = e.frame(RooFit::Title("After fit"));

	data->plotOn(frame1);

	model.plotOn(frame1, RooFit::LineColor(6));
	model.plotOn(frame1, RooFit::LineColor(2), RooFit::Components("sgn1"));
	model.plotOn(frame1, RooFit::LineColor(44), RooFit::Components("sgn2"));
	model.plotOn(frame1, RooFit::LineColor(4), RooFit::Components("bkg_sys"));

	frame1 -> Draw();

	// Toy MC (from tutorial https://root.cern/doc/master/rf804__mcstudy__constr_8C.html)
	//
	// Perform toy study with internal constraint on alpha, f1 and f2
	f1.setVal(f1_val);
	f2.setVal(f2_val);

	RooMCStudy mcs(model, e, RooFit::Constrain(RooArgSet(alpha, f1, f2)), RooFit::Silence(), RooFit::Binned(), RooFit::FitOptions(RooFit::PrintLevel(-1)));

	// Run 500 toys of 2000 events.
	// Before each toy is generated, a value for alpha, f1 and f2 is sampled from the constraint pdfs and
	// that value is used for the generation of that toy.
	mcs.generateAndFit(500, 2000);

	// Make plot of distribution of generated value of alpha parameter
	TH1 *h_alpha_gen = mcs.fitParDataSet().createHistogram("alpha_gen", -40);

	// Make plot of distribution of fitted value of alpha parameter
	RooPlot *frame2 = mcs.plotParam(alpha, RooFit::Bins(40));
	frame2 -> SetTitle("Distribution of fitted #alpha values");

	// Make plot of pull distribution on alpha
	RooPlot *frame3 = mcs.plotPull(alpha, RooFit::Bins(40), RooFit::FitGauss());
	frame3 -> SetTitle("Distribution of #alpha pull values");

	TCanvas *c2 = new TCanvas("mcstudy_alpha_constr", "mcstudy_alpha_constr", 1200, 400);
	c2 -> Divide(3);

	c2 -> cd(1);
	gPad -> SetLeftMargin(0.15);
	h_alpha_gen -> GetYaxis() -> SetTitleOffset(1.4);
	h_alpha_gen -> Draw();

	c2 -> cd(2);
	gPad -> SetLeftMargin(0.15);
	frame2 -> GetYaxis() -> SetTitleOffset(1.4);
	frame2 -> Draw();

	c2 -> cd(3);
	gPad -> SetLeftMargin(0.15);
	frame3 -> GetYaxis() -> SetTitleOffset(1.4);
	frame3 -> Draw();

	// Make plot of distribution of generated value of f1 parameter
	TH1 *h_f1_gen = mcs.fitParDataSet().createHistogram("f1_gen", -40);




	double f1pull(0), f2pull(0);
	for(auto t=0; t<500; t++)
	{
		cout << "\nN TOY = " << t << "\n" << endl;

		f1pull = mcs.fitParDataSet().get(t)->getRealValue("f1pull");
		f2pull = mcs.fitParDataSet().get(t)->getRealValue("f2pull");

		if( f1pull > 50 || f2pull > 50 )
			cout << "\nPROBLEM\n" << endl;

		mcs.fitParDataSet().get(t)->Print("V");
	}

	// Make plot of distribution of fitted value of f1 parameter
	RooPlot *frame4 = mcs.plotParam(f1, RooFit::Bins(40));
	frame4 -> SetTitle("Distribution of fitted f1 values");

	// Make plot of pull distribution on f1
	RooPlot *frame5 = mcs.plotPull(f1, RooFit::Bins(40), RooFit::FitGauss());
	frame5 -> SetTitle("Distribution of f1 pull values");

	TCanvas *c3 = new TCanvas("mcstudy_f1_constr", "mcstudy_f1_constr", 1200, 400);
	c3 -> Divide(3);

	c3 -> cd(1);
	gPad -> SetLeftMargin(0.15);
	h_f1_gen -> GetYaxis() -> SetTitleOffset(1.4);
	h_f1_gen -> Draw();

	c3 -> cd(2);
	gPad -> SetLeftMargin(0.15);
	frame4 -> GetYaxis() -> SetTitleOffset(1.4);
	frame4 -> Draw();

	c3 -> cd(3);
	gPad -> SetLeftMargin(0.15);
	frame5 -> GetYaxis() -> SetTitleOffset(1.4);
	frame5 -> Draw();

	// Make plot of distribution of generated value of f2 parameter
	TH1 *h_f2_gen = mcs.fitParDataSet().createHistogram("f2_gen", -40);

	// Make plot of distribution of fitted value of f2 parameter
	RooPlot *frame6 = mcs.plotParam(f2, RooFit::Bins(40));
	frame6 -> SetTitle("Distribution of fitted f2 values");

	// Make plot of pull distribution on f2
	RooPlot *frame7 = mcs.plotPull(f2, RooFit::Bins(40), RooFit::FitGauss());
	frame7 -> SetTitle("Distribution of f2 pull values");

	TCanvas *c4 = new TCanvas("mcstudy_f2_constr", "mcstudy_f2_constr", 1200, 400);
	c4 -> Divide(3);

	c4 -> cd(1);
	gPad -> SetLeftMargin(0.15);
	h_f2_gen -> GetYaxis() -> SetTitleOffset(1.4);
	h_f2_gen -> Draw();

	c4 -> cd(2);
	gPad -> SetLeftMargin(0.15);
	frame6 -> GetYaxis() -> SetTitleOffset(1.4);
	frame6 -> Draw();

	c4 -> cd(3);
	gPad -> SetLeftMargin(0.15);
	frame7 -> GetYaxis() -> SetTitleOffset(1.4);
	frame7 -> Draw();

}

Thanks for your help.

StephanH · September 22, 2020, 9:53am

Try to use ExternalConstraints from here.

The constraint could be something like a Gaussian(1 | a+b+c+d, 0.01), that is, you want the sum of a b c d to be equal to 1 within an uncertainty of 0.01. I guess that could stop the MCStudy from generating such crazy values.

See also
https://root.cern/doc/master/rf604__constraints_8C.html
for how to use an external constraint.

And the sum might be implemented as

RooFormulaVar sumCoeff("sumCoeff", "a + b + (1-a-b)", RooArgList(a, b));
RooGaussian constraint("constraint", "", RooConst(1), sumCoeff, RooConst(0.01));

StephanH · September 22, 2020, 10:02am

The model might try to find some loopholes by e.g. making a coefficient negative, but you can prevent this by forcing the coefficients into the range [0, 1]. (I see you did that already. )

Antonio83 · September 22, 2020, 10:55am

Hi @StephanH,

thanks for the suggestion. It seems the right path to folow, but is not working because the condition sumCoeff = 1 is always met.

Is there a way to implement some conditional constraint? Something like:

RooFormulaVar sumCoeff("sumCoeff", "1 - a - b", RooArgList(a, b));
RooStep constraint("constraint", "", RooArgSet(sumCoeff));

a “step function pdf” might work here.

StephanH · September 22, 2020, 12:06pm

You mean you want a + b always in the interval [0, 1]?

RooFormulaVar constr("constr", "(a+b > 0) && (a+b < 1)");

This is a step function, but it’s not nice to have steps. You might have to tweak it such that it e.g. starts rising shortly before it hits the boundary or similar.

Antonio83 · September 22, 2020, 1:10pm

It seems that the following works:

RooGenericPdf gpdf{"gpdf", "f1 + f2 < 1", RooArgSet(f1, f2)};

RooMCStudy mcs(model, e, RooFit::Constrain(RooArgSet(alpha, f1, f2)), RooFit::ExternalConstraints(gpdf), RooFit::FitOptions(RooFit::Save()), RooFit::Silence(), RooFit::Binned(), RooFit::FitOptions(RooFit::PrintLevel(-1)));

in the sense that it avoids unphysical values for the fractions, but the fit always returns the initial parameter values: this is an example for one of the toys

N TOY = 0

  1) RooRealVar::     alpha = -0.488531 +/- 3.87172
  2) RooRealVar::        f1 = 0.4 +/- 0.412235
  3) RooRealVar::        f2 = 0.2 +/- 0.336588
  4) RooRealVar::       NLL = -1e+30
  5) RooRealVar::      ngen = 2000
  6) RooRealVar:: alpha_gen = 0.457955 +/- 0.0663869
  7) RooRealVar::    f1_gen = 0.0139029 +/- 0.0061129
  8) RooRealVar::    f2_gen = 0.0553897 +/- 0.00518298
  9) RooRealVar::  alphaerr = 3.87172
 10) RooRealVar:: alphapull = -0.244462
 11) RooRealVar::     f1err = 0.412235
 12) RooRealVar::    f1pull = 0.936595
 13) RooRealVar::     f2err = 0.336588
 14) RooRealVar::    f2pull = 0.429636

  RooFitResult: minimized FCN value: -1e+30, estimated distance to minimum: 0
                covariance matrix quality: Approximation only, not accurate
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                 alpha    1.9569e+00   -4.8853e-01 +/-  3.87e+00  <none>
                    f1    4.0000e-01    4.0000e-01 +/-  4.12e-01  <none>
                    f2    2.0000e-01    2.0000e-01 +/-  3.37e-01  <none>

seems like the initial fraction values are fixed.

StephanH · September 22, 2020, 3:35pm

There is a problem with the model. The minimised function value should not be -1.E30. Try to print the entire model using something like model.Print("T"), and inspect all terms.
It could be that the model only fails when the constraints are active. If that’s true, make a productPdf of the model and the external constraints (or just add the ext. constraint to the existing product), and print that as well.

Antonio83 · September 22, 2020, 4:09pm

Printing the model with only the gaussian constraints on the parameters:

RooProdPdf model{"model", "model", RooArgList(sb, gaus_alpha, mgaus_f)};

I don’t recognize anything weird (see Print1).

If I add the extra external constraint:

RooGenericPdf gpdf{"gpdf", "f1 + f2 < 1", RooArgSet(f1, f2)};
RooProdPdf model1{"model1", "model1", RooArgList(sb, gaus_alpha, mgaus_f, gpdf)};

also doesn’t seem to show something weird (see Print2).

I tried to perform a single fit to generated data by including the external constrain in the model (basically using model1). The full log is in attachement, but the first warning I get when the mnimization starts is:

FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
[#1] INFO:NumericIntegration – RooRealIntegral::init(gpdf_Int[f1,f2]) using numeric integrator RooAdaptiveIntegratorND to calculate Int(f1,f2)
[#0] WARNING:Minization – RooMinimizerFcn: Minimized function has error status.
Returning maximum FCN so far (-1e+30) to force MIGRAD to back out of this region. Error log follows
Parameter values: alpha=0, f1=0.45, f2=0.25
RooGenericPdf::gpdf[ actualVars=(f1,f2) formula=“x[0] + x[1] < 1” ]
p.d.f normalization integral is zero or negative: 0 @ actualVars=(f1 = 0.45,f2 = 0.25)
getLogVal() top-level p.d.f evaluates to zero @ actualVars=(f1 = 0.45,f2 = 0.25)

so it seems a normalization problem, and in fact defining a pdf with a conditional function might not be a good idea . But then I don’t know how to throw away unphysical values.

Print1

root [67] model.Print("T")
0x7f860745f000 RooProdPdf::model = 0.0204808 [Auto,Dirty] 
RooProdPdf begin partial integral cache
[0] 0x7f86075217d0 RooRealSumPdf::sb = 0.0204808 [Auto,Dirty] 
[0]   0x7f860f0c3730/V- RooHistPdf::sgn1 = 0.038578 [Auto,Dirty] 
[0]     0x7f860f0c3040/V- RooRealVar::e = 50
[0]   0x7f8607521000/V- RooRealVar::f1 = 0.4
[0]   0x7f8607540000/V- RooHistPdf::sgn2 = 0.008908 [Auto,Dirty] 
[0]     0x7f860f0c3040/V- RooRealVar::e = 50
[0]   0x7f86075213e8/V- RooRealVar::f2 = 0.2
[0]   0x7f860752c580/V- PiecewiseInterpolation::bkg_sys = 0.00817 [Auto,Clean] 
[0]     0x7f8607540888/V- RooHistPdf::bkg_0 = 0.00817 [Auto,Dirty] 
[0]       0x7f860f0c3040/V- RooRealVar::e = 50
[0]     0x7f860752c000/V- RooHistPdf::bkg_m = 0.0093732 [Auto,Dirty] 
[0]       0x7f860f0c3040/V- RooRealVar::e = 50
[0]     0x7f8607535308/V- RooHistPdf::bkg_p = 0.006664 [Auto,Dirty] 
[0]       0x7f860f0c3040/V- RooRealVar::e = 50
[0]     0x7f8607535b90/V- RooRealVar::alpha = 0
[0] 0x7f860751a000 RooGaussian::gaus_alpha = 1 [Auto,Dirty] 
[0]   0x7f8607535b90/V- RooRealVar::alpha = 0
[0]   0x41143f0/V- RooConstVar::0 = 0
[0]   0x40a8930/V- RooConstVar::1 = 1
[0] 0x7f860751a528 RooMultiVarGaussian::c_Vf = 1 [Auto,Dirty] 
[0]   0x7f8607521000/V- RooRealVar::f1 = 0.4
[0]   0x7f86075213e8/V- RooRealVar::f2 = 0.2
[0]   0x432dc50/V- RooConstVar::0.4 = 0.4
[0]   0x43a73b0/V- RooConstVar::0.2 = 0.2
RooProdPdf end partial integral cache
  0x7f86075217d0/V- RooRealSumPdf::sb = 0.0204808 [Auto,Dirty] 
    0x7f860f0c3730/V- RooHistPdf::sgn1 = 0.038578 [Auto,Dirty] 
      0x7f860f0c3040/V- RooRealVar::e = 50
    0x7f8607521000/V- RooRealVar::f1 = 0.4
    0x7f8607540000/V- RooHistPdf::sgn2 = 0.008908 [Auto,Dirty] 
      0x7f860f0c3040/V- RooRealVar::e = 50
    0x7f86075213e8/V- RooRealVar::f2 = 0.2
    0x7f860752c580/V- PiecewiseInterpolation::bkg_sys = 0.00817 [Auto,Clean] 
      0x7f8607540888/V- RooHistPdf::bkg_0 = 0.00817 [Auto,Dirty] 
        0x7f860f0c3040/V- RooRealVar::e = 50
      0x7f860752c000/V- RooHistPdf::bkg_m = 0.0093732 [Auto,Dirty] 
        0x7f860f0c3040/V- RooRealVar::e = 50
      0x7f8607535308/V- RooHistPdf::bkg_p = 0.006664 [Auto,Dirty] 
        0x7f860f0c3040/V- RooRealVar::e = 50
      0x7f8607535b90/V- RooRealVar::alpha = 0
  0x7f860751a000/V- RooGaussian::gaus_alpha = 1 [Auto,Dirty] 
    0x7f8607535b90/V- RooRealVar::alpha = 0
    0x41143f0/V- RooConstVar::0 = 0
    0x40a8930/V- RooConstVar::1 = 1
  0x7f860751a528/V- RooMultiVarGaussian::c_Vf = 1 [Auto,Dirty] 
    0x7f8607521000/V- RooRealVar::f1 = 0.4
    0x7f86075213e8/V- RooRealVar::f2 = 0.2
    0x432dc50/V- RooConstVar::0.4 = 0.4
    0x43a73b0/V- RooConstVar::0.2 = 0.2

Print2

root [70] model1.Print("T")
0x7f8607456000 RooProdPdf::model1 = 0.0204808 [Auto,Dirty] 
RooProdPdf begin partial integral cache
[0] 0x7f86075217d0 RooRealSumPdf::sb = 0.0204808 [Auto,Dirty] 
[0]   0x7f860f0c3730/V- RooHistPdf::sgn1 = 0.038578 [Auto,Dirty] 
[0]     0x7f860f0c3040/V- RooRealVar::e = 50
[0]   0x7f8607521000/V- RooRealVar::f1 = 0.4
[0]   0x7f8607540000/V- RooHistPdf::sgn2 = 0.008908 [Auto,Dirty] 
[0]     0x7f860f0c3040/V- RooRealVar::e = 50
[0]   0x7f86075213e8/V- RooRealVar::f2 = 0.2
[0]   0x7f860752c580/V- PiecewiseInterpolation::bkg_sys = 0.00817 [Auto,Clean] 
[0]     0x7f8607540888/V- RooHistPdf::bkg_0 = 0.00817 [Auto,Dirty] 
[0]       0x7f860f0c3040/V- RooRealVar::e = 50
[0]     0x7f860752c000/V- RooHistPdf::bkg_m = 0.0093732 [Auto,Dirty] 
[0]       0x7f860f0c3040/V- RooRealVar::e = 50
[0]     0x7f8607535308/V- RooHistPdf::bkg_p = 0.006664 [Auto,Dirty] 
[0]       0x7f860f0c3040/V- RooRealVar::e = 50
[0]     0x7f8607535b90/V- RooRealVar::alpha = 0
[0] 0x7f860751a000 RooGaussian::gaus_alpha = 1 [Auto,Dirty] 
[0]   0x7f8607535b90/V- RooRealVar::alpha = 0
[0]   0x41143f0/V- RooConstVar::0 = 0
[0]   0x40a8930/V- RooConstVar::1 = 1
[0] 0x7f860751a528 RooMultiVarGaussian::c_Vf = 1 [Auto,Dirty] 
[0]   0x7f8607521000/V- RooRealVar::f1 = 0.4
[0]   0x7f86075213e8/V- RooRealVar::f2 = 0.2
[0]   0x432dc50/V- RooConstVar::0.4 = 0.4
[0]   0x43a73b0/V- RooConstVar::0.2 = 0.2
[0] 0x7f860745f658 RooGenericPdf::gpdf = 1 [Auto,Dirty] 
[0]   0x7f8607521000/V- RooRealVar::f1 = 0.4
[0]   0x7f86075213e8/V- RooRealVar::f2 = 0.2
RooProdPdf end partial integral cache
  0x7f86075217d0/V- RooRealSumPdf::sb = 0.0204808 [Auto,Dirty] 
    0x7f860f0c3730/V- RooHistPdf::sgn1 = 0.038578 [Auto,Dirty] 
      0x7f860f0c3040/V- RooRealVar::e = 50
    0x7f8607521000/V- RooRealVar::f1 = 0.4
    0x7f8607540000/V- RooHistPdf::sgn2 = 0.008908 [Auto,Dirty] 
      0x7f860f0c3040/V- RooRealVar::e = 50
    0x7f86075213e8/V- RooRealVar::f2 = 0.2
    0x7f860752c580/V- PiecewiseInterpolation::bkg_sys = 0.00817 [Auto,Clean] 
      0x7f8607540888/V- RooHistPdf::bkg_0 = 0.00817 [Auto,Dirty] 
        0x7f860f0c3040/V- RooRealVar::e = 50
      0x7f860752c000/V- RooHistPdf::bkg_m = 0.0093732 [Auto,Dirty] 
        0x7f860f0c3040/V- RooRealVar::e = 50
      0x7f8607535308/V- RooHistPdf::bkg_p = 0.006664 [Auto,Dirty] 
        0x7f860f0c3040/V- RooRealVar::e = 50
      0x7f8607535b90/V- RooRealVar::alpha = 0
  0x7f860751a000/V- RooGaussian::gaus_alpha = 1 [Auto,Dirty] 
    0x7f8607535b90/V- RooRealVar::alpha = 0
    0x41143f0/V- RooConstVar::0 = 0
    0x40a8930/V- RooConstVar::1 = 1
  0x7f860751a528/V- RooMultiVarGaussian::c_Vf = 1 [Auto,Dirty] 
    0x7f8607521000/V- RooRealVar::f1 = 0.4
    0x7f86075213e8/V- RooRealVar::f2 = 0.2
    0x432dc50/V- RooConstVar::0.4 = 0.4
    0x43a73b0/V- RooConstVar::0.2 = 0.2
  0x7f860745f658/V- RooGenericPdf::gpdf = 1 [Auto,Dirty] 
    0x7f8607521000/V- RooRealVar::f1 = 0.4
    0x7f86075213e8/V- RooRealVar::f2 = 0.2

test_toy_log.txt (68.8 KB)

StephanH · September 22, 2020, 4:38pm

That’s the problem, indeed.
Try this instead:
1. + (x[0] + x[1] < 1)
It’s not zero, and you still get a step function.

Antonio83 · September 23, 2020, 11:20am

Hi @StephanH, thanks. Also this is not working. Unfortunatelly it seems there is no simple solution.

StephanH · September 23, 2020, 1:54pm

Hi @Antonio83,

so it’s time for a manual intervention!

First, generate datasets. generate(). Set keepGenData to true.
Inspect the datasets. You can access each dataset using genData()

TList datasets;
for (...) {
  RooAbsData* data = mcstudy->genData(i);
  // Find bad values
  if (good)
    datasets.Add(data);
}

Analyse: mcStudy->fit(datasets.GetEntries(), datasets);

It could be (I kind of expect) that the RooMCStudy deletes the datasets once you give it new data. You would see a crash if that happens. In that case, use datasets.Add(data->Clone());.

Antonio83 · September 24, 2020, 10:45am

Hi @StephanH,

thanks for the indications. This is working: I get rid of the datasets with unphysical values.

The only thing left is a difference in the fit behaviour between the two approaches: in order to make a direct comparison, I set the standard deviations for the fractions f1 and f2 such that I don’t get unphysical generated values.
The fit is not performing well with the “manual intervention method” (first datasets generation and then fit):

as an example, the printout from one of the toys:

  1) RooRealVar::         alpha = -0.393071 +/- 0.245488
  2) RooRealVar::            f1 = 0.452111 +/- 0.0170834
  3) RooRealVar::            f2 = 0.250193 +/- 0.0164342
  4) RooRealVar::           NLL = 8766.69
  5) RooRealVar::          ngen = 2000
  6) RooRealVar:: alpha_gen_gen = -1.24332 +/- 0.0662702
  7) RooRealVar::    f1_gen_gen = 0.274836 +/- 0.00609008
  8) RooRealVar::    f2_gen_gen = 0.208048 +/- 0.00514129
  9) RooRealVar::      alphaerr = 0.245488
 10) RooRealVar::     alphapull = -1.60119
 11) RooRealVar::         f1err = 0.0170834
 12) RooRealVar::        f1pull = 3.05038
 13) RooRealVar::         f2err = 0.0164342
 14) RooRealVar::        f2pull = 3.05419

  RooFitResult: minimized FCN value: 8766.69, estimated distance to minimum: 3.04657e-05
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                 alpha    0.0000e+00   -3.9307e-01 +/-  2.45e-01  <none>
                    f1    4.0000e-01    4.5211e-01 +/-  1.71e-02  <none>
                    f2    2.0000e-01    2.5019e-01 +/-  1.64e-02  <none>

While in the standard case (generateAndFit) everything seems fine:

the printout for the same toy:

  1) RooRealVar::     alpha = -1.41352 +/- 0.151962
  2) RooRealVar::        f1 = 0.253322 +/- 0.0182998
  3) RooRealVar::        f2 = 0.201049 +/- 0.0178636
  4) RooRealVar::       NLL = 9041.95
  5) RooRealVar::      ngen = 2000
  6) RooRealVar:: alpha_gen = -1.24332 +/- 0.0662702
  7) RooRealVar::    f1_gen = 0.274836 +/- 0.00609008
  8) RooRealVar::    f2_gen = 0.208048 +/- 0.00514129
  9) RooRealVar::  alphaerr = 0.151962
 10) RooRealVar:: alphapull = -1.11997
 11) RooRealVar::     f1err = 0.0182998
 12) RooRealVar::    f1pull = -1.17567
 13) RooRealVar::     f2err = 0.0178636
 14) RooRealVar::    f2pull = -0.391852

  RooFitResult: minimized FCN value: 9041.95, estimated distance to minimum: 5.04434e-09
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                 alpha    0.0000e+00   -1.4135e+00 +/-  1.52e-01  <none>
                    f1    4.0000e-01    2.5332e-01 +/-  1.83e-02  <none>
                    f2    2.0000e-01    2.0105e-01 +/-  1.79e-02  <none>

I would expect no differences, since the model is the same and initial values for fitting do not change.

Trying to investigate, I find a weird stuff in the log files: the fitted parameters for toy t for the “manual intervention method” correspond to generated parameter for the toy N-t in the “standard method” (for instance, the printout reported above corresponds to toy 2 in the manual case and toy 497 in the standard case…). I attach the log files for reference.
I don’t know what is happening. Maybe something related to the merging of the information when performing:

mcs.generate()
mcs.fit()

I attach the piece of code I’m using for the manual intervention:

	// generate and store data
	int n_toys(500);
	mcs.generate(n_toys, 2000, kTRUE);

	// get the dataset containing the true parameters used for data generation
	const RooDataSet* genData = mcs.genParDataSet();

	// loop on generated data and get rid of unphysical ones (i.e. f1 + f2 > 1)
	TList datasets;
	double alpha_gen(0), f1_gen(0), f2_gen(0);
	for(auto t=0; t<n_toys; t++)
	{
		RooAbsData* data = mcs.genData(t);

		alpha_gen = genData->get(t)->getRealValue("alpha_gen");
		f1_gen = genData->get(t)->getRealValue("f1_gen");
		f2_gen = genData->get(t)->getRealValue("f2_gen");

		bool good = f1_gen + f2_gen < 1. ? true : false;

		if(!good)
			cout <<"BAD GEN for toy " << t << ": f1_gen = " << f1_gen << " - f2_gen = " << f2_gen << endl;

		if(good)
		       datasets.Add(data->Clone());

	}

	mcs.fit(datasets.GetEntries(), datasets);

	// print info on generated and fitted data for each toy
	double f1pull(0), f2pull(0);
	for(auto t=0; t<datasets.GetEntries(); t++)
	{
		cout << "\nN TOY = " << t << "\n" << endl;

		f1pull = mcs.fitParDataSet().get(t)->getRealValue("f1pull");
		f2pull = mcs.fitParDataSet().get(t)->getRealValue("f2pull");

		if( f1pull > 50 || f2pull > 50 )
			cout << "\nPROBLEM\n" << endl;

		mcs.fitParDataSet().get(t)->Print("V");
		mcs.fitResult(t)->Print("V");
	}

Thanks

test_manual_intervention.txt (710.9 KB)
test_standard_generateAndFit.txt (721.2 KB)

Antonio83 · September 25, 2020, 9:42am

Forgot to mention that I am using root 6.20.00.

By printing out the fitParDataSet and fitResult in reverse order w.r.t. the genParDataSet I actually find out that the fit parameters at position t correspond to generated data at position N-t (N is the total number of toys). Example of a toy:

DEBUG: print fitParDataSet & fitResult in reverse order

N TOY = 0

fitParDataSet:
  1) RooRealVar::         alpha = 0.387928 +/- 0.188344
  2) RooRealVar::            f1 = 0.246226 +/- 0.0176824
  3) RooRealVar::            f2 = 0.157976 +/- 0.0152403
  4) RooRealVar::           NLL = 9014.16
  5) RooRealVar::          ngen = 2000
  6) RooRealVar:: alpha_gen_gen = -0.160338 +/- 0.0662702
  7) RooRealVar::    f1_gen_gen = 0.282513 +/- 0.00609008
  8) RooRealVar::    f2_gen_gen = 0.166534 +/- 0.00514129
  9) RooRealVar::      alphaerr = 0.188344
 10) RooRealVar::     alphapull = 2.05968
 11) RooRealVar::         f1err = 0.0176824
 12) RooRealVar::        f1pull = -8.69644
 13) RooRealVar::         f2err = 0.0152403
 14) RooRealVar::        f2pull = -2.7574
genData:
  1) RooRealVar:: alpha_gen_gen = 0.457955 +/- 0.0662702
  2) RooRealVar::    f1_gen_gen = 0.245561 +/- 0.00609008
  3) RooRealVar::    f2_gen_gen = 0.159054 +/- 0.00514129
fitResult:

  RooFitResult: minimized FCN value: 9014.16, estimated distance to minimum: 1.22794e-07
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                 alpha    0.0000e+00    3.8793e-01 +/-  1.88e-01  <none>
                    f1    4.0000e-01    2.4623e-01 +/-  1.77e-02  <none>
                    f2    2.0000e-01    1.5798e-01 +/-  1.52e-02  <none>

This is done with:

        for(auto t=0; t<datasets.GetEntries(); t++)
        {
                cout << "\nN TOY = " << t << "\n" << endl;

                cout << "fitParDataSet:" <<endl;
                mcs.fitParDataSet().get(n_toys-t-1)->Print("V");

                cout << "genData:" <<endl;
                genData->get(t)->Print("V");

                cout << "fitResult:" <<endl;
                mcs.fitResult(n_toys-t-1)->Print("V");
        }

By creating the TList in reverse order:

        TList datasets;
        double alpha_gen(0), f1_gen(0), f2_gen(0);
        for(auto t=0; t<n_toys; t++)
        {
                RooAbsData* data = mcs.genData(n_toys-t-1);

                alpha_gen = genData->get(t)->getRealValue("alpha_gen");
                f1_gen = genData->get(t)->getRealValue("f1_gen");
                f2_gen = genData->get(t)->getRealValue("f2_gen");

                bool good = f1_gen + f2_gen < 1. ? true : false;

                if(!good)
                        cout <<"BAD GEN for toy " << t << ": f1_gen = " << f1_gen << " - f2_gen = " << f2_gen << endl;

                if(good)
                      datasets.Add(data->Clone());

        }

        mcs.fit(datasets.GetEntries(), datasets);

I get the correct match between generated data and fitted one (in the following printout the fitParDataSet and fitResult are not in reverse order w.r.t. the genParDataSet), but the pulls are wrong:

N TOY = 0

fitParDataSet:
  1) RooRealVar::         alpha = 0.387928 +/- 0.188344
  2) RooRealVar::            f1 = 0.246226 +/- 0.0176824
  3) RooRealVar::            f2 = 0.157976 +/- 0.0152403
  4) RooRealVar::           NLL = 9014.16
  5) RooRealVar::          ngen = 2000
  6) RooRealVar:: alpha_gen_gen = 0.457955 +/- 0.0662702
  7) RooRealVar::    f1_gen_gen = 0.245561 +/- 0.00609008
  8) RooRealVar::    f2_gen_gen = 0.159054 +/- 0.00514129
  9) RooRealVar::      alphaerr = 0.188344
 10) RooRealVar::     alphapull = 2.05968
 11) RooRealVar::         f1err = 0.0176824
 12) RooRealVar::        f1pull = -8.69644
 13) RooRealVar::         f2err = 0.0152403
 14) RooRealVar::        f2pull = -2.7574
genData:
  1) RooRealVar:: alpha_gen_gen = 0.457955 +/- 0.0662702
  2) RooRealVar::    f1_gen_gen = 0.245561 +/- 0.00609008
  3) RooRealVar::    f2_gen_gen = 0.159054 +/- 0.00514129
fitResult:

  RooFitResult: minimized FCN value: 9014.16, estimated distance to minimum: 1.22794e-07
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                 alpha    0.0000e+00    3.8793e-01 +/-  1.88e-01  <none>
                    f1    4.0000e-01    2.4623e-01 +/-  1.77e-02  <none>
                    f2    2.0000e-01    1.5798e-01 +/-  1.52e-02  <none>

StephanH · September 25, 2020, 2:06pm

You are right: The loop in RooMCStudy runs backwards:

github.com

root-project/root/blob/ee4e822e3413938b740850db68ab2221fd19744d/roofit/roofitcore/src/RooMCStudy.cxx#L480-L483



while(nSamples--) {
  
  if (nSamples%prescale==0) {

So it’s not a surprise that you have to invert the order. Is this enough information or do you need something more?

Antonio83 · September 25, 2020, 2:44pm

Thanks for checking @StephanH.

So I can just create the TList of datasets to feed the mcs.fit() in reverse order. The only problem left is the wrong values of pulls:

Reverse order of datasets: gen and fitted parameters match, pulls wrong

N TOY = 0

fitParDataSet:
  1) RooRealVar::         alpha = 0.387928 +/- 0.188344
  2) RooRealVar::            f1 = 0.246226 +/- 0.0176824
  3) RooRealVar::            f2 = 0.157976 +/- 0.0152403
  4) RooRealVar::           NLL = 9014.16
  5) RooRealVar::          ngen = 2000
  6) RooRealVar:: alpha_gen_gen = 0.457955 +/- 0.0662702
  7) RooRealVar::    f1_gen_gen = 0.245561 +/- 0.00609008
  8) RooRealVar::    f2_gen_gen = 0.159054 +/- 0.00514129
  9) RooRealVar::      alphaerr = 0.188344
 10) RooRealVar::     alphapull = 2.05968
 11) RooRealVar::         f1err = 0.0176824
 12) RooRealVar::        f1pull = -8.69644
 13) RooRealVar::         f2err = 0.0152403
 14) RooRealVar::        f2pull = -2.7574
genData:
  1) RooRealVar:: alpha_gen_gen = 0.457955 +/- 0.0662702
  2) RooRealVar::    f1_gen_gen = 0.245561 +/- 0.00609008
  3) RooRealVar::    f2_gen_gen = 0.159054 +/- 0.00514129
fitResult:

  RooFitResult: minimized FCN value: 9014.16, estimated distance to minimum: 1.22794e-07
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                 alpha    0.0000e+00    3.8793e-01 +/-  1.88e-01  <none>
                    f1    4.0000e-01    2.4623e-01 +/-  1.77e-02  <none>
                    f2    2.0000e-01    1.5798e-01 +/-  1.52e-02  <none>

They should be:

alphapull = -0.37180372
f1pull = 0.037608017
f2pull = -0.070733516

which is the case using the standard mcs.generateAndFit() method.

Antonio83 · September 25, 2020, 2:58pm

I just discovered that to compute the pulls the initialization values for the parameters are used, and not the actual gen values:

alphapull = (0.387928 - 0)/0.188344 = 2.05968
f1pull = (0.246226 - 0.4)/0.0176824 = -8.69644
f2pull = (0.157976 - 0.2)/0.0152403 = -2.7574

Antonio83 · September 25, 2020, 3:13pm

Maybe the problem is related to the gen variable name, which is varname_gen_gen instead of varname_gen, while the code expects varname_gen for the pull computation.

But this varname_gen_gen is assigned automatically.

Sorry for spamming.

system · October 9, 2020, 3:13pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.