Dear ROOTers,
I’ve seemingly encountered a problem with how RooFit handles normalization of RooAddPdf
, when its constituent PDFs have conditional observables. I’ve encountered several issues, but they seem to be very closely related, so I’ll just list them here. Both the signal and background PDFs throughout this post advertise analytical integrals.
Seems that similar problems have been discussed on the forum before, but I couldn’t find any solutions. If anyone could shed some light on this I’d be very grateful.
Fitting
When one uses a named range, i.e., x.setRange("the_range", 0, 1)
and RooFit::Range("the_range")
in the fitTo
call:
- Fitting a Signal PDF with conditional variables works fine
- Fitting a Background PDF with or without conditional works fine
- Fitting
RooAddPdf
made from Signal PDF + Background PDF apparently uses numerical integration over the conditional variables (no idea how it could even do that, when it doesn’t know the distribution of the conditional variables)
This is not just some random message - in my case when not using a named range the fit converged within seconds, with the named range (same numerical range) it didn’t converge even after 100 hours (forgot I left it running).
Without the named range, it is working fine.
Generation and Plotting
This is true with or without a named range:
- Signal PDF is generated (or plotted) fine
- Background PDF is generated (or plotted) fine
- When generating (or plotting)
RooAddPdf
made from Signal PDF + Background PDF, numerical integration of the conditional variables seems to be done
What I mean by “PDF is generated” is that Toy MC is generated from the PDF.
This makes me unable to generate a toy dataset or plot a complex PDF in any reasonable time as this numerical integration seems to be taking forever with my 15 conditional observables. Plus it definitely seems wrong to “integrate” the conditional variables.
Minimal Working Example
Includes printf
statements to easily locate relevant ouput lines.
#include "RooAddPdf.h"
#include "RooArgList.h"
#include "RooArgSet.h"
#include "RooDataSet.h"
#include "RooExponential.h"
#include "RooGaussian.h"
#include "RooLandau.h"
#include "RooPlot.h"
#include "RooRealVar.h"
#include "TCanvas.h"
void mymwe() {
RooRealVar x("x", "x", 0, 1);
x.setRange("the_range", 0, 1);
// Signal Gaussian with with per-event sigma
RooRealVar sg_mu("sg_mu", "S_{#mu}", 0.5, 0, 1);
RooRealVar sg_sigma("sg_sigma", "S_{#sigma}", 0, 1);
RooGaussian sg_pdf("sg_pdf", "Signal PDF", x, sg_mu, sg_sigma);
// Fixed background exponential
RooRealVar bg_alpha("bg_alpha", "B_{#alpha}", -0.5);
RooExponential bg_pdf("bg_pdf", "Background PDF", x, bg_alpha);
// Signal + background PDF
RooRealVar sg_f("sg_f", "Signal fraction", 0.66);
RooAddPdf all_pdf("all_pdf", "Signal + background PDF", sg_pdf, bg_pdf, sg_f);
const int num_signal = 5000;
const int num_background = 2500;
// Landau "resolution" PDF; used only for generation of the conditional observable
RooRealVar res_mu("res_mu", "R_{#mu}", 0.3);
RooRealVar res_sigma("res_sigma", "R_{#sigma}", 0.2);
RooLandau res_pdf("res_pdf", "Resolution PDF", sg_sigma, res_mu, res_sigma);
RooDataSet* data_resolution = res_pdf.generate(sg_sigma, num_signal);
RooArgList observables(x);
RooArgList conditional_observables(sg_sigma);
printf("################ Starting signal generation\n");
RooDataSet* data_signal = sg_pdf.generate(observables, RooFit::ProtoData(*data_resolution));
printf("################ Starting background generation\n");
RooDataSet* data_background = bg_pdf.generate(observables, num_background);
printf("################ Starting full generation\n");
RooDataSet* data_all = all_pdf.generate(observables, RooFit::ProtoData(*data_resolution));
printf("################ Data generated\n");
printf("################ Signal fit start\n");
// sg_pdf.fitTo(*data_signal, RooFit::ConditionalObservables(conditional_observables));
sg_pdf.fitTo(*data_signal, RooFit::ConditionalObservables(conditional_observables),
RooFit::Range("the_range"));
printf("################ Background fit start\n");
bg_pdf.fitTo(*data_background, RooFit::ConditionalObservables(conditional_observables));
printf("################ Fit start\n");
// all_pdf.fitTo(*data_all, RooFit::ConditionalObservables(conditional_observables));
all_pdf.fitTo(*data_all, RooFit::ConditionalObservables(conditional_observables),
RooFit::Range("the_range"));
RooPlot* frame_signal = x.frame(RooFit::Title("signal"));
RooPlot* frame_background = x.frame(RooFit::Title("background"));
RooPlot* frame_all = x.frame(RooFit::Title("signal + background"));
data_signal->plotOn(frame_signal);
printf("################ Signal data plotted\n");
// sg_pdf.plotOn(frame_signal, RooFit::ProjWData(conditional_observables, *data_signal));
sg_pdf.plotOn(frame_signal, RooFit::ProjWData(conditional_observables, *data_signal),
RooFit::NormRange("the_range"));
printf("\n################ Signal PDF plotted\n");
data_background->plotOn(frame_background);
printf("################ Background data plotted\n");
bg_pdf.plotOn(frame_background);
printf("################ Background PDF plotted\n");
data_all->plotOn(frame_all);
printf("################ Data plotted");
// all_pdf.plotOn(frame_all, RooFit::ProjWData(conditional_observables, *data_signal));
all_pdf.plotOn(frame_all, RooFit::ProjWData(conditional_observables, *data_signal),
RooFit::NormRange("the_range"));
printf("\n################ PDF plotted\n");
TCanvas* c = new TCanvas("c", "c", 1200, 400);
c->Divide(3, 1);
c->cd(1);
frame_signal->Draw();
c->cd(2);
frame_background->Draw();
c->cd(3);
frame_all->Draw();
}
The output I’m getting from running this (either through the cling interpreter or after compiling with gcc) is the following:
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'the_range' created with bounds [0,1]
################ Starting signal generation
################ Starting background generation
################ Starting full generation
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x|FULL_RANGE_ADDGENCONTEXT]_Norm[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
################ Data generated
################ Signal fit start
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_sg_pdf_sg_pdfData) constructing test statistic for sub-range named the_range
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'NormalizationRangeForthe_range' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'fit_nll_sg_pdf_sg_pdfData' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(sg_sigma) new range named 'NormalizationRangeForthe_range' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(sg_sigma) new range named 'fit_nll_sg_pdf_sg_pdfData' created with bounds [0,1]
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_sg_pdf_sg_pdfData) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
**********
** 13 **MIGRAD 500 1
**********
FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
START MIGRAD MINIMIZATION. STRATEGY 1. CONVERGENCE WHEN EDM .LT. 1.00e-03
FCN=-1293.8 FROM MIGRAD STATUS=INITIATE 4 CALLS 5 TOTAL
EDM= unknown STRATEGY= 1 NO ERROR MATRIX
EXT PARAMETER CURRENT GUESS STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 sg_mu 5.00000e-01 1.00000e-01 2.01358e-01 -2.18543e+03
ERR DEF= 0.5
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-1294.04 FROM MIGRAD STATUS=CONVERGED 14 CALLS 15 TOTAL
EDM=5.44596e-16 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 sg_mu 5.00112e-01 1.59743e-04 7.93770e-06 7.30442e-05
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 1 ERR DEF=0.5
2.552e-08
**********
** 18 **HESSE 500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-1294.04 FROM HESSE STATUS=OK 5 CALLS 20 TOTAL
EDM=5.44596e-16 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 sg_mu 5.00112e-01 1.59743e-04 1.58754e-06 2.23069e-04
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 1 ERR DEF=0.5
2.552e-08
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
################ Background fit start
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
[#1] INFO:Minization -- The following expressions have been identified as constant and will be precalculated and cached: (bg_pdf)
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
################ Fit start
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_all_pdf_all_pdfData) constructing test statistic for sub-range named the_range
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'fit_nll_all_pdf_all_pdfData' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(sg_sigma) new range named 'fit_nll_all_pdf_all_pdfData' created with bounds [0,1]
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_all_pdf_all_pdfData) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
[#1] INFO:Minization -- The following expressions have been identified as constant and will be precalculated and cached: (bg_pdf)
[#1] INFO:Minization -- The following expressions will be evaluated in cache-and-track mode: (sg_pdf)
**********
** 13 **MIGRAD 500 1
**********
FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
START MIGRAD MINIMIZATION. STRATEGY 1. CONVERGENCE WHEN EDM .LT. 1.00e-03
FCN=-174.792 FROM MIGRAD STATUS=INITIATE 4 CALLS 5 TOTAL
EDM= unknown STRATEGY= 1 NO ERROR MATRIX
EXT PARAMETER CURRENT GUESS STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 sg_mu 5.00112e-01 1.59743e-04 3.19486e-04 6.10893e+01
ERR DEF= 0.5
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-174.993 FROM MIGRAD STATUS=CONVERGED 12 CALLS 13 TOTAL
EDM=2.75394e-05 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 sg_mu 4.96807e-01 5.16209e-03 9.51452e-05 -5.08283e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 1 ERR DEF=0.5
2.665e-05
**********
** 18 **HESSE 500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-174.993 FROM HESSE STATUS=OK 5 CALLS 18 TOTAL
EDM=2.75393e-05 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 sg_mu 4.96807e-01 5.16209e-03 1.90290e-05 -6.38554e-03
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 1 ERR DEF=0.5
2.665e-05
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
################ Signal data plotted
[#1] INFO:Plotting -- RooAbsPdf::plotOn(sg_pdf) p.d.f was fitted in range and no explicit plot,norm range was specified, using fit range as default
[#1] INFO:Plotting -- RooAbsPdf::plotOn(sg_pdf) only plotting range 'fit_nll_sg_pdf_sg_pdfData,fit_nll_sg_pdf_sg_pdfData'
[#1] INFO:Plotting -- RooAbsPdf::plotOn(sg_pdf) p.d.f. curve is normalized using explicit choice of ranges 'fit_nll_sg_pdf_sg_pdfData,fit_nll_sg_pdf_sg_pdfData'
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(sg_pdfDataWgtAvg) constructing data weighted average of function sg_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...........................................................................................................................................................................................................................................................
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(sg_pdfDataWgtAvg) constructing data weighted average of function sg_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...........................................................................................................................................................................................................................................................
################ Signal PDF plotted
################ Background data plotted
################ Background PDF plotted
################ Data plotted
[#1] INFO:Plotting -- RooAbsPdf::plotOn(all_pdf) p.d.f was fitted in range and no explicit plot,norm range was specified, using fit range as default
[#1] INFO:Plotting -- RooAbsPdf::plotOn(all_pdf) only plotting range 'fit_nll_all_pdf_all_pdfData,fit_nll_all_pdf_all_pdfData'
[#1] INFO:Plotting -- RooAbsPdf::plotOn(all_pdf) p.d.f. curve is normalized using explicit choice of ranges 'fit_nll_all_pdf_all_pdfData,fit_nll_all_pdf_all_pdfData'
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(all_pdfDataWgtAvg) constructing data weighted average of function all_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...............................................................................................................................................................................................................
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(all_pdfDataWgtAvg) constructing data weighted average of function all_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...............................................................................................................................................................................................................
################ PDF plotted
The problematic lines seem to be:
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
As noted above, removing the named range fixes this for the fit, but not for generating and plotting.
About the Setup
ROOT 6.05/02 from NLeSC Conda repository