RooAddPdf and Integration over Conditional Observables

Dear ROOTers,

I’ve seemingly encountered a problem with how RooFit handles normalization of RooAddPdf, when its constituent PDFs have conditional observables. I’ve encountered several issues, but they seem to be very closely related, so I’ll just list them here. Both the signal and background PDFs throughout this post advertise analytical integrals.

Seems that similar problems have been discussed on the forum before, but I couldn’t find any solutions. If anyone could shed some light on this I’d be very grateful.

Fitting

When one uses a named range, i.e., x.setRange("the_range", 0, 1) and RooFit::Range("the_range") in the fitTo call:

  • Fitting a Signal PDF with conditional variables works fine
  • Fitting a Background PDF with or without conditional works fine
  • Fitting RooAddPdf made from Signal PDF + Background PDF apparently uses numerical integration over the conditional variables (no idea how it could even do that, when it doesn’t know the distribution of the conditional variables)

This is not just some random message - in my case when not using a named range the fit converged within seconds, with the named range (same numerical range) it didn’t converge even after 100 hours (forgot I left it running).

Without the named range, it is working fine.

Generation and Plotting

This is true with or without a named range:

  • Signal PDF is generated (or plotted) fine
  • Background PDF is generated (or plotted) fine
  • When generating (or plotting) RooAddPdf made from Signal PDF + Background PDF, numerical integration of the conditional variables seems to be done

What I mean by “PDF is generated” is that Toy MC is generated from the PDF.

This makes me unable to generate a toy dataset or plot a complex PDF in any reasonable time as this numerical integration seems to be taking forever with my 15 conditional observables. Plus it definitely seems wrong to “integrate” the conditional variables.

Minimal Working Example

Includes printf statements to easily locate relevant ouput lines.

#include "RooAddPdf.h"
#include "RooArgList.h"
#include "RooArgSet.h"
#include "RooDataSet.h"
#include "RooExponential.h"
#include "RooGaussian.h"
#include "RooLandau.h"
#include "RooPlot.h"
#include "RooRealVar.h"
#include "TCanvas.h"

void mymwe() {
    RooRealVar x("x", "x", 0, 1);
    x.setRange("the_range", 0, 1);

    // Signal Gaussian with with per-event sigma
    RooRealVar sg_mu("sg_mu", "S_{#mu}", 0.5, 0, 1);
    RooRealVar sg_sigma("sg_sigma", "S_{#sigma}", 0, 1);
    RooGaussian sg_pdf("sg_pdf", "Signal PDF", x, sg_mu, sg_sigma);

    // Fixed background exponential
    RooRealVar bg_alpha("bg_alpha", "B_{#alpha}", -0.5);
    RooExponential bg_pdf("bg_pdf", "Background PDF", x, bg_alpha);

    // Signal + background PDF
    RooRealVar sg_f("sg_f", "Signal fraction", 0.66);
    RooAddPdf all_pdf("all_pdf", "Signal + background PDF", sg_pdf, bg_pdf, sg_f);

    const int num_signal = 5000;
    const int num_background = 2500;

    // Landau "resolution" PDF; used only for generation of the conditional observable
    RooRealVar res_mu("res_mu", "R_{#mu}", 0.3);
    RooRealVar res_sigma("res_sigma", "R_{#sigma}", 0.2);
    RooLandau res_pdf("res_pdf", "Resolution PDF", sg_sigma, res_mu, res_sigma);
    RooDataSet* data_resolution = res_pdf.generate(sg_sigma, num_signal);

    RooArgList observables(x);
    RooArgList conditional_observables(sg_sigma);

    printf("################ Starting signal generation\n");
    RooDataSet* data_signal = sg_pdf.generate(observables, RooFit::ProtoData(*data_resolution));
    printf("################ Starting background generation\n");
    RooDataSet* data_background = bg_pdf.generate(observables, num_background);
    printf("################ Starting full generation\n");
    RooDataSet* data_all = all_pdf.generate(observables, RooFit::ProtoData(*data_resolution));
    printf("################ Data generated\n");

    printf("################ Signal fit start\n");
    // sg_pdf.fitTo(*data_signal, RooFit::ConditionalObservables(conditional_observables));
    sg_pdf.fitTo(*data_signal, RooFit::ConditionalObservables(conditional_observables),
                 RooFit::Range("the_range"));

    printf("################ Background fit start\n");
    bg_pdf.fitTo(*data_background, RooFit::ConditionalObservables(conditional_observables));

    printf("################ Fit start\n");
    // all_pdf.fitTo(*data_all, RooFit::ConditionalObservables(conditional_observables));
    all_pdf.fitTo(*data_all, RooFit::ConditionalObservables(conditional_observables),
                  RooFit::Range("the_range"));

    RooPlot* frame_signal = x.frame(RooFit::Title("signal"));
    RooPlot* frame_background = x.frame(RooFit::Title("background"));
    RooPlot* frame_all = x.frame(RooFit::Title("signal + background"));

    data_signal->plotOn(frame_signal);
    printf("################ Signal data plotted\n");
    // sg_pdf.plotOn(frame_signal, RooFit::ProjWData(conditional_observables, *data_signal));
    sg_pdf.plotOn(frame_signal, RooFit::ProjWData(conditional_observables, *data_signal),
                  RooFit::NormRange("the_range"));
    printf("\n################ Signal PDF plotted\n");

    data_background->plotOn(frame_background);
    printf("################ Background data plotted\n");
    bg_pdf.plotOn(frame_background);
    printf("################ Background PDF plotted\n");

    data_all->plotOn(frame_all);
    printf("################ Data plotted");
    // all_pdf.plotOn(frame_all, RooFit::ProjWData(conditional_observables, *data_signal));
    all_pdf.plotOn(frame_all, RooFit::ProjWData(conditional_observables, *data_signal),
                   RooFit::NormRange("the_range"));
    printf("\n################ PDF plotted\n");

    TCanvas* c = new TCanvas("c", "c", 1200, 400);
    c->Divide(3, 1);
    c->cd(1);
    frame_signal->Draw();
    c->cd(2);
    frame_background->Draw();
    c->cd(3);
    frame_all->Draw();
}

The output I’m getting from running this (either through the cling interpreter or after compiling with gcc) is the following:

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'the_range' created with bounds [0,1]
################ Starting signal generation
################ Starting background generation
################ Starting full generation
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x|FULL_RANGE_ADDGENCONTEXT]_Norm[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
################ Data generated
################ Signal fit start
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_sg_pdf_sg_pdfData) constructing test statistic for sub-range named the_range
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'NormalizationRangeForthe_range' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'fit_nll_sg_pdf_sg_pdfData' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(sg_sigma) new range named 'NormalizationRangeForthe_range' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(sg_sigma) new range named 'fit_nll_sg_pdf_sg_pdfData' created with bounds [0,1]
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_sg_pdf_sg_pdfData) fixing interpretation of coefficients of any RooAddPdf to full domain of observables 
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
 **********
 **   13 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-1293.8 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  sg_mu        5.00000e-01   1.00000e-01   2.01358e-01  -2.18543e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-1294.04 FROM MIGRAD    STATUS=CONVERGED      14 CALLS          15 TOTAL
                     EDM=5.44596e-16    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  sg_mu        5.00112e-01   1.59743e-04   7.93770e-06   7.30442e-05
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  2.552e-08 
 **********
 **   18 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-1294.04 FROM HESSE     STATUS=OK              5 CALLS          20 TOTAL
                     EDM=5.44596e-16    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  sg_mu        5.00112e-01   1.59743e-04   1.58754e-06   2.23069e-04
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  2.552e-08 
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
################ Background fit start
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (bg_pdf)
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
################ Fit start
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_all_pdf_all_pdfData) constructing test statistic for sub-range named the_range
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'fit_nll_all_pdf_all_pdfData' created with bounds [0,1]
[#1] INFO:Eval -- RooRealVar::setRange(sg_sigma) new range named 'fit_nll_all_pdf_all_pdfData' created with bounds [0,1]
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_all_pdf_all_pdfData) fixing interpretation of coefficients of any RooAddPdf to full domain of observables 
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (bg_pdf)
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (sg_pdf)
 **********
 **   13 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-174.792 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  sg_mu        5.00112e-01   1.59743e-04   3.19486e-04   6.10893e+01
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-174.993 FROM MIGRAD    STATUS=CONVERGED      12 CALLS          13 TOTAL
                     EDM=2.75394e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  sg_mu        4.96807e-01   5.16209e-03   9.51452e-05  -5.08283e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  2.665e-05 
 **********
 **   18 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-174.993 FROM HESSE     STATUS=OK              5 CALLS          18 TOTAL
                     EDM=2.75393e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  sg_mu        4.96807e-01   5.16209e-03   1.90290e-05  -6.38554e-03
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  2.665e-05 
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
################ Signal data plotted
[#1] INFO:Plotting -- RooAbsPdf::plotOn(sg_pdf) p.d.f was fitted in range and no explicit plot,norm range was specified, using fit range as default
[#1] INFO:Plotting -- RooAbsPdf::plotOn(sg_pdf) only plotting range 'fit_nll_sg_pdf_sg_pdfData,fit_nll_sg_pdf_sg_pdfData'
[#1] INFO:Plotting -- RooAbsPdf::plotOn(sg_pdf) p.d.f. curve is normalized using explicit choice of ranges 'fit_nll_sg_pdf_sg_pdfData,fit_nll_sg_pdf_sg_pdfData'
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(sg_pdfDataWgtAvg) constructing data weighted average of function sg_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...........................................................................................................................................................................................................................................................
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(sg_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(sg_pdfDataWgtAvg) constructing data weighted average of function sg_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...........................................................................................................................................................................................................................................................
################ Signal PDF plotted
################ Background data plotted
################ Background PDF plotted
################ Data plotted
[#1] INFO:Plotting -- RooAbsPdf::plotOn(all_pdf) p.d.f was fitted in range and no explicit plot,norm range was specified, using fit range as default
[#1] INFO:Plotting -- RooAbsPdf::plotOn(all_pdf) only plotting range 'fit_nll_all_pdf_all_pdfData,fit_nll_all_pdf_all_pdfData'
[#1] INFO:Plotting -- RooAbsPdf::plotOn(all_pdf) p.d.f. curve is normalized using explicit choice of ranges 'fit_nll_all_pdf_all_pdfData,fit_nll_all_pdf_all_pdfData'
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(all_pdfDataWgtAvg) constructing data weighted average of function all_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...............................................................................................................................................................................................................
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) plot on x averages using data variables (sg_sigma)
[#1] INFO:Plotting -- RooAbsReal::plotOn(all_pdf) only the following components of the projection data will be used: (sg_sigma)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(all_pdfDataWgtAvg) constructing data weighted average of function all_pdf_Norm[x] over 5000 data points of (sg_sigma) with a total weight of 5000
...............................................................................................................................................................................................................
################ PDF plotted

The problematic lines seem to be:

[#1] INFO:NumericIntegration -- RooRealIntegral::init(sg_pdf_Int[sg_sigma,x]) using numeric integrator RooIntegrator1D to calculate Int(sg_sigma)

As noted above, removing the named range fixes this for the fit, but not for generating and plotting.

About the Setup

ROOT 6.05/02 from NLeSC Conda repository

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.