RooFit PDF from TH1

oxon · June 22, 2019, 2:29am

I have a few questions about a quick short RooFit script below.

Test script
This script generates two Gaussian with the same shape. One with kN1 entries is used as a model PDF, and the other with kN2 entries is fitted with the model PDF.

void pdf(const int kN1, const int kN2) {
  const double kXmin = -3;
  const double kXmax =  3;
  const int kNbins = 100;

  TH1D* h1 = new TH1D("h1", "h1", kNbins, kXmin, kXmax);
  h1->FillRandom("gaus", kN1);
  TH1D* h2 = new TH1D("h2", "h2", kNbins, kXmin, kXmax);
  h2->FillRandom("gaus", kN2);

  RooRealVar x("x", "x", kXmin, kXmax);
  RooRealVar norm("norm", "normalization", 0.5 * kN2, 0., 1.5 * kN2);

  RooDataHist dh1("dh1", "dh1", x, h1);
  RooDataHist dh2("dh2", "dh2", x, h2);

  RooHistPdf pdf("pdf", "pdf", x, dh1);
  RooAddPdf model("model", "model", RooArgList(pdf), RooArgList(norm));

  model.fitTo(dh2);

  RooPlot* xframe = x.frame(RooFit::Title("Gaussian Fit Test"));
  dh2.plotOn(xframe);
  model.plotOn(xframe);
  xframe->Draw();
}

Q1.

When I execute the script repeatedly, the best-fit normalization parameter norm always converges to 9.99986e+03 +/- 9.99948e+01, whereas the both Gaussian histograms are regenerated every time and thus the best-fit value should fluctuate accordingly.

Why do I always get the same best-fit value?

$ root
root [0] .x pdf.C(10000, 10000)

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh1): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh2): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (pdf)
 **********
 **    1 **SET PRINT           1
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 norm         5.00000e+03  1.50000e+03    0.00000e+00  1.50000e+04
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           1
 **********
 **********
 **    5 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-66095 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         5.00000e+03   1.50000e+03   2.14402e-01  -7.07110e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68026.5 FROM MIGRAD    STATUS=CONVERGED      26 CALLS          27 TOTAL
                     EDM=2.09874e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         9.99986e+03   9.99948e+01   2.54498e-03  -1.02441e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
 **********
 **    7 **SET ERR         0.5
 **********
 **********
 **    8 **SET PRINT           1
 **********
 **********
 **    9 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68026.5 FROM HESSE     STATUS=OK              5 CALLS          32 TOTAL
                     EDM=2.09792e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  norm         9.99986e+03   9.99949e+01   1.01799e-04   3.39816e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1

Q2.
As the number of entries of the data histogram, kN2, is 10000 in the above example, the obtained error estimate 9.99948e+01 is a reasonable value, which should be approximately sqrt(10000). However, the statistical fluctuation in the model PDF looks to be not taken into account in this fit. If it is also considered, the error estimate will be about sqrt(2)*sqrt(10000).

How do I also take into account the statistical fluctuation of the model PDF?

Q3.
When kN1 is small, say, 1000, I often get MIGRAD warning and wrong best-fit normalization result 1.91599e+03 +/- 5.95010e+03 (should be ~10000).

root [1] .x pdf.C(1000, 10000)
(snip)
[#0] WARNING:Minization -- RooMinimizerFcn: Minimized function has error status.
Returning maximum FCN so far (-1e+30) to force MIGRAD to back out of this region. Error log follows
Parameter values: norm=5000
RooAddPdf::model[ norm * pdf ]
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
     getLogVal() top-level p.d.f evaluates to zero @ !refCoefNorm=(), !pdfs=(pdf = 0/1000), !coefficients=(norm = 5000)
RooNLLVar::nll_model_dh2[ paramSet=(norm) ]
     function value is NAN @ paramSet=(norm = 5000)
(snip)
 MINUIT WARNING IN HESSE
 ============== Second derivative zero for parameter1
  MNHESS FAILS AND WILL RETURN DIAGONAL MATRIX. 
 FCN=-1e+30 FROM HESSE     STATUS=FAILED          3 CALLS          28 TOTAL
                     EDM=0    STRATEGY= 1  ERROR MATRIX UNCERTAINTY 100.0 per cent
  EXT PARAMETER                APPROXIMATE     INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  norm         1.91599e+03   5.95010e+03   0.00000e+00  -3.39837e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  5.000e+07 
ERR MATRIX APPROXIMATE

This is probably because the model PDF histogram has a few empty bins, and division by zero happens inside the likelihood calculation, I guess. However, it is likely that a model PDF made from a histogram filled with low-statistic experimental data can have a few empty bins like this example script.

Is there any way to avoid the above MIGRAD warning and wrong minimization?

Q4.
I always get INFO print below.

[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh1): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh2): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]

I do not understand why the variable range is extended from the original one [-3,3] to the exactly same range [-3,3]. Is it OK to ignore this message?

Axel · June 24, 2019, 6:34am

@StephanH can you help here?

StephanH · June 24, 2019, 7:04am

It looks like the random generator might be initialised to the same value. You could try to initialise it manually or you could generate several histograms at a time, and then fit them without re-starting the script. =

For the rest, let me get back to you later.

oxon · June 24, 2019, 7:31am

Thank you for the help.

I added one more argument to change the random seed, but I always see the same fitting result.

$ cat pdf_seed.C          
void pdf_seed(const int kN1, const int kN2, int seed) {
  gRandom->SetSeed(seed);

  const double kXmin = -3;
  const double kXmax =  3;
  const int kNbins = 100;

  TH1D* h1 = new TH1D("h1", "h1", kNbins, kXmin, kXmax);
  h1->FillRandom("gaus", kN1);
  TH1D* h2 = new TH1D("h2", "h2", kNbins, kXmin, kXmax);
  h2->FillRandom("gaus", kN2);

  RooRealVar x("x", "x", kXmin, kXmax);
  RooRealVar norm("norm", "normalization", 0.5 * kN2, 0., 1.5 * kN2);

  RooDataHist dh1("dh1", "dh1", x, h1);
  RooDataHist dh2("dh2", "dh2", x, h2);

  RooHistPdf pdf("pdf", "pdf", x, dh1);
  RooAddPdf model("model", "model", RooArgList(pdf), RooArgList(norm));

  model.fitTo(dh2);

  RooPlot* xframe = x.frame(RooFit::Title("Gaussian Fit Test"));
  dh2.plotOn(xframe);
  model.plotOn(xframe);
  xframe->Draw();
}

root [1] .x pdf_seed.C(100000, 10000, 0)
Warning in <TROOT::Append>: Replacing existing TH1: h1 (Potential memory leak).
Warning in <TROOT::Append>: Replacing existing TH1: h2 (Potential memory leak).
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh1): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh2): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (pdf)
 **********
 **   10 **SET PRINT           1
 **********
 **********
 **   11 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 norm         5.00000e+03  1.50000e+03    0.00000e+00  1.50000e+04
 **********
 **   12 **SET ERR         0.5
 **********
 **********
 **   13 **SET PRINT           1
 **********
 **********
 **   14 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   15 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-66113 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         5.00000e+03   1.50000e+03   2.14402e-01  -7.07114e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68044.4 FROM MIGRAD    STATUS=CONVERGED      26 CALLS          27 TOTAL
                     EDM=2.09878e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         9.99986e+03   9.99948e+01   2.54531e-03  -1.02442e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
 **********
 **   16 **SET ERR         0.5
 **********
 **********
 **   17 **SET PRINT           1
 **********
 **********
 **   18 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68044.4 FROM HESSE     STATUS=OK              5 CALLS          32 TOTAL
                     EDM=2.09794e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  norm         9.99986e+03   9.99949e+01   1.01813e-04   3.39816e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
root [2] .x pdf_seed.C(100000, 10000, 1)
Warning in <TROOT::Append>: Replacing existing TH1: h1 (Potential memory leak).
Warning in <TROOT::Append>: Replacing existing TH1: h2 (Potential memory leak).
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh1): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh2): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (pdf)
 **********
 **   19 **SET PRINT           1
 **********
 **********
 **   20 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 norm         5.00000e+03  1.50000e+03    0.00000e+00  1.50000e+04
 **********
 **   21 **SET ERR         0.5
 **********
 **********
 **   22 **SET PRINT           1
 **********
 **********
 **   23 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   24 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-66192.7 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         5.00000e+03   1.50000e+03   2.14402e-01  -7.07114e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68124.2 FROM MIGRAD    STATUS=CONVERGED      26 CALLS          27 TOTAL
                     EDM=2.09826e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         9.99986e+03   9.99948e+01   2.54680e-03  -1.02430e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
 **********
 **   25 **SET ERR         0.5
 **********
 **********
 **   26 **SET PRINT           1
 **********
 **********
 **   27 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68124.2 FROM HESSE     STATUS=OK              5 CALLS          32 TOTAL
                     EDM=2.09742e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  norm         9.99986e+03   9.99948e+01   1.01872e-04   3.39816e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
root [3] .x pdf_seed.C(100000, 10000, 2)
Warning in <TROOT::Append>: Replacing existing TH1: h1 (Potential memory leak).
Warning in <TROOT::Append>: Replacing existing TH1: h2 (Potential memory leak).
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh1): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh2): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (pdf)
 **********
 **   28 **SET PRINT           1
 **********
 **********
 **   29 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 norm         5.00000e+03  1.50000e+03    0.00000e+00  1.50000e+04
 **********
 **   30 **SET ERR         0.5
 **********
 **********
 **   31 **SET PRINT           1
 **********
 **********
 **   32 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   33 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-65949.4 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         5.00000e+03   1.50000e+03   2.14402e-01  -7.07114e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-67880.9 FROM MIGRAD    STATUS=CONVERGED      26 CALLS          27 TOTAL
                     EDM=2.09984e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         9.99986e+03   9.99948e+01   2.54225e-03  -1.02468e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
 **********
 **   34 **SET ERR         0.5
 **********
 **********
 **   35 **SET PRINT           1
 **********
 **********
 **   36 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-67880.9 FROM HESSE     STATUS=OK              5 CALLS          32 TOTAL
                     EDM=2.09901e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  norm         9.99986e+03   9.99949e+01   1.01690e-04   3.39816e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
root [4] .x pdf_seed.C(100000, 10000, 3)
Warning in <TROOT::Append>: Replacing existing TH1: h1 (Potential memory leak).
Warning in <TROOT::Append>: Replacing existing TH1: h2 (Potential memory leak).
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh1): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(dh2): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (pdf)
 **********
 **   37 **SET PRINT           1
 **********
 **********
 **   38 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 norm         5.00000e+03  1.50000e+03    0.00000e+00  1.50000e+04
 **********
 **   39 **SET ERR         0.5
 **********
 **********
 **   40 **SET PRINT           1
 **********
 **********
 **   41 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   42 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-66147.7 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         5.00000e+03   1.50000e+03   2.14402e-01  -7.07114e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68079.2 FROM MIGRAD    STATUS=CONVERGED      26 CALLS          27 TOTAL
                     EDM=2.09855e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  norm         9.99986e+03   9.99948e+01   2.54596e-03  -1.02437e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
 **********
 **   43 **SET ERR         0.5
 **********
 **********
 **   44 **SET PRINT           1
 **********
 **********
 **   45 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-68079.2 FROM HESSE     STATUS=OK              5 CALLS          32 TOTAL
                     EDM=2.09771e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  norm         9.99986e+03   9.99949e+01   1.01839e-04   3.39816e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  1.000e+04 
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization

oxon · June 24, 2019, 7:39am

Sorry, but I have found my misunderstanding. I thought that the number of entries in h2 would fluctuate because the X range was limited to +/- 3. However, TH1::FillRandom looks to drop underflow/overflow values, and the number of entries of h2 was always 10000 in the above example.

That’s why the best-fit normalization value always converged to the same value because the best-fit normalization in the likelihood method should be the number of entries of the data histogram.

By changing the RooRealVar x region to [kXmin/2., kXmax/2.], the best-fit normalization fluctuates now.

void pdf_seed(const int kN1, const int kN2, int seed) {
  gRandom->SetSeed(seed);

  const double kXmin = -3;
  const double kXmax =  3;
  const int kNbins = 100;

  TH1D* h1 = new TH1D("h1", "h1", kNbins, kXmin, kXmax);
  h1->FillRandom("gaus", kN1);
  TH1D* h2 = new TH1D("h2", "h2", kNbins, kXmin, kXmax);
  h2->FillRandom("gaus", kN2);

  RooRealVar x("x", "x", kXmin/2., kXmax/2.);
  RooRealVar norm("norm", "normalization", 0.5 * kN2, 0., 1.5 * kN2);

  RooDataHist dh1("dh1", "dh1", x, h1);
  RooDataHist dh2("dh2", "dh2", x, h2);

  RooHistPdf pdf("pdf", "pdf", x, dh1);
  RooAddPdf model("model", "model", RooArgList(pdf), RooArgList(norm));

  model.fitTo(dh2);

  RooPlot* xframe = x.frame(RooFit::Title("Gaussian Fit Test"));
  dh2.plotOn(xframe);
  model.plotOn(xframe);
  xframe->Draw();
}

So Q1 has been resolved.

StephanH · June 24, 2019, 7:46am

Statistical uncertainties of your model PDF need extra treatment. They are systematic uncertainties, and they can be taken into account with the Barlow-Beeston method. That is, you add scale factors that can change up/down the bin content for each bin. You also add Poisson constraint terms, which pull the predicted bin content towards the mean value of the prediction.
There’s no example I could find fast.

You are right. If the PDF is zero, it’s assigning zero probability to a data event if it happens to fall into this bin. The problem actually occurs when the logarithm of a zero probability is computed.
You can fix it by manually putting a very small, but non-zero probability.

That’s stupid on RooFit’s side, but yes, you can ignore it as it actually doesn’t change the range.

Wile_E_Coyote · June 24, 2019, 7:54am

For your “Q1.”, if you really want it random, make sure that you use:

gRandom->SetSeed(0);

oxon · June 24, 2019, 11:36pm

It is not relevant to the original question.

oxon · June 24, 2019, 11:40pm

Thank you Stephan. Does RooFit support the Barlow-Beeston method? Do I have to implement it myself? It would be really appreciated if you could point the location to read.

StephanH · June 25, 2019, 6:53am

Hello @oxon,

RooFit supports it only in the sense that it has all tools to implement it. I will try to write a little tutorial to show how it’s done.

oxon · June 25, 2019, 7:45am

Thank you again. I am looking at a few tutorials like

but I have not found a complete working example yet. I would really appreciate a minimum working example, which will be very helpful for other RooFit users.

StephanH · June 26, 2019, 4:26pm

Hello @oxon,

I managed to write a tutorial. It will soon appear in ROOT’s master branch, but to speed things up I post it here for you.
I improved the behaviour of RooHistConstraint a bit, because the old one will generate gamma parameters for empty bins. This will lead to MINUIT complaining about having trouble estimating a covariance matrix. This is because these gamma are effectively dead parameters - they don’t change anything in the likelihood.
If you have all bins populated, that shouldn’t be a problem, though. If you do see such complaints, you can (soon) try with the master version of ROOT if interested.

rf709_BarlowBeeston.C (7.4 KB)

oxon · June 27, 2019, 12:09pm

Thank you so much for the thorough tutorial example. I really appreciate it. I will try it and will get back to you if I have any further question.

oxon · June 28, 2019, 1:05am

If the data has two Gaussian peaks at x1 and x2, which can be modeled with the same signal MC model with statistical fluctuation, can I apply the techniques in your tutorial? My colleague is working on a such script, in which a few Gaussian peaks with the identical shape are horizontally shifted at the same time.

I wonder if the signal MC fluctuation is considered simultaneously in different peaks or the fluctuation is double counted or triple counted.

StephanH · June 28, 2019, 7:05am

It might work, but you have to make sure that you don’t double-count. That is, you need to use the same gammas for both signal peaks if the templates have been made from the same distribution. The implementation I sent to you doesn’t have a notion of such a problem. It just assigns the same gammas to each bin.

system · July 12, 2019, 7:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.