Use RooFitRanges with a slim peak

JLammering · January 22, 2019, 2:57pm

Dear Experts,
I try to to fit some data, where the signal is a slim gauss and the background is exponential. Furthermore, some data is blinded so I have to use RooFit.Range to not fit this data window. My problem now is that the fit never converges when I state the ranges in the fit. Below is a minimal example where this behavior is also present. I experienced with it that the fit converges when the width of the Gaussian is 14 or above, so maybe it has to do with the step width in the fit.

My Root-version is 6.05/02 built for linuxx8664gcc.

Thanks in advance,
Jasper


# Create observables
x = R.RooRealVar("x", "x", 5100, 6200)
werte = 5000
bkg = 500

left = 5337
right = 5397

# Construct signal pdf
mean = R.RooRealVar("mean", "mean", 5280, 5250, 5300)
sigma = R.RooRealVar("sigma", "sigma", 13, 0, 20)
gx = R.RooGaussian("gx", "gx", x, mean, sigma)
sig_yield = R.RooRealVar('sig_yield', '', werte-bkg, 0, werte)


exp = R.RooRealVar('exp', '', -2e-3, -4e-3, -0.01e-3)
combinatorial = R.RooExponential('combinatorial', '', x, exp)
bkg_yield = R.RooRealVar('bkg_yield', '', bkg, 0, werte)


pdf_gaussWithExp = R.RooAddPdf(
    'pdf_gaussWithExp',
    '',
    R.RooArgList(gx, combinatorial),
    R.RooArgList(sig_yield, bkg_yield)
)

x.setRange("low", 5100, left)
x.setRange("high", right, 6200)
x.setRange("full", 5100, 6200)

data = R.RooDataSet(pdf_gaussWithExp.generate(R.RooArgSet(x), werte))
data_red = data.reduce("x<{}||x>{}".format(left, right))
pdf_gaussWithExp.fitTo(data_red, R.RooFit.Range("low,high"), R.RooFit.NumCPU(4), R.RooFit.Save(True), R.RooFit.Extended())
my_plot = x.frame()

data_red.plotOn(my_plot)
pdf_gaussWithExp.plotOn(my_plot)


can = R.TCanvas()
can.SetLogy()
my_plot.Draw()
can.SaveAs("blind_test_ranges.pdf")```

oshadura · January 22, 2019, 6:16pm

@StephanH perhaps you can help here please?

StephanH · January 23, 2019, 11:39am

Hi Jasper,

it’s not obvious, yet, what the problem is. Could you plot the data and the PDF before + after fit? Could you attach/post logs?

One fact that will prevent the fit from converging is that if e.g. exp should converge to lets say to -0.009E-3, but you force-limited it to a different range.
Maybe try to check that this is ok for all variables.

JLammering · January 23, 2019, 12:19pm

Hi Stephan,

I create the data here with the same pdf that I use to fit afterwards. Therefore the value which the fit delivers should be near the start value but not exactly because of statistics. I draw the conclusion that it doesn’t converges because, for sigma below 14, all parameters are exactly the start value. Below are the plot of data + pdf and the fit results for sigma=13.
blind_test_ranges_after.pdf (23.5 KB)

  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  bkg_yield    5.00000e+02   3.73354e+00   2.38141e-06  -9.27295e-01
   2  exp         -2.00000e-03   1.57090e-13   1.16319e-05   2.50627e-03
   3  mean         5.28000e+03   5.23869e-10   5.10000e-05   2.01358e-01
   4  sig_yield    4.50000e+03   1.22794e-06   2.95086e-06   9.27295e-01
   5  sigma        1.30000e+01   2.81108e-12   1.45289e-07   3.04693e-01```

StephanH · January 23, 2019, 4:38pm

The fit in the plot you attached looks perfect, but I noticed something you missed from the Minuit output. Look at the topmost line of the parameter table:

 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=25018.5 FROM HESSE     STATUS=OK             10 CALLS          43 TOTAL
                     EDM=5.80384e-07    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  mean         1.01746e+00   3.00144e-02   6.58691e-05   1.01922e-01
   2  sigma        2.97870e+00   2.19217e-02   2.12845e-05  -4.31732e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  9.009e-04  1.792e-05 
  1.792e-05  4.806e-04 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.02723   1.000  0.027
        2  0.02723   0.027  1.000

It has a column for internal and plain parameter values. These two are different in your case. That suggests that limiting the range of a parameter creates the problem.

Would you like to try removing range limits?
Does Minuit say whether or not the fit converges?
What is the rest of the Minuit output?
Look at the covariance matrix. If there are strong correlations, you may have degenerate fit parameters.

JLammering · January 24, 2019, 12:29pm

Here is the (very long) terminal output:
terminal_output.txt (161.2 KB)

The fit has obviously some issues. It says something about the Hesse-matrix not being pos-def.
To your question about the fit ranges: I think that the fit results should be near the start parameter, because the data is generated with this start parameter and the ranges are around the start parameter. Also the result does not run into one of the ranges but stays at the start value.
Maybe the output brings some light into this.

StephanH · January 24, 2019, 3:54pm

This is your problem:

[#0] WARNING:Minization -- RooMinuitGlue: Minimized function has error status.
Returning maximum FCN so far (-1e+30) to force MIGRAD to back out of this region. Error log follows
Parameter values: bkg_yield=500, exp=-0.002, mean=5280, sig_yield=4500, sigma=13
PID25434/RooRealMPFE::nll_pdf_gaussWithExp_pdf_gaussWithExpData_high_6dc1270_MPFE3[ arg=nll_pdf_gaussWithExp_pdf_gaussWithExpData_high_GOF3 vars=(bkg_yield,exp,mean,sig_yield,sigma) ]
     p.d.f normalization integral is zero or negative @ x=x=5650, mean=mean=5280, sigma=sigma=13

The reason this happens is that when the normalisation integral is computed in the range high, the integral over the Gaussian evaluates to zero, because it’s too narrow. When this happens, RooFit signals the minimiser that it should back out from this region, even though in this case it’s the right region.
You can work around that for now by using the switch R.RooFit.EvalErrorWall(False). It will still warn, but Minuit will not try to leave the region.

This is what I get now:

 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-14888 FROM HESSE     STATUS=OK             31 CALLS         128 TOTAL
                     EDM=8.51689e-07    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  bkg_yield    5.34659e+02   2.64766e+01   5.70233e-05  -9.04533e-01
   2  exp         -2.48058e-03   1.66149e-04   2.87011e-04  -2.40702e-01
   3  mean         5.27990e+03   2.01182e-01   2.76587e-05   1.97407e-01
   4  sig_yield    4.47032e+03   6.75659e+01   1.47594e-04   9.07764e-01
   5  sigma        1.30740e+01   1.50112e-01   5.28056e-05   3.12460e-01
                               ERR DEF= 0.5

Does this look better?

We will implement a fix for the Gaussian integrals to keep them well behaved.

StephanH · January 28, 2019, 4:16pm

Hi Jasper,

We have a fix. If you want to try it, it will be built into an experimental root tonight. If you have linux with cvmfs, try

/cvmfs/sft-nightlies.cern.ch/lcg/contrib/gentoo/linux/x86_64/startprefix

and start using root. Your fit should work now.

(Little disclaimer:
We don’t officially support the gentoo prefix, but let me know, nevertheless, if it doesn’t work.)

JLammering · January 29, 2019, 12:04pm

Hi Stephan,

I tried your workaround R.RooFit.EvalErrorWall(False) and it works well with my posted pdf combination. Thank you for that.
I tried also the experimental root-session but unfortunately it does not change the behavior of the fit (it only works with the EvalErrorWall statement).
Another problem which I run into is when the pdf consists out of a double gaussian as signal and exponential as combinatorial like here:

mean = R.RooRealVar('mean', '', 5280, 5270, 5290)
width_1 = R.RooRealVar('width_1', '', 4, 3, 5)
width_2 = R.RooRealVar('width_2', '', 8, 7, 9)
sig1 = R.RooGaussian('sig1', '', x, mean, width_1)
sig2 = R.RooGaussian('sig2', '', x, mean, width_2)
sig1frac = R.RooRealVar("sig1frac", "fraction of component 1 in signal", 0.8)
sig = R.RooAddPdf("sig", "Signal", R.RooArgList(sig1, sig2), R.RooArgList(sig1frac))
sig_yield = R.RooRealVar('sig_yield', '', werte - bkg, 0, werte)


exp = R.RooRealVar('exp', '', -2e-3, -4e-3, -0.01e-3)
combinatorial = R.RooExponential('combinatorial', '', x, exp)
bkg_yield = R.RooRealVar('bkg_yield', '', bkg, 0, werte)


pdf_gaussWithExp = R.RooAddPdf(
    'pdf_gaussWithExp',
    '',
    R.RooArgList(sig, combinatorial),
    R.RooArgList(sig_yield, bkg_yield)
)

.
Then the fit consumes really much time and all parameters are nones in the end. Here not even the EvalErrorWall statement helps.

StephanH · January 29, 2019, 1:45pm

Hi,

not your fault. The nightly build didn’t install into CVMFS. It will arrive in a couple of hours. When it arrives (root -b should show Jan 29 instead of Jan 27), it should work. Also your second example seems to work (note, though, that I had to paste it into the first example you posted because the code you posted is incomplete). Although there are just three points in the Gaussian, which is not very much to measure three parameters, the fit seems to converge to a reasonable point.

JLammering · January 29, 2019, 2:27pm

Ok good, then I try it tomorrow.

For the double gaussian I made a typo but corrected it now. Here is the full script with the double gaussian which gives nans:

import ROOT as R

# Create observables
x = R.RooRealVar("x", "x", 5100, 6200)
werte = 5000
bkg = 500

left = 5337
right = 5397

# Construct signal pdf
mean = R.RooRealVar('mean', '', 5280, 5270, 5290)
width_1 = R.RooRealVar('width_1', '', 4, 3, 5)
width_2 = R.RooRealVar('width_2', '', 8, 7, 9)
sig1 = R.RooGaussian('sig1', '', x, mean, width_1)
sig2 = R.RooGaussian('sig2', '', x, mean, width_2)
sig1frac = R.RooRealVar("sig1frac", "fraction of component 1 in signal", 0.8)
sig = R.RooAddPdf("sig", "Signal", R.RooArgList(sig1, sig2), R.RooArgList(sig1frac))
sig_yield = R.RooRealVar('sig_yield', '', werte - bkg, 0, werte)


exp = R.RooRealVar('exp', '', -2e-3, -4e-3, -0.01e-3)
combinatorial = R.RooExponential('combinatorial', '', x, exp)
bkg_yield = R.RooRealVar('bkg_yield', '', bkg, 0, werte)


pdf_gaussWithExp = R.RooAddPdf(
    'pdf_gaussWithExp',
    '',
    R.RooArgList(sig, combinatorial),
    R.RooArgList(sig_yield, bkg_yield)
)

x.setRange("low", 5100, left)
x.setRange("high", right, 6200)
x.setRange("full", 5100, 6200)

data = R.RooDataSet(pdf_gaussWithExp.generate(R.RooArgSet(x), werte))
data_red = data.reduce("x<{}||x>{}".format(left, right))

pdf_gaussWithExp.fitTo(data_red, R.RooFit.Range("low,high"), R.RooFit.NumCPU(4), R.RooFit.Save(True), R.RooFit.Extended(), R.RooFit.EvalErrorWall(False))
my_plot = x.frame()

data_red.plotOn(my_plot)
pdf_gaussWithExp.plotOn(my_plot)


can = R.TCanvas()
can.SetLogy()
my_plot.Draw()

can.SaveAs("blind_test_ranges_after.pdf")

JLammering · January 31, 2019, 10:03am

Hi Stephan,

Thank you very much for the fix in the new root-version. Now my example works fine, also with the double gaussian.

Kind regards,
Jasper

system · February 14, 2019, 10:12am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.