Bad fit at boundaries for convoluted RooHistPdf

Dear Rooters,

I would like to fit a RooHistPdf convolved with a Gaussian (using RooFFTConvPdf) to my data. The fit looks quite good, except at high values close to the histogram boundary, where there is a visible deviation between the fit and the data. A similar problem was reported earlier [1], but changing the minimum/maximum of my variable does not have any effect.
I created a minimal example, attached to this post, which creates pseudo data and a histogram; the histogram is convolved with a Gaussian and fitted to the pseudo data. In the resulting plots you will see the behaviour described above. I would be happy to hear any ideas about what is going wrong here.

Cheers,
Knut

[1] Problem with a Breit-Wigner convolved with a Crystal-Ball

tl;dr: The RooHistPdf is zero beyond the histogram boundaries. Define the histogram in an extended range and fit in a subrange. See the second attached file.
fitExample_solution.py (1.6 KB)
fitExample.py (1.54 KB)


Hi,

What you see is normal and is due to the fact that you are using an FFT for the convolution. You should not change the range of the variable itself, but the range of the variables used for the convolution. You should probably increase it by a larger amount, since your pdf goes to zero very slowly on the right side.

Best Regards

Lorenzo

Hi Lorenzo,

could you please explain what you mean by “the variables used for the convolution”? I have only one variable (x), which is used to create the histograms, do the convolution, fit, etc., similar to this tutorial: https://root.cern.ch/root/htmldoc/tutorials/roofit/rf208_convolution.C.html
Or do you mean increasing the range of the mean and sigma of the Gaussian used for the convolution? This has no effect.

Thanks,
Knut

Hi,

I was referring to the variable you integrate over when doing the convolution, i.e. your observable x.
Did you increase its “cache” range, which is used to build the cache for the convolution, by enough?
This is the solution proposed in viewtopic.php?t=19725.

You can also try RooFFTConvPdf::setBufferFraction, see the documentation at root.cern.ch/doc/master/classRooFFTConvPdf.html

Lorenzo
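
A toy illustration of why an FFT-based convolution needs a buffer: a convolution computed via FFT is cyclic, so content smeared beyond one edge of the sampling range reappears at the opposite edge. The sketch below uses plain Python (no ROOT) and a direct cyclic sum instead of an actual FFT. The helper names (`gauss_kernel`, `circular_convolve`, `convolve_with_buffer`) are made up for this example, and the zero padding is a simplification, not necessarily what RooFFTConvPdf does internally.

```python
import math

def gauss_kernel(sigma, half_width):
    """Normalised discrete Gaussian sampled at integer bin offsets."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def circular_convolve(signal, kernel):
    """Cyclic convolution: indices wrap around modulo len(signal), as with an FFT."""
    n = len(signal)
    half = len(kernel) // 2
    out = [0.0] * n
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[(i + j - half) % n] += s * w
    return out

def convolve_with_buffer(signal, kernel, n_buffer):
    """Pad with zeros on both sides, convolve cyclically, then drop the padding."""
    padded = [0.0] * n_buffer + list(signal) + [0.0] * n_buffer
    smeared = circular_convolve(padded, kernel)
    return smeared[n_buffer:n_buffer + len(signal)]

# Toy shape: all content sits in the last bin of a 20-bin range.
signal = [0.0] * 19 + [1.0]
kernel = gauss_kernel(sigma=2.0, half_width=8)

no_buffer = circular_convolve(signal, kernel)
with_buffer = convolve_with_buffer(signal, kernel, n_buffer=8)

# Without a buffer, part of the smeared content wraps around into the first bin.
print("first bin without buffer: %.4f" % no_buffer[0])
print("first bin with buffer:    %.4f" % with_buffer[0])
```

With a buffer at least as wide as the kernel reach, the wrapped-around content lands in the padding and is cut away, so the first bin stays empty.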

Hi,

I increased the range of my variable by quite a bit, and also increased the number of bins:

x.setBins(100000, "cache")
x.setMin("cache", 50.5)
x.setMax("cache", 5000.5)

with no change. The default buffer fraction is 10%, but nothing changes if it is changed to 0.1%, 1% or 50%.

Knut

Hi,

Looking more closely at your plot, I see that in your case it is the data points that are increasing, not the function. This is not an effect of the FFT convolution; it is some other physical effect in the data that you are probably not taking into account in your model. I think RooFit cannot help much here, apart from describing the correct model.

Lorenzo

Hi,

The data histogram is generated with a gaussian() + landau() random value, so there shouldn’t be any physics beyond the standard model/statistics, as you can see from the attachment in my first post.

Cheers,
Knut

Hi,

If you can post both the code generating the data and the subsequent fit in RooFit, I can have a closer look at the problem.
Also, if you use the numerical convolution, do you get a perfect fit?

Lorenzo

Hi,

I thought I had posted it at the beginning, but it seems that didn’t work. It should be attached to this post, and if not, here is the code anyway:

[code]#!/usr/bin/env python2
import ROOT
ROOT.gROOT.SetBatch()

# get shape (used to fit), a simple Landau
hShape = ROOT.TH1F("landau", "", 120, 60, 120)
for i in range(10000):
    hShape.Fill(ROOT.gRandom.Landau(91, 2.5))

# get data, a Landau convolved with a Gaussian
hData = ROOT.TH1F("landauGaus", "", 120, 60, 120)
for i in range(1000000):
    hData.Fill(ROOT.gRandom.Landau(91, 2.5) + ROOT.gRandom.Gaus(0, 3))

x = ROOT.RooRealVar("x", "x", 60, 120)

x.setBins(10000, "cache")

# taken from "Problem with a Breit-Wigner convolved with a Crystal-Ball"
x.setMin("cache", 50.5)
x.setMax("cache", 130.5)

# data is a Landau convolved with a Gaussian, taken from histogram
dhData = ROOT.RooDataHist("dhSig", "", ROOT.RooArgList(x), ROOT.RooFit.Import(hData))

# take the background shape from a histogram (this time Landau only)
dhShape = ROOT.RooDataHist("dhShape", "", ROOT.RooArgList(x), ROOT.RooFit.Import(hShape))
pdfShape = ROOT.RooHistPdf("pdfShape", "", ROOT.RooArgSet(x), dhShape, 0)

# define Gaussian to smear background shape
mean = ROOT.RooRealVar("mean", "mean", 0, -5, 5)
width = ROOT.RooRealVar("width", "width", 2, 0, 10)
smearGaus = ROOT.RooGaussian("smearGaus", "", x, mean, width)

# smear our background histogram with a Gaussian
smearedShape = ROOT.RooFFTConvPdf("smearedShape", "", x, pdfShape, smearGaus)

# fit the result
smearedShape.fitTo(dhData)

# and draw all
frame = x.frame(ROOT.RooFit.Title(" "))
dhData.plotOn(frame)
smearedShape.plotOn(frame)
c = ROOT.TCanvas()
frame.Draw()
ROOT.gPad.SaveAs("fitExample.pdf")
ROOT.gPad.SaveAs("fitExample.png")
[/code]
fitExample.py (1.54 KB)

Hi,

The problem is not the convolution itself but the binned dataset used for fitting. Binned fits in RooFit are often biased because the pdf is not integrated over each bin. This is probably causing the effect you are seeing. You can also see that the fitted width of the Gaussian is smaller than the generated one.
If you increase the number of bins of the data set (not of the pdf), the fit improves.

Lorenzo
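
As a rough illustration of the "missing integral" effect mentioned above: approximating the probability in a bin by the pdf value at the bin centre times the bin width differs from the exact integral over the bin, and the difference shrinks as the bins get narrower. The sketch below is plain Python with a unit Gaussian; the function names and the chosen ranges are hypothetical and only serve this example.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, mu, sigma):
    """Gaussian cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probabilities(lo, hi, n_bins, mu, sigma):
    """Per bin: (pdf at bin centre * width, exact integral over the bin)."""
    width = (hi - lo) / float(n_bins)
    pairs = []
    for i in range(n_bins):
        left = lo + i * width
        centre_estimate = gauss_pdf(left + 0.5 * width, mu, sigma) * width
        exact = gauss_cdf(left + width, mu, sigma) - gauss_cdf(left, mu, sigma)
        pairs.append((centre_estimate, exact))
    return pairs

def max_abs_difference(pairs):
    """Largest per-bin difference between the two estimates."""
    return max(abs(c - e) for c, e in pairs)

coarse = max_abs_difference(bin_probabilities(-5.0, 5.0, 5, 0.0, 1.0))
fine = max_abs_difference(bin_probabilities(-5.0, 5.0, 100, 0.0, 1.0))

print("largest per-bin difference,   5 bins: %.2e" % coarse)
print("largest per-bin difference, 100 bins: %.2e" % fine)
```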

Hi,

I don’t think the binning of the data is causing this effect. I don’t see any changes if I increase the number of bins of the data. In fact, I also did an unbinned fit (TTree as input), and see the same problem (code attached).
Any other ideas?

Thanks so far for your suggestions
Knut
fitExample_unbinned.py (1.88 KB)

Hi,

You are right, sorry for my wrong conclusion. The problem is indeed in the pdf definition for the signal. Since you define it as a RooHistPdf, it is zero outside the range of the histogram (here identical to the fit range). This causes the problem, because the FFT convolution needs the pdf to extend beyond that range.
For this reason the recipe suggested in the previous post does not work.
You need to define the RooHistPdf over a larger range and then perform the fit only in a sub-range.

Best Regards

Lorenzo
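
The effect can be reproduced without ROOT. The following is a minimal sketch in plain Python, assuming a toy exponential tail instead of the Landau from the thread, with hypothetical helper names and bin counts. A shape that is cut off at the upper edge of the fit range loses the contributions that the Gaussian would smear in from above the edge, so the smeared curve is too low near that boundary. Defining the shape over extra bins above the edge and only looking at the sub-range removes the deficit.

```python
import math

def gauss_kernel(sigma, half_width):
    """Normalised discrete Gaussian sampled at integer bin offsets."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smear(shape, kernel):
    """Linear convolution; the shape counts as zero outside its own range."""
    half = len(kernel) // 2
    out = [0.0] * len(shape)
    for i in range(len(shape)):
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(shape):
                out[i] += shape[k] * w
    return out

def falling_shape(n_bins):
    """Toy stand-in for a slowly falling tail (not the Landau from the thread)."""
    return [math.exp(-0.02 * i) for i in range(n_bins)]

n_fit = 60     # bins inside the fit range
n_extra = 20   # additional bins above the fit range
kernel = gauss_kernel(sigma=3.0, half_width=12)

# Shape defined only on the fit range versus shape defined on a larger range.
truncated = smear(falling_shape(n_fit), kernel)
extended = smear(falling_shape(n_fit + n_extra), kernel)[:n_fit]

for i in (30, 55, 59):
    print("bin %2d: truncated %.4f, extended %.4f" % (i, truncated[i], extended[i]))
```

Far from the upper edge the two results agree; in the last few bins the truncated version falls clearly below the extended one, which is the pattern of data points lying above the fitted curve near the boundary.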

Hi Lorenzo,

you are absolutely right, this solves the problem. I edited my first post and attached the solution. Thanks again for your help!

Best regards
Knut