I am working on a ToyMC fitting test and have been investigating the quality of the fits. I am using a fourth-order Chebychev polynomial for the background and a Gaussian for the signal, and I noticed some structure in the location of the fitted signal. Investigating further, I found that a straight background-only fit to ‘expected’ data produces a fourth-order discrepancy between the data and the fit (visible in the residuals). Furthermore, if no fit is done at all, there is still a discrepancy between the data and the curve, even though the parameter values agree perfectly. The discrepancy grows linearly with the number of data points generated.
I realise this is a small effect, but a search for small resonances has to be sensitive to very small signals, so even small errors are important to minimise. Is it understood what causes these discrepancies, and is there any way to avoid them?
I have condensed my code to the minimum required to see the effect; the code and example plots of the effect are attached.
Thanks for any help
FittedResidual.pdf (31.7 KB)
NotFittedResidual.pdf (31.4 KB)
FitSample2.cpp (2.13 KB)
Sorry for the delay.
My guess here is that RooFit is not integrating the continuous function over the bin to get the PDF value for an entry in the binned dataset. The normalization of the entire PDF is an integral, but when evaluating the PDF itself RooFit simply evaluates the continuous function at the bin center. So if the function has curvature within the bin, this leads to a small bias.
The real solution is to have RooFit perform the integral within the bin to get the PDF value, but this has not yet been implemented. Wouter was suggesting an approximate fix, which was to use the sum over bins of (function value at the bin center * bin width) as the normalization of the PDF. That will at least help with the bias, but it’s not exact.
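To make the mechanism concrete, here is a minimal standalone C++ sketch (no RooFit involved) of the three ways of assigning a probability to a bin discussed above. The shape 1 + x^2 on [-1, 1] and all function and struct names are made up for illustration; they are not taken from FitSample2.cpp or from RooFit.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical curved background shape on [-1, 1] (illustration only).
double shape(double x) { return 1.0 + x * x; }

// Analytic antiderivative of shape(), used for the exact integrals.
double shapeAntiderivative(double x) { return x + x * x * x / 3.0; }

struct BinFractions {
    std::vector<double> centerOverIntegral; // f(center)*width / exact integral over the range
    std::vector<double> exact;              // exact bin integral / exact integral over the range
    std::vector<double> centerOverSum;      // f(center)*width / sum of f(center)*width
};

BinFractions binFractions(int nBins, double lo = -1.0, double hi = 1.0) {
    assert(nBins > 0 && hi > lo);
    const double width = (hi - lo) / nBins;
    const double total = shapeAntiderivative(hi) - shapeAntiderivative(lo);
    BinFractions out;
    double centerSum = 0.0;
    for (int i = 0; i < nBins; ++i) {
        const double a = lo + i * width;
        const double b = a + width;
        const double centerArea = shape(0.5 * (a + b)) * width;
        centerSum += centerArea;
        out.centerOverIntegral.push_back(centerArea / total);
        out.exact.push_back((shapeAntiderivative(b) - shapeAntiderivative(a)) / total);
        out.centerOverSum.push_back(centerArea); // normalized after the loop
    }
    for (double& v : out.centerOverSum) v /= centerSum;
    return out;
}

// Sum of all entries, to check whether the bin fractions add up to 1.
double sumOf(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
```

Calling binFractions(10) and summing each vector with sumOf, the center-evaluated fractions add up to 0.9975 rather than 1. Normalizing by the sum over bin centers restores the total to 1, but the individual bins still differ from the exact fractions: for this shape the edge bins come out slightly high and the central bins slightly low, which is why the fix is only approximate.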
This effect is not present if you use unbinned data. (The expected data is always binned).
The size of the effect should also be reduced with smaller bin sizes.
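As a rough illustration of how quickly the effect shrinks, the following standalone C++ sketch compares the bin-center evaluation with the exact bin integral for a unit Gaussian on [-3, 3]. The choice of shape and range, and the function names, are assumptions made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Unit Gaussian density; stands in for any smooth, curved shape.
double gaussDensity(double x) {
    const double invSqrt2Pi = 0.3989422804014327;
    return invSqrt2Pi * std::exp(-0.5 * x * x);
}

// Exact integral of the unit Gaussian density over [a, b], via the error function.
double gaussIntegral(double a, double b) {
    return 0.5 * (std::erf(b / std::sqrt(2.0)) - std::erf(a / std::sqrt(2.0)));
}

// Largest relative difference, over all bins in [-3, 3], between
// density(center) * width and the exact bin integral.
double maxRelativeBias(int nBins) {
    assert(nBins > 0);
    const double lo = -3.0, hi = 3.0;
    const double width = (hi - lo) / nBins;
    double worst = 0.0;
    for (int i = 0; i < nBins; ++i) {
        const double a = lo + i * width;
        const double b = a + width;
        const double exact = gaussIntegral(a, b);
        const double approx = gaussDensity(0.5 * (a + b)) * width;
        worst = std::max(worst, std::abs(approx - exact) / exact);
    }
    return worst;
}
```

Under these assumptions, going from 100 to 200 bins cuts the worst-case relative bias by roughly a factor of four. That is what one expects from a midpoint approximation, where the error per bin scales with the square of the bin width.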
In the short term, one could create a new PDF class that wraps a continuous function and produces a binned PDF. That will require some familiarity with RooFit, but would give you the right answer even with large bin widths.
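The idea behind such a wrapper can be sketched without RooFit. The class below is not a RooFit class and does not use the RooFit API; BinnedShape and its methods are hypothetical names, and Simpson's rule is just one possible choice of integrator. It takes a continuous function, integrates it numerically inside each bin, and normalizes the results so that the bin probabilities sum to 1.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical wrapper: turns a continuous shape into per-bin probabilities
// by integrating inside each bin instead of evaluating at the bin center.
class BinnedShape {
public:
    BinnedShape(std::function<double(double)> f, double lo, double hi, int nBins)
        : f_(std::move(f)), lo_(lo), width_(0.0) {
        assert(nBins > 0 && hi > lo);
        width_ = (hi - lo) / nBins;
        probs_.resize(nBins);
        double total = 0.0;
        for (int i = 0; i < nBins; ++i) {
            probs_[i] = integrateBin(i);
            total += probs_[i];
        }
        for (double& p : probs_) p /= total; // normalize over the full range
    }

    int nBins() const { return static_cast<int>(probs_.size()); }

    // Probability content of bin i (bounds-checked).
    double binProbability(int i) const { return probs_.at(i); }

private:
    // Composite Simpson's rule with 8 sub-intervals inside bin i.
    double integrateBin(int i) const {
        const int n = 8; // must be even
        const double a = lo_ + i * width_;
        const double h = width_ / n;
        double s = f_(a) + f_(a + width_);
        for (int k = 1; k < n; ++k) {
            s += (k % 2 ? 4.0 : 2.0) * f_(a + k * h);
        }
        return s * h / 3.0;
    }

    std::function<double(double)> f_;
    double lo_;
    double width_;
    std::vector<double> probs_;
};
```

For example, wrapping the shape 1 + x^2 on [-1, 1] with 10 bins gives a probability of 0.136 for the first bin, which is the exact value, because Simpson's rule is exact for polynomials up to third order. For less smooth shapes the number of sub-intervals would need to be tuned.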