Extended RooAddPdf fit normalization

EtienneDreyer · June 15, 2017, 5:01pm

I am seeing large fit instability related to initial parameter values when using an extended background+signal RooAddPdf.

In the attached minimal working example, the sum of an exponential and a Breit-Wigner (shape frozen) is used in an extended fit to an Asimov dataset that the macro has just generated with zero signal. However, while the background-only fit is perfect as expected, allowing the signal normalization to float causes the fit to drift significantly. The extent to which this happens depends on the width of the Breit-Wigner signal. You can see this in the PDF I placed on CERNBox (https://cernbox.cern.ch/index.php/s/lPxYCB6pRdLevlk), where I ran the macro in a loop over 30 different signal widths.

It may be that I’ve wrongly configured the normalization of the RooAddPdf and its components, but I can find no problem here. Unfortunately, this bug is making it impossible to validate background models for a data-driven fit because the extracted signal cannot be trusted.

rfWoes.C (7.1 KB)

moneta · June 16, 2017, 2:49pm

Hi

I see you are doing a binned fit. I think the bias you are observing is due to a
known issue with binned fits in RooFit, see

https://sft.its.cern.ch/jira/browse/ROOT-3874

Have you tried increasing the number of bins to see whether this reduces the bias?
If you did an unbinned fit (which you cannot do on an Asimov data set),
you would not suffer from this problem.

Lorenzo

EtienneDreyer · June 16, 2017, 4:51pm

Hi Lorenzo,

Yes, it looks like you’ve identified the problem! To check how the binning affects this behaviour, I again ran the macro in a loop, each time changing the number of bins via ((RooRealVar*)wksp->var("x"))->setBins(Nbins); and plotting the number of negative signal events extracted in the fit:

So the effect seems to disappear in the limit of an unbinned dataset. However, this is still a serious obstacle for validating background models on binned Monte Carlo datasets. Can I hope for this to be fixed soon? Or are you aware of any workaround?

Many thanks!

moneta · June 16, 2017, 8:57pm

Hi,

This is very complicated to fix in RooFit because, as far as I know, it requires a very big change in the way the pdfs are evaluated. The correct solution would be to compute the integral of the pdf in the given bin.
If you used ROOT directly to do the fit, you would not have this problem. The bias is reduced in a standard binned likelihood fit in ROOT, and you also have the option to compute the integral in the bin.
You can always convert a RooFit pdf to a ROOT TF1, but you need to be careful with the normalisation and the definition of the parameters.
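
To make the difference concrete, here is a minimal sketch in plain C++ (no ROOT needed) comparing the two ways of predicting the content of one bin for a Breit-Wigner shape. It assumes that the biased evaluation takes the pdf value at the bin centre times the bin width, and that the signal is a non-relativistic Breit-Wigner whose width is the full width at half maximum. The function names are made up for this illustration and are not RooFit or ROOT API.

```cpp
#include <cassert>
#include <cmath>

// Non-relativistic Breit-Wigner (Cauchy) density, normalised over the whole
// real line; `width` is the full width at half maximum.
double bwDensity(double x, double mean, double width) {
    const double pi = std::acos(-1.0);
    const double hw = 0.5 * width;
    return hw / (pi * ((x - mean) * (x - mean) + hw * hw));
}

// Cumulative distribution of the same Breit-Wigner.
double bwCdf(double x, double mean, double width) {
    const double pi = std::acos(-1.0);
    return 0.5 + std::atan((x - mean) / (0.5 * width)) / pi;
}

// Expected fraction of events in [lo, hi] from the bin-centre approximation:
// density at the bin centre multiplied by the bin width.
double binCentreFraction(double lo, double hi, double mean, double width) {
    assert(hi > lo);
    return bwDensity(0.5 * (lo + hi), mean, width) * (hi - lo);
}

// Expected fraction of events in [lo, hi] from the exact integral over the bin.
double binIntegralFraction(double lo, double hi, double mean, double width) {
    assert(hi > lo);
    return bwCdf(hi, mean, width) - bwCdf(lo, mean, width);
}
```

For a bin that is wide compared to the signal, for instance `binCentreFraction(-1, 1, 0, 1)` against `binIntegralFraction(-1, 1, 0, 1)`, the bin-centre value is well above the true integral, while for a bin much narrower than the signal the two agree closely.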

Lorenzo

cranmer · June 23, 2017, 2:37pm

So it looks like this issue is related to the way RooFit deals with binned data on a continuous model.

Do you need to do a binned fit? If so, can you use something like HistFactory, which is meant for binned fits with Monte Carlo templates, or do you need to use some functional form for the background and signal?

You could implement your own RooAbsPdf class where you do the integrals in each bin based on the functional form you want. That’s kind of a pain, but it is do-able.

The size of the effect will be reduced if you use smaller bins. You definitely need the bin width to be << signal width so that the bins sample the signal bump many times (particularly near the peak). If you have a signal width ~< bin width it can go wildly wrong like you see in your example.
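
As a rough guide to how small the bins need to be, here is a short sketch in plain C++ under two assumptions: the signal is a non-relativistic Breit-Wigner with its width given as the full width at half maximum, and the bin content is predicted from the pdf value at the bin centre times the bin width. For a bin centred on the peak, the ratio of that prediction to the exact bin integral is r / atan(r), with r = bin width / signal width. This only looks at the peak bin and is not the bias on the fitted signal yield; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative overestimate of the peak-bin content when the Breit-Wigner is
// evaluated at the bin centre instead of being integrated over the bin.
// The bin is assumed to be centred on the peak.
double peakBinOverestimate(double binWidth, double signalWidth) {
    assert(binWidth > 0.0 && signalWidth > 0.0);
    const double r = binWidth / signalWidth;
    return r / std::atan(r) - 1.0;
}

// Largest bin width for which the peak-bin overestimate stays below
// `tolerance`, found by bisection (the overestimate grows with bin width).
double maxBinWidth(double signalWidth, double tolerance) {
    assert(signalWidth > 0.0 && tolerance > 0.0);
    double lo = 0.0;
    double hi = signalWidth;
    // Widen the bracket until the upper edge violates the tolerance.
    while (peakBinOverestimate(hi, signalWidth) < tolerance) hi *= 2.0;
    for (int i = 0; i < 60; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (peakBinOverestimate(mid, signalWidth) < tolerance) lo = mid;
        else hi = mid;
    }
    return lo;
}
```

With bin width equal to the signal width, `peakBinOverestimate(1.0, 1.0)` gives about 27%, and for small ratios the overestimate falls off roughly as r²/3, so `maxBinWidth(w, 0.01)` comes out a bit below 0.2 w.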
