Binned maximum likelihood fits (RooFit) with sharp peaks

Hi,

I remember a presentation by a RooFit developer (whose name I don’t remember, sorry) about a known problem in binned maximum likelihood fits on histograms with sharp peaks. I am unable to find the Indico page for this presentation, and googling the problem did not help either. I would like to understand it, so could a developer please explain the problem in a little more detail? I am using ROOT v6.18.

I also remember it being mentioned that this problem is solved in the RooFit version shipped with ROOT v6.20. Is that right?

Thanks for spending some time on this problem!

Cheers,
Da Yu

Hi @Da_Yu_Tou,

you might be talking about the LHCb weekly meeting. I don’t have the link, since the agenda is not public.

Anyway, it’s not solved in ROOT 6.20, but it’s on a (long) to-do list :)

Cheers,
Stephan

Hi @StephanH,

I guess it was you :). I encountered a small bias in my binned fits that went away when I increased the number of bins.

Can you please give more details about this problem?
-> The origin of this problem.
-> Its name, if it is a widely known statistics/maths/computing problem. A link to a page about it would be appreciated.
-> Known solutions. Does increasing the number of bins solve this? What is the solution you plan to implement in RooFit?

My physics analysis is affected by this problem. Understanding it would reassure us that we are not biasing our results.

Cheers,
Da Yu

The origin is that when computing the likelihood of a PDF in a bin, the PDF is evaluated at the bin centre and multiplied by the bin width to get the “total probability” in the bin. If the function is sharply peaked, i.e. it curves strongly within a single bin, that can be a bit inaccurate. I’m not aware of a name for such a problem. “Sampling accuracy” maybe?
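To make the bin-centre approximation concrete, here is a minimal pure-Python sketch (no RooFit involved; the helpers `gauss_pdf`, `gauss_cdf` and `bin_probabilities` are hypothetical names). It assumes a Gaussian with mean 0 and width σ = 0.5 in bins of width 1, with the peak sitting at a bin centre, and compares the exact bin probability (difference of the CDF at the bin edges) with the PDF value at the bin centre times the bin width.

```python
import math


def gauss_pdf(x, sigma):
    """Gaussian density with mean 0 and width sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, sigma):
    """Gaussian cumulative distribution function (mean 0) via math.erf."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def bin_probabilities(edges, sigma):
    """Per-bin probabilities of a N(0, sigma) PDF, computed two ways.

    exact:  integral of the PDF over the bin (CDF difference at the edges)
    centre: PDF value at the bin centre multiplied by the bin width
    """
    exact, centre = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        exact.append(gauss_cdf(hi, sigma) - gauss_cdf(lo, sigma))
        centre.append(gauss_pdf(0.5 * (lo + hi), sigma) * (hi - lo))
    return exact, centre


# Bins of width 1 centred on -3..3; a peak with sigma = 0.5 is narrow compared to the bins.
edges = [-3.5 + i for i in range(8)]
exact, centre = bin_probabilities(edges, sigma=0.5)
for lo, e, c in zip(edges, exact, centre):
    print(f"[{lo:+.1f}, {lo + 1:+.1f}]  exact={e:.6f}  centre={c:.6f}  rel. error={(c - e) / e:+.1%}")
```

With these settings the centre approximation overestimates the peak bin and underestimates its neighbours, because the Gaussian is concave around its maximum and convex in its tails. For a function that is linear across a bin, the two numbers would agree exactly.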

-> Known solutions. Does increasing the number of bins solve this? What is the solution you plan to implement in RooFit?

Yes, using finer bins reduces that inaccuracy.
Another remedy could be to evaluate the PDF at several points in each bin, rather than only at the centre, and combine them into the bin probability.
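The toy below sketches how the per-bin inaccuracy can turn into a bias on a shape parameter, and how both remedies reduce it. It is a pure-Python illustration under stated assumptions, not RooFit’s implementation: the pseudo-data are the exact expected counts of a unit Gaussian (“Asimov” data, no fluctuations), the model is fitted with a binned Poisson likelihood, and a simple grid scan over σ stands in for MINUIT. All function names are hypothetical; `n_sub` is the number of points at which the PDF is evaluated inside each bin (`n_sub=1` is the bin-centre approximation).

```python
import math


def gauss_pdf(x, sigma):
    """Gaussian density with mean 0 and width sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, sigma):
    """Gaussian cumulative distribution function (mean 0) via math.erf."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def expected_counts(edges, sigma, n_total, n_sub=None):
    """Expected events per bin for a Gaussian normalised on [edges[0], edges[-1]].

    n_sub=None: exact bin integral (difference of CDFs at the bin edges).
    n_sub=k:    mean of the PDF at k equally spaced points inside the bin, times
                the bin width (k=1 is the bin-centre approximation).
    """
    norm = gauss_cdf(edges[-1], sigma) - gauss_cdf(edges[0], sigma)
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        width = hi - lo
        if n_sub is None:
            prob = gauss_cdf(hi, sigma) - gauss_cdf(lo, sigma)
        else:
            points = [lo + (j + 0.5) * width / n_sub for j in range(n_sub)]
            prob = width * sum(gauss_pdf(x, sigma) for x in points) / n_sub
        counts.append(n_total * prob / norm)
    return counts


def poisson_nll(observed, expected):
    """Binned Poisson negative log-likelihood, dropping terms independent of the model."""
    return sum(mu - n * math.log(mu) for n, mu in zip(observed, expected))


def fit_sigma(edges, data, n_total, n_sub):
    """Grid scan of sigma in [0.8, 1.2] (step 0.005); returns the NLL minimum."""
    grid = [0.8 + 0.005 * k for k in range(81)]
    return min(grid, key=lambda s: poisson_nll(data, expected_counts(edges, s, n_total, n_sub)))


def make_edges(half_range, width):
    """Uniform bin edges covering [-half_range, half_range]."""
    n_bins = round(2 * half_range / width)
    return [-half_range + i * width for i in range(n_bins + 1)]


N_EVENTS, TRUE_SIGMA = 10000.0, 1.0
fitted = {}
for width, n_sub in [(1.0, 1), (0.25, 1), (1.0, 8)]:
    edges = make_edges(5.5, width)
    data = expected_counts(edges, TRUE_SIGMA, N_EVENTS)  # exact counts, no fluctuations
    fitted[(width, n_sub)] = fit_sigma(edges, data, N_EVENTS, n_sub)
    print(f"bin width {width}, {n_sub} point(s) per bin: fitted sigma = {fitted[(width, n_sub)]:.3f}")
```

For a Gaussian, the bin-centre fit should land roughly at √(σ² + w²/12) for bin width w (about 1.04 when w = σ), which is the same grouping effect that Sheppard’s corrections describe for binned data. With finer bins, or with several evaluation points per bin, the fitted value comes back to within the grid step of the true σ = 1.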

Thanks for the clarification @StephanH.

Out of curiosity: I naively assume you are going to implement an interpolation in RooFit and sample each bin at the interpolated points?

Also, can you confirm that running a binned or an unbinned fit does not affect the integration? That was one of our suspicions, but after careful thought it seems the integration is handled by RooAbsPdf and its derived classes (at least to my understanding).

Yes, hopefully. I have to check if that can be done without too many hacks.

I’m not aware of problems with integrals in binned fits. For binned PDFs, one has to watch out a bit, because the default integration strategy for an “unknown” PDF is numeric integration, which is not really efficient for a binned PDF. Therefore, a special integrator (the RooBinIntegrator) is used, which sums over the bins.
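Why summing over bins suits a binned PDF can be seen with a piecewise-constant function: its integral is exactly the sum of (value at bin centre × bin width) over its own bins, while a generic fixed-step rule that does not know where the steps are picks up errors at the bin edges. The sketch below is a hypothetical pure-Python analogue of that idea (`make_step_pdf`, `bin_sum_integral` and `trapezoid_integral` are illustrative names, not RooFit code, and the trapezoidal rule merely stands in for a generic numeric integrator).

```python
import bisect


def make_step_pdf(edges, contents):
    """Hypothetical piecewise-constant PDF built from histogram bin contents."""
    total = sum(contents)
    densities = [c / (total * (hi - lo)) for c, lo, hi in zip(contents, edges[:-1], edges[1:])]

    def pdf(x):
        if x < edges[0] or x >= edges[-1]:
            return 0.0
        # bisect_right finds the bin i with edges[i] <= x < edges[i + 1]
        return densities[bisect.bisect_right(edges, x) - 1]

    return pdf


def bin_sum_integral(pdf, edges):
    """Sum of (value at bin centre) * (bin width) over the PDF's own bins."""
    return sum(pdf(0.5 * (lo + hi)) * (hi - lo) for lo, hi in zip(edges[:-1], edges[1:]))


def trapezoid_integral(pdf, a, b, n_steps):
    """Generic fixed-step trapezoidal rule that ignores where the steps are."""
    h = (b - a) / n_steps
    inner = sum(pdf(a + i * h) for i in range(1, n_steps))
    return h * (0.5 * pdf(a) + inner + 0.5 * pdf(b))


edges = [0.0, 1.0, 1.5, 4.0]
contents = [10, 30, 5]
pdf = make_step_pdf(edges, contents)
print("sum over bins:", bin_sum_integral(pdf, edges))
print("trapezoid, 7 steps:", trapezoid_integral(pdf, 0.0, 4.0, 7))
```

Because a step function is constant inside each bin, evaluating it at the bin centre is exact here; the bin-centre approximation only becomes a problem for PDFs that vary within a bin, as in the peaked case discussed above.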

For your information, there is a JIRA item tracking this issue:

https://sft.its.cern.ch/jira/browse/ROOT-3874

It will probably be fixed as part of the ongoing redesign of the negative log-likelihood function in RooFit…
As a workaround, use more bins, or, if you can convert your model to a ROOT TF1 object and your data to a histogram, use the ROOT TH1::Fit function with the "L" (log-likelihood) and "I" (integral of the function over each bin instead of its value at the bin centre) options.

Lorenzo
