Binned maximum likelihood fits (RooFit) with sharp peaks

Da_Yu_Tou · March 31, 2020, 2:56pm

Hi,

I remember a presentation by a RooFit developer (whose name I don’t remember, sorry) about a known problem in binned maximum likelihood fits to histograms with sharp peaks. I cannot find the Indico page for this presentation, and googling the problem did not help either. I would like to understand it, so could a developer please explain it in a little more detail? I am using ROOT v6.18.

I also remember a mention that this problem is solved in the ROOT v6.20 release of RooFit. Is that correct?

Thanks for spending some time on this problem!

Cheers,
Da Yu

StephanH · March 31, 2020, 5:00pm

Hi @Da_Yu_Tou,

you might be talking about the LHCb weekly. I don’t have the link, since the agenda is not public.

Anyway, it’s not solved in ROOT 6.20, but it’s on a (long) to-do list.

Cheers,
Stephan

Da_Yu_Tou · April 1, 2020, 10:53am

Hi @StephanH,

I guess it was you. I encountered a small bias in my binned fits that was solved by increasing the number of bins.

Can you please give more details about this problem?
-> The origin of this problem.
-> The name of this problem if it is a widely known stats/math/compute problem. A link to a page about this problem would be appreciated.
-> Known solutions. Does increasing the number of bins solve this? What is the solution you plan to implement in RooFit?

My physics analysis is dealing with this problem. Understanding it would assure us that we are not biasing our results.

Cheers,
Da Yu

StephanH · April 1, 2020, 12:28pm

The origin is that when computing the likelihood of a PDF in a bin, the PDF is evaluated at the bin centre and multiplied by the bin width to get the “total probability” in the bin. If the function is sharply peaked, it varies strongly within a single bin, so that approximation can be inaccurate. I’m not aware of a name for such a problem. “Sampling accuracy” maybe?
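To make the effect concrete, here is a minimal sketch in plain Python (not RooFit code). It assumes a Gaussian peak with a hypothetical width `sigma = 0.1` and compares the bin-centre approximation with the exact bin probability obtained from the Gaussian CDF. The function names are made up for this illustration.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normalised Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, mu, sigma):
    """Gaussian cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def centre_approx(lo, hi, mu, sigma):
    """PDF at the bin centre times the bin width."""
    return gauss_pdf(0.5 * (lo + hi), mu, sigma) * (hi - lo)

def exact_prob(lo, hi, mu, sigma):
    """Exact probability content of the bin [lo, hi]."""
    return gauss_cdf(hi, mu, sigma) - gauss_cdf(lo, mu, sigma)

def rel_error(width, mu=0.0, sigma=0.1):
    """Relative error of the centre approximation for a bin centred on the peak."""
    lo, hi = mu - 0.5 * width, mu + 0.5 * width
    exact = exact_prob(lo, hi, mu, sigma)
    return (centre_approx(lo, hi, mu, sigma) - exact) / exact

# Assumed bin widths, from coarse (4 sigma) to fine (0.5 sigma).
for width in (0.4, 0.2, 0.05):
    print(f"bin width {width:5.2f}: relative error {rel_error(width):.4f}")
```

For the bin sitting on the peak, the centre approximation overestimates the probability, and the overestimate shrinks as the bins get narrower relative to the peak width.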

→ Known solutions. Does increasing the number of bins solve this? What is the solution you plan to implement in RooFit?

Yes, using finer bins reduces that inaccuracy.
A remedy could be to compute multiple probabilities in each bin.
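One way to read “multiple probabilities in each bin” is to evaluate the PDF at several points inside the bin and average them. The sketch below is plain Python and only illustrates that idea with equally spaced sub-points (a midpoint rule); it is an assumption about what such a remedy could look like, not what RooFit does. The peak width and bin edges are hypothetical.

```python
import math

def gauss_pdf(x, mu=0.0, sigma=0.1):
    """Normalised Gaussian density (assumed test shape)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bin_prob_subsampled(pdf, lo, hi, n_sub):
    """Approximate the bin probability from n_sub equally spaced sub-points.

    n_sub = 1 reproduces the plain 'bin centre times bin width' estimate.
    """
    step = (hi - lo) / n_sub
    return sum(pdf(lo + (i + 0.5) * step) for i in range(n_sub)) * step

# Exact content of the bin [-0.2, 0.2] for sigma = 0.1, i.e. +-2 sigma.
exact = math.erf(2.0 / math.sqrt(2.0))

for n_sub in (1, 4, 16):
    approx = bin_prob_subsampled(gauss_pdf, -0.2, 0.2, n_sub)
    print(f"n_sub = {n_sub:2d}: approx = {approx:.5f}, error = {approx - exact:+.5f}")
```

The cost is `n_sub` PDF evaluations per bin instead of one, so in practice this trades fit speed for accuracy in much the same way as using finer bins.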

Da_Yu_Tou · April 1, 2020, 1:10pm

Thanks for the clarification @StephanH.

Out of curiosity, I naively assume you are going to implement an interpolation and sample the bins over the interpolated points in RooFit?

Da_Yu_Tou · April 1, 2020, 1:20pm

Also, can you confirm that running binned or unbinned fits does not affect the integration? That was one of our suspicions, but after careful thought, the integration is handled by RooAbsPdf and its derived classes (at least to my understanding).

StephanH · April 1, 2020, 1:30pm

Yes, hopefully. I have to check if that can be done without too many hacks.

I’m not aware of problems with integrals in binned fits. For binned PDFs, one has to watch out a bit because the default integration strategy for an “unknown” PDF is to run a numeric integral. That’s not really efficient for a binned PDF. Therefore, a special integrator (the RooBinIntegrator) is used, which sums over bins.
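The idea of “summing over bins” can be sketched in a few lines of plain Python. This only illustrates why a bin sum is the natural integral for a piecewise-constant (binned) PDF; it is not the RooBinIntegrator implementation, and the bin edges and heights are made-up values.

```python
def integrate_binned(edges, heights):
    """Integral of a piecewise-constant function: sum of height * width per bin."""
    return sum(h * (hi - lo) for h, lo, hi in zip(heights, edges[:-1], edges[1:]))

# Hypothetical binned density with non-uniform bins.
edges = [0.0, 1.0, 2.0, 4.0]
heights = [0.5, 0.25, 0.125]

total = integrate_binned(edges, heights)
print(total)
```

For a function that is constant inside each bin the sum is exact and needs one evaluation per bin, whereas a generic adaptive numeric integrator would spend many evaluations on the discontinuities at the bin edges.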

moneta · April 2, 2020, 7:54am

For your information, there is a JIRA item tracking this issue:

https://sft.its.cern.ch/jira/browse/ROOT-3874

It will probably be fixed with the ongoing redesign of the negative log-likelihood function in RooFit…
As a workaround, use more bins, or, if you can convert your model to a TF1 ROOT object and your data to a histogram, use the ROOT TH1::Fit function.

Lorenzo
