Fit weighted histogram with likelihood

wiso · July 10, 2015, 12:18pm

Hello,

I recently discovered this comment in root.cern.ch/root/html/tutorial … ort.C.html:

// Please note that error bars shown (Poisson or SumW2) are for visualization only, the are NOT used
  // in a maximum likelihood fit
  //
  // A (binned) ML fit will ALWAYS assume the Poisson error interpretation of data (the mathematical definition 
  // of likelihood does not take any external definition of errors). Data with non-unit weights can only be correctly
  // fitted with a chi^2 fit (see rf602_chi2fit.C)

I was trying to fit a histogram that results from merging many simulations and therefore has a very unusual pattern of errors (due to the different statistics in the various regions).

I am wondering why RooFit is not able to handle errors that differ from the Poisson ones. TH1F::Fit, for example, can do that, and it has an option called “binned likelihood”.

In addition, this behaviour is very poorly documented, and people need to know about it. It is very common to have non-Poisson errors, in particular with simulations.

I attach my result. red = RooAbsPdf::fitTo, green = RooAbsPdf::chi2FitTo, blue: TH1F::Fit

Cheers,
Ruggero

moneta · July 22, 2015, 5:00pm

Hi Ruggero,

A binned likelihood fit (also in TH1::Fit) assumes a Poisson distribution for the bin content. If you know you have a different distribution, you should use that distribution and build the likelihood yourself.
If the distribution is Gaussian, then the negative log-likelihood is equivalent, up to a constant, to the least-squares sum, which follows a chi-square distribution.
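
As a minimal sketch of the Gaussian case (plain Python, not ROOT code; the bin values and the two models below are made up for illustration), the difference in -2 ln L between two models equals the difference in the least-squares sum as long as the bin errors are fixed:

```python
import math

def gaussian_nll(observed, expected, sigma):
    """Negative log-likelihood of independent Gaussian bins with fixed errors."""
    return sum(
        0.5 * ((y - mu) / s) ** 2 + math.log(s * math.sqrt(2.0 * math.pi))
        for y, mu, s in zip(observed, expected, sigma)
    )

def chi2(observed, expected, sigma):
    """Least-squares sum using the same bin errors."""
    return sum(((y - mu) / s) ** 2 for y, mu, s in zip(observed, expected, sigma))

# Hypothetical bin contents, bin errors and two candidate models.
observed = [12.0, 7.5, 3.1]
sigma = [3.0, 2.0, 1.5]
model_a = [11.0, 8.0, 3.0]
model_b = [14.0, 6.0, 4.0]

# The log(sigma) terms do not depend on the model, so they cancel in the difference.
delta_nll = gaussian_nll(observed, model_b, sigma) - gaussian_nll(observed, model_a, sigma)
delta_chi2 = chi2(observed, model_b, sigma) - chi2(observed, model_a, sigma)
print(2.0 * delta_nll, delta_chi2)
```

The two printed numbers agree, which is why minimising either quantity gives the same parameters in this case.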

When combining many different sources it is often correct to assume normal errors for the bins. However, when bins are empty or have very low statistics, the normal approximation is no longer valid and these bins can bias your result.
An alternative, for a histogram filled with weights, is to use an approximate likelihood method. This is implemented in both ROOT (option “WL”) and RooFit (option RooFit::SumW2(true) ).
In this case an approximate Poisson distribution is used, based on the number of effective entries, defined as (Sum of weights)^2 / (Sum of weights^2).
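
The definition of the effective entries can be sketched in a few lines of plain Python. The function name `effective_entries` and the weight lists are hypothetical, chosen only to illustrate the formula:

```python
def effective_entries(weights):
    """Number of effective entries: (sum of weights)^2 / (sum of weights^2)."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w2 == 0.0:
        return 0.0  # empty bin
    return sum_w * sum_w / sum_w2

# 100 events with unit weight: the effective entries equal the raw count.
n_unit = effective_entries([1.0] * 100)

# 100 events with a common weight of 0.01: still 100 effective entries,
# although the bin content (sum of weights) is only about 1.
n_scaled = effective_entries([0.01] * 100)

# One event with a large weight on top of 100 unit-weight events
# reduces the effective statistics well below the 101 raw entries.
n_mixed = effective_entries([1.0] * 100 + [50.0])

print(n_unit, n_scaled, n_mixed)
```

A common weight leaves the effective entries unchanged, while a few large weights reduce them sharply.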

Best Regards

Lorenzo

wiso · July 23, 2015, 9:31am

Thanks. Where can I find documentation for RooFit::SumW2? Is it an option for fitTo? I can only see SumW2Error, which only affects the error.

My issue is not the pdf of the number of events in each bin (Gaussian or something else). The issue is the value of the error, which is not sqrt(N).

moneta · July 23, 2015, 10:47am

Hi,

Sorry the RooFit option is called RooFit::SumW2Error, and documented here:

root.cern.ch/root/html/RooAbsPd … sPdf:fitTo

The value of the error of the bin depends on the pdf of observing a value N when you expect Nexp. This can be Poisson, Gaussian or something else.
In reality the error is never sqrt(N), even if the pdf is Poisson; sqrt(N) is just an estimate based on the observed number of events. For large statistics, it is correct to approximate the Poisson distribution for the bin content, Poisson(N | Nexp), by a normal distribution, Gaus(N | Nexp, sqrt(Nexp)) ~ Gaus(N | Nexp, sqrt(N)), since N is large.
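
A small numerical check of this approximation, as a sketch in plain Python (the choice of comparing the two densities one standard deviation above Nexp is arbitrary and only for illustration):

```python
import math

def poisson_pmf(n, n_exp):
    """Poisson(N | Nexp), evaluated through logs to avoid overflow."""
    return math.exp(n * math.log(n_exp) - n_exp - math.lgamma(n + 1))

def gaus_pdf(x, mean, sigma):
    """Gaus(x | mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def relative_difference(n_exp):
    """Relative difference between Gaus and Poisson at N = Nexp + sqrt(Nexp)."""
    n = n_exp + int(round(math.sqrt(n_exp)))
    p = poisson_pmf(n, n_exp)
    g = gaus_pdf(n, n_exp, math.sqrt(n_exp))
    return abs(g - p) / p

# The disagreement shrinks as the expected bin content grows.
for n_exp in (4, 100, 10000):
    print(n_exp, relative_difference(n_exp))
```

For small Nexp the two densities differ visibly, which is the regime where low-statistics bins can bias a chi-square fit.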

Best Regards

Lorenzo

wiso · July 23, 2015, 11:45am

[quote=“moneta”]Hi,

Sorry the RooFit option is called RooFit::SumW2Error, and documented here:

root.cern.ch/root/html/RooAbsPd … sPdf:fitTo

[/quote]

Ok, I know that, but it has no effect on the fit, only on the estimation of the errors. My problem is how to take into account the error (the weight) of the input.

moneta · July 23, 2015, 1:57pm

Hi,

As I said before, a binned weighted likelihood fit works fine if the bin error can be approximated by a scaled Poisson distribution. The weight is taken into account because it is included in the bin content.
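
One way to picture a scaled Poisson term for a single weighted bin is sketched below in plain Python. This is an illustration under stated assumptions, not the ROOT or RooFit implementation: the scale is taken as error^2 / content, the effective entries as content / scale, and the function name `scaled_poisson_nll` and the numbers are hypothetical.

```python
import math

def scaled_poisson_nll(mu, content, error):
    """NLL term (constants dropped) for one weighted bin, assuming that
    content / scale follows a Poisson distribution with mean mu / scale."""
    scale = error * error / content  # sum(w^2) / sum(w)
    n_eff = content / scale          # (sum w)^2 / sum(w^2)
    mu_eff = mu / scale
    return mu_eff - n_eff * math.log(mu_eff)

# Hypothetical bin: content 100 with error 20 (a unit-weight bin would have error 10).
content, error = 100.0, 20.0

# The term is minimal when the prediction equals the bin content ...
at_min = scaled_poisson_nll(content, content, error)
below = scaled_poisson_nll(content - 1.0, content, error)
above = scaled_poisson_nll(content + 1.0, content, error)

# ... and its curvature there is 1 / error^2, so the bin error enters the fit.
h = 0.01
curvature = (
    scaled_poisson_nll(content + h, content, error)
    - 2.0 * at_min
    + scaled_poisson_nll(content - h, content, error)
) / (h * h)
print(curvature, 1.0 / error ** 2)
```

Under these assumptions the bin pulls on the fit with the strength given by its SumW2 error rather than by sqrt(content).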

Now, looking at your plot, I see that the red curve (likelihood fit) is totally off compared to the chi-square fit. What did you use in TH1::Fit? If you do a binned likelihood fit (option “WL”), do you get the same as the red curve?

Cheers

Lorenzo

wiso · July 23, 2015, 2:29pm

[quote=“moneta”]Hi,

As I said before, a binned weighted likelihood fit works fine if the bin error can be approximated by a scaled Poisson distribution. The weight is taken into account because it is included in the bin content.

[/quote]

Yes, a likelihood based on a scaled Poisson distribution is exactly what I need. But at root.cern.ch/root/html/tutorial … ort.C.html I see the comment quoted in my first post.

So I understand that RooAbsPdf::fitTo does not use the errors of the histogram at all (my data are in a TH1F with TH1F::Sumw2 set to true, converted to a RooDataHist).

I guess that RooAbsPdf::fitTo (red) is completely off because, without the proper weights (the ones coming from the MC), the fit focuses on the left region: there are 9 orders of magnitude between the left and right regions, so the events in the right part of the plot make a very small contribution to the likelihood.

The RooAbsPdf::chi2FitTo (green) works because the chi2 fit takes the original errors into account, so the weights of the data points are more or less uniform and all the events contribute equally to the fit.
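
This argument can be illustrated with a toy calculation in plain Python. The two bin contents, the 10% relative bin error and the 5% model mismatch are made-up numbers, not taken from the attached histogram:

```python
import math

def poisson_term(n, mu):
    """-2 [ln L(mu) - ln L(n)] for one bin treated as a plain Poisson count."""
    return 2.0 * (mu - n + n * math.log(n / mu))

def chi2_term(n, mu, error):
    """Chi-square contribution of one bin using its stored error."""
    return ((n - mu) / error) ** 2

# Two hypothetical bins, 9 orders of magnitude apart, each with a 10% error
# coming from the MC weights, and a model that is 5% too high in both.
contents = [1.0e9, 1.0]
rel_error = 0.10
mismatch = 1.05

poisson_terms = [poisson_term(n, mismatch * n) for n in contents]
chi2_terms = [chi2_term(n, mismatch * n, rel_error * n) for n in contents]

# Plain Poisson likelihood: the large bin dominates by the ratio of the contents.
print(poisson_terms[0] / poisson_terms[1])
# Chi-square with the stored errors: both bins contribute equally.
print(chi2_terms)
```

In this toy setup the Poisson likelihood is essentially blind to the small bin, while the chi-square treats both bins on the same footing.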

TH1F::Fit (blue) works for the same reason.

If I do a binned likelihood with TH1F::Fit I get the same behaviour as TH1F::Fit without any options.

I have also tried to redo all the fits starting from the parameters found by TH1F::Fit, but RooAbsPdf::fitTo prefers another solution. In fact, judging by the value of the likelihood, the solution found by RooAbsPdf::fitTo is “better” than the one found by TH1F::Fit.

moneta · July 23, 2015, 3:38pm

Hi,

It would be better if you posted your histogram and also the workspace, including the RooDataHist, that you are using in fitTo. In principle, a weighted binned likelihood fit in RooFit or in ROOT should give the same results. There may be a missing scaling factor that needs to be applied somewhere.

Best Regards

Lorenzo

fsili · April 13, 2022, 12:53pm

Dear Lorenzo and Ruggero,

I am having the exact same problem when fitting a distribution. I am trying the chi2FitTo and fitTo methods from RooFit and the latter gives much worse results compared to the chi2FitTo one.

I attach a plot comparing the two methods, using the same MC and the same fit function. I also attach the code I used to reproduce the problem.

SR_UA2_nll_chi2_fits_sumw2_error_fitrange500-6000.pdf (54.3 KB)
bkg_fits.py (6.0 KB)

What approach should I follow here? I also tried the TH1::Fit method with the ‘WL’ option, but it is not giving me the correct results either… Do I need to normalize the function I use for the fit?
SR_UA2_nll_fits_sumw2_errors_fitrange500-6000_tf1.pdf (24.2 KB)

Thank you very much,
Francisco

moneta · April 14, 2022, 7:18am

Hi,
I am surprised by the result. It is probably because, in the NLL case, the fit did not converge to the right solution.
I would also need access to the input data in order to investigate this further.

Best regards

Lorenzo