Fitting a weighted histogram with low statistics: the "WL" option

wfedorko · February 2, 2012, 8:46pm

Hello,

I am trying to fit a function to a histogram that has been filled with events of varying weights. I gather that the ‘correct’ uncertainty on a given bin is obtained by calling Sumw2() before filling. In the tail of the histogram we have low statistics, so I would like to use some variant of a likelihood fit. This seems challenging to me, given that the entries in the histogram no longer represent ‘counts’. In the description of the ‘Fit()’ function of TH1F I see there is a “WL” option that seems appropriate, but the only documentation is this statement:
    “WL” Use Loglikelihood method and bin contents are not integer,
    i.e. histogram is weighted (must have Sumw2() set)
Can anyone explain to me what is actually being done? For example, how are the empty bins handled?

I tried to look at the source code, but I can’t figure out where this line of TH1.cxx leads to:
root.cern.ch/root/html532/src/TH1.cxx.html#3638
The namespace ROOT::Fit
root.cern.ch/root/html/ROOT__Fit.html
does not seem to contain the function FitObject. How do I find this code?

Thank you,
Wojtek

moneta · February 3, 2012, 2:07pm

Hi,

What is done in this case is to use a “scaled” Poisson distribution as the PDF for the likelihood fit. This is not a true distribution, only an approximation: for a weighted histogram the effective number of entries n_eff is in general not an integer but a real number. If you can, you should describe the full pdf for each bin, taking into account also the model for the weights.
Since we do not know the weighting model, we can only take an approximation. In this case one can use a weighted likelihood:

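logL = Sum_i { W_i * log [ Poisson(n_eff_i | expected_i) ] }

Here W_i is the scale factor of bin i (defined below), expected_i is the expected number of effective entries in bin i (the value predicted by the fit function divided by W_i), and the “Poisson” is not a real Poisson, because n_eff_i (the number of effective entries in bin i) is in general not an integer.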
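A minimal Python sketch of one bin's term in this weighted likelihood, written outside ROOT purely for illustration. It assumes that the expected value inside the Poisson is the fit-function value mu_i divided by W_i, replaces log(n!) by math.lgamma(n + 1) so that a non-integer n_eff is allowed, and treats an empty bin as an unweighted bin (a term of -mu_i). The function names are hypothetical.

```python
import math


def scaled_poisson_logl(y, sigma2, mu):
    """Contribution of one bin to logL for a weighted histogram (sketch).

    y:      bin content (sum of weights)
    sigma2: squared bin error (sum of squared weights, what Sumw2() stores)
    mu:     bin content predicted by the fit function (must be > 0)
    """
    if y == 0:
        # Empty bin: W = sigma2 / y is undefined (0/0), so treat it as in an
        # unweighted fit, where log Poisson(0 | mu) = -mu (assumption)
        return -mu
    w = sigma2 / y          # scale factor W_i
    n_eff = y / w           # effective entries y^2 / sigma^2, generally non-integer
    nu = mu / w             # expected value in effective-entry units
    # log of the "Poisson" at a real-valued n_eff: lgamma(n + 1) replaces log(n!)
    log_p = n_eff * math.log(nu) - nu - math.lgamma(n_eff + 1.0)
    return w * log_p


def weighted_logl(contents, errors2, predicted):
    """Sum of the per-bin terms over all bins of the histogram."""
    return sum(scaled_poisson_logl(y, s2, mu)
               for y, s2, mu in zip(contents, errors2, predicted))
```

Because W_i * n_eff_i = y_i, the mu-dependent part of each term is y_i * log(mu_i) - mu_i, which is exactly the standard Poisson log-likelihood with the bin content used as a count; only the constant differs.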

If the squared bin uncertainty sigma_i^2 is the sum of the squared weights and y_i is the bin content (the sum of the weights),
the effective number of entries is n_eff_i = y_i^2 / sigma_i^2,
and the scale factor used for the Poisson in each bin is W_i = sigma_i^2 / y_i, so that y_i = W_i * n_eff_i.
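The two definitions can be checked with a small Python sketch that starts from the event weights filled into a single bin. The list of per-event weights is a hypothetical input for illustration; ROOT itself keeps only the two sums (the bin content and the Sumw2() array).

```python
def bin_summary(weights):
    """Return (y, sigma2, n_eff, W) for the event weights filled into one bin."""
    y = sum(weights)                      # bin content: sum of the weights
    sigma2 = sum(w * w for w in weights)  # Sumw2(): sum of the squared weights
    if y == 0:
        # Empty bin (or weights summing to zero): W = sigma2 / y is undefined
        return y, sigma2, 0.0, None
    n_eff = y * y / sigma2                # effective number of entries
    scale = sigma2 / y                    # scale factor W_i
    return y, sigma2, n_eff, scale


y, sigma2, n_eff, scale = bin_summary([0.5, 1.5, 1.0])
# y = 3.0, sigma2 = 3.5, n_eff ≈ 2.571 (not an integer), W ≈ 1.167
```

With equal weights, for example [2.0, 2.0, 2.0], the same function gives n_eff = 3.0 and W = 2.0: the original count and the common weight.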
For a weighted Poisson, see also the statistics book by G. Bohm and G. Zech, page 61:
www-library.desy.de/preparch/boo … p_engl.pdf

The minimum of -logL is at the same place as for a standard Poisson likelihood in which the bin contents are treated as plain counts, so you get the same fitted parameters as if you ignored the event weights, but the resulting errors are different.
A special correction, described in F. James's book, section 8.5.2, needs to be applied to them.
With this option you therefore get the right errors from the fit when the histogram is weighted.
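A minimal sketch of such an error correction, assuming it takes the usual sandwich form for weighted likelihoods, C = H^-1 * H2 * H^-1, where H is the second derivative of -logL and H2 is the same quantity with every weight replaced by its square (bin contents y_i replaced by sigma_i^2). To keep it in closed form, the model is a hypothetical one-parameter normalization f_i = A * s_i with a fixed template s_i. This is not ROOT's code, and it does not claim to reproduce the exact formula of James's section 8.5.2.

```python
import math


def fit_normalization(contents, errors2, template):
    """Weighted-likelihood fit of A in the hypothetical model f_i = A * s_i.

    Returns (A_hat, naive_error, corrected_error). Assumes sum(contents) > 0.
    """
    sum_y = sum(contents)
    a_hat = sum_y / sum(template)  # maximises sum_i [y_i log(A s_i) - A s_i]
    # -d2 logL / dA2 at the maximum, built from the bin contents (sums of weights)
    h = sum_y / a_hat ** 2
    # the same quantity with weights replaced by squared weights (y_i -> sigma_i^2)
    h2 = sum(errors2) / a_hat ** 2
    naive_var = 1.0 / h            # what an uncorrected fit would report
    corrected_var = h2 / (h * h)   # sandwich form H^-1 * H2 * H^-1
    return a_hat, math.sqrt(naive_var), math.sqrt(corrected_var)


# Two bins filled only with events of weight 2: contents 6 and 4, Sumw2 12 and 8
a_hat, naive, corrected = fit_normalization([6.0, 4.0], [12.0, 8.0], [5.0, 5.0])
# a_hat = 1.0, naive ≈ 0.316, corrected ≈ 0.447
```

In this example the corrected error equals sqrt(sum of sigma_i^2) / sum of s_i, the spread of A_hat obtained directly from the bin variances, while the uncorrected error is too small by a factor sqrt(2), the square root of the common weight.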
Empty bins are treated as in the unweighted binned likelihood fit: they contribute through the total number of expected events.

As I said before, this procedure is only an approximate method, but it is found to give quite good results in general, in particular if the weights do not vary very much, and often better results than a plain least-squares fit (a Gaussian pdf for each bin).
For example, when a global scale factor is applied to the whole histogram (all weights are equal to the same value), the result will be correct and the scale factor will be taken into account in the fit.
Note that in this particular case the number of effective entries is an integer.
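To illustrate the global scale factor case, here is a minimal Python sketch assuming every event carries the same weight c and the fit has a single free overall normalization. Under those assumptions the corrected relative error on the normalization works out to sqrt(sum of sigma_i^2) / (sum of y_i), while treating the scaled contents as if they were counts would give 1 / sqrt(sum of y_i). The helper name and the per-bin counts are hypothetical.

```python
import math


def normalization_rel_errors(counts, c):
    """Relative error on a fitted overall normalization when every event has weight c.

    counts: unweighted entries per bin (hypothetical input); c: global scale factor
    """
    contents = [c * n for n in counts]      # y_i = c * n_i
    errors2 = [c * c * n for n in counts]   # sigma_i^2 = c^2 * n_i, so n_eff_i = n_i
    sum_y = sum(contents)
    # treating the scaled contents as if they were counts
    naive = 1.0 / math.sqrt(sum_y)
    # weighted likelihood with the error correction (pure-normalization fit)
    corrected = math.sqrt(sum(errors2)) / sum_y
    return naive, corrected


naive, corrected = normalization_rel_errors([3, 0, 7], 0.5)
# naive ≈ 0.447 (1/sqrt(5)), corrected ≈ 0.316 (1/sqrt(10))
```

The corrected value is 1/sqrt(10), the same relative error as an unweighted fit to the 10 original entries, independent of c; this is one way to see that the scale factor is taken into account.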

If you want to see where it is implemented, the actual likelihood calculation is done here:

root.cern.ch/lxr/source/math/mat … l.cxx#1108

Best Regards

Lorenzo