What distribution does TH1::Sumw2() correspond to?

eggsAndCoffee · May 17, 2020, 11:37pm

I’m trying to understand the motivation for TH1::Sumw2() and under what circumstances it properly represents the uncertainty on a histogram bin. The documentation for TH1::Sumw2() says:

The error per bin will be computed as sqrt(sum of squares of weight) for each bin.
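The bookkeeping described in that sentence can be sketched in Python. `WeightedBin` is a hypothetical helper, not ROOT's API. It keeps the running sum of weights (the bin content) and the running sum of squared weights, which is what TH1::Sumw2() makes the histogram store for each bin.

```python
import math


class WeightedBin:
    """Hypothetical per-bin accumulator (not part of ROOT) mimicking Sumw2 bookkeeping."""

    def __init__(self) -> None:
        self.sumw = 0.0   # running sum of weights = bin content
        self.sumw2 = 0.0  # running sum of squared weights

    def fill(self, w: float = 1.0) -> None:
        self.sumw += w
        self.sumw2 += w * w

    def content(self) -> float:
        return self.sumw

    def error(self) -> float:
        # Error per bin as quoted above: sqrt(sum of squares of weight)
        return math.sqrt(self.sumw2)


# With unit weights, sum of w^2 equals the number of entries,
# so the error reduces to sqrt(bin content).
b = WeightedBin()
for _ in range(16):
    b.fill()
print(b.content(), b.error())  # 16.0 4.0
```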

My question is, why and when does this apply?

I would assume that the typical use of a TH1 is to count “things” or “events”, often produced by a Poisson process. If the weight of each entry is 1, the error in a bin is sqrt(bin content), and this makes sense. But if the weight of each entry is less than 1, say in Monte Carlo data where the weight is a statistical weight, is it appropriate to take the error as sqrt(sum of squares of weight)? In the spirit of a Poisson-like process, I would think the error should be sqrt(sum of weights). For weights below 1 the former is always smaller than the latter, often much smaller, so I want to understand the difference.
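To make the two options concrete, take the simplified case (an assumption for illustration) where all N entries in a bin carry the same weight w, so the bin content is wN. The absolute and relative uncertainties then come out as:

```latex
\sqrt{\sum_{i=1}^{N} w_i^2} = w\sqrt{N}, \qquad \frac{w\sqrt{N}}{wN} = \frac{1}{\sqrt{N}}

\sqrt{\sum_{i=1}^{N} w_i} = \sqrt{wN}, \qquad \frac{\sqrt{wN}}{wN} = \frac{1}{\sqrt{wN}}
```

With sqrt(sum of squares of weight) the relative uncertainty stays at 1/√N, the value for the N entries actually filled, while with sqrt(sum of weights) it changes with the overall scale of the weights.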

So why use Sumw2()? What assumptions go into its use? Is it even meant to be used in the situation I’ve described above, where you are counting simulated events with weights < 1 that would otherwise be Poisson distributed?

moneta · May 18, 2020, 7:15am

Hi,
For the statistics of weighted events, one of the best statistical descriptions is provided by this paper,

Lorenzo

eggsAndCoffee · May 18, 2020, 3:44pm

Thank you Lorenzo. That was a good paper, and it helps my understanding. It’s still not totally clear to me, though, why the uncertainty would be calculated as sqrt(sum of weight^2), and as far as I can tell the paper does not address that specifically, does it? Equations 2 and 7 in the paper are quite similar to the form I would expect for the Sumw2 error calculation, but I can’t make the connection.

Let me get this straight, then. Are you saying that for weighted Poisson events the error is calculated as sqrt(sum of weight^2), and that this result is proven somewhere in the paper you’ve provided and I’ve just missed it?

moneta · May 19, 2020, 9:04am

Hi,

You can simply use the rules of expectation values to arrive at the conclusion that the uncertainty can be estimated as sqrt(sum of weight^2).
Equation (7) in the paper shows this well. A bin filled with weighted entries can be described by a compound Poisson distribution: a sum of N random variables (the weights), where N is Poisson distributed with expected value lambda.
You can also find more information in https://en.wikipedia.org/wiki/Compound_Poisson_distribution, where you have a formula and proof for the variance (and hence the uncertainty).
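Written out with the law of total variance, assuming the weights w_i are independent and identically distributed and independent of N, the variance of the bin content S is:

```latex
\begin{aligned}
S &= \sum_{i=1}^{N} w_i, \qquad N \sim \mathrm{Poisson}(\lambda) \\
\operatorname{Var}(S) &= \mathbb{E}\left[\operatorname{Var}(S \mid N)\right] + \operatorname{Var}\left(\mathbb{E}[S \mid N]\right) \\
&= \mathbb{E}[N]\,\operatorname{Var}(w) + \operatorname{Var}(N)\,\mathbb{E}[w]^2 \\
&= \lambda\left(\mathbb{E}[w^2] - \mathbb{E}[w]^2\right) + \lambda\,\mathbb{E}[w]^2 \\
&= \lambda\,\mathbb{E}[w^2]
\end{aligned}
```

The step from the second to the third line uses E[N] = Var(N) = λ, which holds for a Poisson distribution.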
If the expected number of events in the bin is assumed to equal the observed one (e.g. in the case of the Neyman chi-square), then E(N) = N, i.e. lambda ≈ N.
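With λ ≈ N and E[w²] estimated by the sample average (1/N)·Σw_i², the compound Poisson variance λE[w²] becomes Σw_i², i.e. sqrt(sum of weight^2) for the uncertainty. A toy Monte Carlo can check this numerically. The sketch below is illustrative only: the helper names are hypothetical (not ROOT functions), the weights are drawn from Uniform(0, 1), and λ = 50 is an arbitrary choice. It fills one bin many times with N ~ Poisson(λ) entries and compares the spread of the bin content across toys with the average per-toy Σw² and Σw.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate lam (stdlib has no Poisson sampler)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(lam: float = 50.0, trials: int = 10000, seed: int = 1):
    """Fill one bin `trials` times with N ~ Poisson(lam) entries of weight w ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    contents, sumw2s = [], []
    for _ in range(trials):
        n = poisson(lam, rng)
        weights = [rng.random() for _ in range(n)]
        contents.append(sum(weights))               # bin content S = sum of weights
        sumw2s.append(sum(w * w for w in weights))  # per-toy Sumw2 estimate of Var(S)
    var_s = statistics.pvariance(contents)          # actual spread of S across toys
    return var_s, statistics.fmean(sumw2s), statistics.fmean(contents)


var_s, mean_sumw2, mean_sumw = simulate()
print(f"Var(S) from toys : {var_s:.2f}")       # expected near lam * E[w^2] = 50/3
print(f"mean of sum w^2  : {mean_sumw2:.2f}")  # expected near 50/3 as well
print(f"mean of sum w    : {mean_sumw:.2f}")   # expected near lam * E[w] = 25
```

The agreement between the toy spread and the mean of Σw² should hold for any weight distribution with finite E[w²], whereas Σw would coincide with it only for unit weights.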

I hope this helps your understanding

Lorenzo

eggsAndCoffee · May 19, 2020, 11:35am

Ok yes, I get it now! Thank you very much, very helpful.