MC statistical error in HistFactory

Dear experts,

What is the correct way to implement the statistical error for MC samples in HistFactory?
My understanding is that the statistical error should reflect the number of (unweighted) simulated events, but the input histogram is weighted and has Sumw2() activated.
The HistFactory ActivateStatError() method lets you pass a separate histogram for the statistical error. Is that the correct way to do it?

Cheers, Mattias

Hi Mattias,
Section 2.2.1 in the HistFactory documentation https://cds.cern.ch/record/1456844/ should describe what you need.

If you have a HistFactory sample, simply calling ActivateStatError() will add a series of parameters (alphas) to your model, following the Barlow-Beeston lite approach.
~/Vince

Hi Vince,

Thanks for your answer and the link. But I’m still unsure about the meaning of the variables in the Poisson constraint term (in the description you linked to), and where HistFactory looks for their values:

I understand that m_b is the number of (unweighted) simulated events, and that tau_b corresponds to 1/(relative statistical uncertainty)^2, therefore we expect the two to be the same (for Poisson statistics).

But these are not inputs to the fit; rather, they are estimated from the number of weighted events (nu_b^MC) and the total statistical uncertainty (delta_b). What does this delta_b correspond to, and from which histogram are its values taken?

In the description, they give the relation m_b = ( delta_b / nu_b^MC )^2 but I’m confused by this.
Maybe you have some further insight?

Thanks,
Mattias

Hello Mattias, Vince,

The nuisance parameters added by HistFactory are actually called gamma. The gammas scale the number of events expected in each bin up or down.

HistFactory uses TH1::GetBinError to obtain the statistical error. For a weighted sample with Sumw2() activated, this is the square root of the sum of squared weights in the bin, not the square root of the bare sum of weights. Therefore, it accurately represents the statistical power of your Monte Carlo simulation.

Delta, the relative error, tau and m are used only to make implementing the constraint easier. Instead of asking yourself what the correct parameters for a Poisson term with a non-integer number of observed events would be, you simply pretend to have an unweighted sample of size m, where you choose m such that the sample has the same statistical accuracy as the weighted sample.
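
As a rough illustration of that "equivalent unweighted sample" idea, here is a minimal Python sketch. It is not HistFactory code: the helper name `effective_entries` and the example weights are made up for illustration, and it assumes the bin error is the square root of the sum of squared weights, as described above.

```python
import math

def effective_entries(weights):
    """For one bin filled with `weights`, return (sum_w, error, m_eff).

    sum_w : bin content (sum of weights)
    error : sqrt(sum of squared weights), the Sumw2-style bin error
    m_eff : size of an unweighted sample with the same relative error
    """
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    error = math.sqrt(sum_w2)
    # An unweighted sample of size m has relative error 1/sqrt(m),
    # so matching error/sum_w gives m = (sum_w / error)^2.
    m_eff = sum_w ** 2 / sum_w2
    return sum_w, error, m_eff

# 100 events, all with weight 0.5: content 50, error 5, equivalent to 100 events.
print(effective_entries([0.5] * 100))

# Unequal weights lower the statistical power: 50 raw events, but m_eff = 32.
print(effective_entries([2.0] * 10 + [0.5] * 40))
```

With equal weights the equivalent sample size is just the raw number of simulated events; with a spread of weights it drops below that.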

Cheers,
Stephan

Hello Stephan,

Thanks for the useful details. I was wrongly thinking that the statistical error and the Sumw2 were two different things; it makes more sense now.

For reference, I found the details here useful as well: https://www-zeuthen.desy.de/~wischnew/amanda/discussion/wgterror/working.html

I think the relation I quoted in my last reply, m_b = ( delta_b / nu_b^MC )^2, needs to be inverted to be correct.
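
Written out, and assuming delta_b is the absolute statistical uncertainty of the bin (the square root of the sum of squared weights w_i) and nu_b^MC is the sum of weights, the inverted relation would read:

```latex
\[
  m_b \;=\; \tau_b
      \;=\; \left(\frac{\nu_b^{\mathrm{MC}}}{\delta_b}\right)^{2}
      \;=\; \frac{\bigl(\sum_i w_i\bigr)^{2}}{\sum_i w_i^{2}}
\]
% Sanity check with N unweighted events (w_i = 1):
%   \nu_b^{MC} = N, \delta_b = \sqrt{N}  =>  m_b = N,
% whereas the non-inverted form would give 1/N.
```

This matches the statement that tau_b is 1/(relative statistical uncertainty)^2, with relative uncertainty delta_b / nu_b^MC.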

Thanks,
Mattias
