Interpretation of error on fraction from TFractionFitter

SebastianSchmitt · July 11, 2012, 9:42am

Dear all!

I’ve already asked on the ATLAS statistics mailing list, but unfortunately did not receive a reply. I’m aware of the assumptions stated in the documentation (http://root.cern.ch/root/html/TFractionFitter.html):

(1) The total number of events in each template is not too small
(so that its Poisson uncertainty can be neglected).

(2) The number of events in each bin is much smaller than the total
number of events in each template (so that multinomial
uncertainties can be replaced with Poisson uncertainties).

Even so, I cannot get “perfect” errors out of my toy MC. Could you please have a look?

I want to do a template fit to estimate the QCD background contribution
to a Z->ee selection. I have a data histogram (data), a signal Monte
Carlo histogram (mc0) and a QCD template histogram (mc1).

For the fit I use ROOT’s TFractionFitter. I know about the issue with
weighted histograms, but my problem is more basic, so let’s forget about
weights for the moment.

The problem I have is the statistical interpretation of the error on the
fraction that TFractionFitter returns. I have the impression that it is
always too high. As demonstration I put together a toy MC:

/afs/cern.ch/user/s/schmitts/public/test_TFractionFitter/test_TFractionFitter.cxx

The toy data is the sum of two Gaussians with different mean values.

The fit is repeated many times. For each iteration all three
histograms (data, mc0, mc1) are filled from scratch and are therefore
statistically independent.
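
The toy generation described above can be sketched without ROOT as follows. This is not the code in test_TFractionFitter.cxx: the `Hist` type, the helpers `fillGaussian` and `makeToy`, the binning, and the Gaussian means and widths are all hypothetical choices made for illustration. What it shows is that data, mc0 and mc1 are each filled from fresh random draws, and that the true fraction follows from the requested contributions.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Minimal equal-width histogram; values outside [lo, hi) are dropped.
struct Hist {
    double lo, hi;
    std::vector<double> bins;

    Hist(std::size_t nBins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nBins, 0.0) {
        assert(nBins > 0 && hi_ > lo_);
    }

    void fill(double x) {
        if (x < lo || x >= hi) return;
        std::size_t i =
            static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
        bins[i] += 1.0;
    }

    double entries() const {
        return std::accumulate(bins.begin(), bins.end(), 0.0);
    }
};

// Add n Gaussian-distributed values to an existing histogram.
void fillGaussian(Hist& h, std::mt19937& rng, std::size_t n,
                  double mean, double sigma) {
    std::normal_distribution<double> gauss(mean, sigma);
    for (std::size_t i = 0; i < n; ++i) h.fill(gauss(rng));
}

struct Toy {
    Hist data, mc0, mc1;
    double trueFraction;  // requested mc1 share of the data
};

// Build one toy: data is the sum of both shapes, the templates are drawn separately.
Toy makeToy(std::mt19937& rng, std::size_t nData0, std::size_t nData1,
            std::size_t nMc0, std::size_t nMc1) {
    assert(nData0 + nData1 > 0);
    // Hypothetical binning and shape parameters, not taken from the original program.
    const std::size_t nBins = 50;
    const double lo = -5.0, hi = 10.0;
    const double mean0 = 0.0, sigma0 = 1.0;
    const double mean1 = 3.0, sigma1 = 1.5;

    Toy t{Hist(nBins, lo, hi), Hist(nBins, lo, hi), Hist(nBins, lo, hi), 0.0};
    fillGaussian(t.data, rng, nData0, mean0, sigma0);
    fillGaussian(t.data, rng, nData1, mean1, sigma1);
    fillGaussian(t.mc0, rng, nMc0, mean0, sigma0);
    fillGaussian(t.mc1, rng, nMc1, mean1, sigma1);
    t.trueFraction = static_cast<double>(nData1) /
                     static_cast<double>(nData0 + nData1);
    return t;
}
```

Calling `makeToy(rng, 100000, 10000, 10000, 10000)` with a seeded `std::mt19937` corresponds to a true fraction of 10000/110000. Calling it again with the same generator gives a statistically independent set of three histograms.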

You can run the program like this:

g++ -g test_TFractionFitter.cxx `root-config --libs --ldflags --cflags` -o test_TFractionFitter

./test_TFractionFitter 1000 100000 10000 10000 10000 0
number of fits: 1000
number of entries in data histogram: 110000
contribution to data from mc0 distribution: 100000
contribution to data from mc1 distribution: 10000
corresponding fraction of mc1 distribution in data: 0.0909091
number of entries in mc0 histogram: 10000
number of entries in mc1 histogram: 10000
verbose mode: false
...

At the end a canvas should pop up. The upper pad shows the pull
distribution, defined as (true fraction - fitted fraction) / (error on
the fraction). The lower pad shows data as black dots and mc0, mc1 stacked.

The pull shows no bias (the mean is zero), but its width is ~0.7 rather
than 1. I interpret this as the error on the fraction being too large.
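
For reference, the pull and the mean and width of its distribution can be computed as in the sketch below. It is plain C++ with no ROOT dependency; the names `pull`, `PullSummary` and `summarisePulls` are hypothetical and do not come from test_TFractionFitter.cxx. The width is taken here as the sample standard deviation, whereas a Gaussian fit to the pull histogram would be another common choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pull of one fit: (true fraction - fitted fraction) / (error on the fraction).
double pull(double trueFraction, double fittedFraction, double error) {
    assert(error > 0.0);
    return (trueFraction - fittedFraction) / error;
}

struct PullSummary {
    double mean;   // close to 0 for an unbiased fit
    double width;  // close to 1 if the reported errors match the actual spread
};

// Sample mean and sample standard deviation of a set of pulls.
PullSummary summarisePulls(const std::vector<double>& pulls) {
    assert(pulls.size() > 1);
    const double n = static_cast<double>(pulls.size());

    double sum = 0.0;
    for (double p : pulls) sum += p;
    const double mean = sum / n;

    double sumSq = 0.0;
    for (double p : pulls) sumSq += (p - mean) * (p - mean);
    const double width = std::sqrt(sumSq / (n - 1.0));

    return PullSummary{mean, width};
}
```

Since the pull is inversely proportional to the reported error, a width of ~0.7 means that the reported errors are on average roughly 1/0.7 times the actual spread of the fitted fractions around the true value.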

If you play with the parameters you’ll find that the width of the pull
distribution is always smaller than 1. It gets smaller the “easier” the
fit is, e.g. with more entries in the histograms.

To summarise my problem: should I expect a width of 1 for the pull
distribution? If not, what is the correct statistical interpretation of
the error TFractionFitter returns?

Thanks in advance,

Sebastian Schmitt

delo · August 2, 2012, 9:23pm

Hi Sebastian,
I am not sure where the problem lies, but I have heard people argue that when you have weighted MC events and
the two templates have different statistics, one should scale down the higher-statistics sample to match the lower-statistics one. Otherwise you will underestimate the uncertainty.
I was not aware of this, and I did not receive any solid/mathematical argument supporting this statement.
Maybe some expert can comment on this.

cheers,
delo