Dear all!

I’ve already asked on ATLAS’s statistics mailing list, but unfortunately didn’t receive a reply. I’m aware of the assumptions listed in the TFractionFitter documentation (http://root.cern.ch/root/html/TFractionFitter.html):

[ul]

(1) The total number of events in each template is not too small (so that its Poisson uncertainty can be neglected).

(2) The number of events in each bin is much smaller than the total number of events in each template (so that multinomial uncertainties can be replaced with Poisson uncertainties).

[/ul]
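To see whether a given toy setup respects these two conditions, one can check them directly on the template bin contents. Below is a minimal C++ sketch with a hypothetical helper `checkTemplate` (not part of ROOT) that works on plain bin contents; the thresholds (5% relative Poisson error on the total, 10% for the largest bin) are arbitrary illustrative choices, not values from the ROOT documentation.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// Result of the (hypothetical) assumption check for one template.
struct AssumptionCheck {
    double relPoissonError;  // 1/sqrt(N) for the template total N, condition (1)
    double maxBinFraction;   // largest bin content divided by N, condition (2)
    bool ok;
};

// Rough check of the two TFractionFitter assumptions on one template.
// Thresholds are illustrative only.
AssumptionCheck checkTemplate(const std::vector<double>& bins,
                              double maxRelError = 0.05,
                              double maxFraction = 0.10) {
    const double total = std::accumulate(bins.begin(), bins.end(), 0.0);
    if (total <= 0.0) {
        // Empty template: neither condition can hold.
        return {std::numeric_limits<double>::infinity(), 1.0, false};
    }
    AssumptionCheck c{};
    c.relPoissonError = 1.0 / std::sqrt(total);
    c.maxBinFraction = *std::max_element(bins.begin(), bins.end()) / total;
    c.ok = c.relPoissonError < maxRelError && c.maxBinFraction < maxFraction;
    return c;
}
```

For the default toy (10000 entries per template) the relative Poisson error on the total is 1/sqrt(10000) = 0.01, so condition (1) is comfortably met; whether condition (2) holds depends on the binning and on how peaked the template is.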

Even so, I cannot get “perfect” errors out of my toy MC. Could you please have a look?

I want to do a template fit to estimate the QCD background contribution to a Z->ee selection. I have a data histogram (data), a signal Monte Carlo histogram (mc0) and a QCD template histogram (mc1).
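In a simplified picture that treats the templates as exact shapes (TFractionFitter additionally accounts for the statistical fluctuations of the templates, following the Barlow and Beeston method cited in its documentation), the expected content of data bin i for an mc1 fraction f would be the expression below. Here a_{0i} and a_{1i} are the mc0 and mc1 bin contents, N_0 and N_1 their totals and N_D the number of data entries; forcing the two fractions to sum to one is my simplification for this sketch.

```latex
\nu_i(f) = N_D \left[ (1 - f)\,\frac{a_{0i}}{N_0} + f\,\frac{a_{1i}}{N_1} \right],
\qquad
\sum_i \nu_i(f) = N_D .
```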

For the fit I use ROOT’s TFractionFitter. I know about the issue with weighted histograms, but my problem is more basic, so let’s forget about weights for the moment.

The problem I have is the statistical interpretation of the error on the fraction that TFractionFitter returns. I have the impression that it is always too large. As a demonstration I put together a toy MC:

`/afs/cern.ch/user/s/schmitts/public/test_TFractionFitter/test_TFractionFitter.cxx`

The toy data is the sum of two Gaussians with different mean values.

The fit is repeated many times, and in each iteration all three histograms (data, mc0, mc1) are filled from scratch, so they are statistically independent.
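One toy iteration as described above can be sketched in standard C++ as follows. This is not the code in test_TFractionFitter.cxx: the `Hist` struct is a hypothetical stand-in for ROOT's TH1D, and the Gaussian means, width and binning are my assumptions. What it shows is that data, mc0 and mc1 are each filled with fresh, independent random draws in every iteration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Minimal fixed-binning histogram (hypothetical stand-in for TH1D).
struct Hist {
    double lo, hi;
    std::vector<double> bins;
    long entries = 0;  // counts every fill, including under/overflow
    Hist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.0) {}
    void fill(double x) {
        ++entries;
        if (x < lo || x >= hi) return;  // under/overflow are not stored
        auto i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i < bins.size()) bins[i] += 1.0;  // guard against rounding at hi
    }
};

// Fill a fresh histogram with n draws from a Gaussian(mean, sigma).
Hist fillGaussian(std::mt19937& rng, long n, double mean, double sigma,
                  std::size_t nbins, double lo, double hi) {
    Hist h(nbins, lo, hi);
    std::normal_distribution<double> gauss(mean, sigma);
    for (long k = 0; k < n; ++k) h.fill(gauss(rng));
    return h;
}

struct Toy { Hist data, mc0, mc1; };

// One iteration: data = n0 draws from the mc0 shape + n1 draws from the mc1 shape;
// the templates get nmc0 and nmc1 independent draws from the same shapes.
Toy makeToy(std::mt19937& rng, long n0, long n1, long nmc0, long nmc1) {
    // Assumed shapes and binning, chosen only for illustration.
    const double mean0 = 0.0, mean1 = 2.0, sigma = 1.0;
    const std::size_t nbins = 50;
    const double lo = -5.0, hi = 7.0;
    Hist data = fillGaussian(rng, n0, mean0, sigma, nbins, lo, hi);
    Hist extra = fillGaussian(rng, n1, mean1, sigma, nbins, lo, hi);
    for (std::size_t i = 0; i < nbins; ++i) data.bins[i] += extra.bins[i];
    data.entries += extra.entries;
    return Toy{data,
               fillGaussian(rng, nmc0, mean0, sigma, nbins, lo, hi),
               fillGaussian(rng, nmc1, mean1, sigma, nbins, lo, hi)};
}
```

Because the same generator keeps advancing, calling `makeToy` repeatedly gives statistically independent histograms, which is the property the pull study relies on.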

You can run the program like this:

```
g++ -g test_TFractionFitter.cxx `root-config --libs --ldflags --cflags` \
    -o test_TFractionFitter
./test_TFractionFitter 1000 100000 10000 10000 10000 0
number of fits: 1000
number of entries in data histogram: 110000
contribution to data from mc0 distribution: 100000
contribution to data from mc1 distribution: 10000
corresponding fraction of mc1 distribution in data: 0.0909091
number of entries in mc0 histogram: 10000
number of entries in mc1 histogram: 10000
verbose mode: false
...
```
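The numbers printed in the summary follow directly from the six command-line arguments. A small sketch of that arithmetic; the `ToyConfig` struct and the argument order are hypothetical, inferred from the printed summary rather than taken from test_TFractionFitter.cxx.

```cpp
// Hypothetical mirror of the command-line arguments (order inferred from the
// printed summary, not from the actual source file).
struct ToyConfig {
    long nFits;    // number of fits (toy iterations)
    long nData0;   // contribution to data from the mc0 distribution
    long nData1;   // contribution to data from the mc1 distribution
    long nMc0;     // entries in the mc0 template histogram
    long nMc1;     // entries in the mc1 template histogram
    bool verbose;  // verbose mode
};

// Total number of entries in the data histogram.
long dataEntries(const ToyConfig& c) { return c.nData0 + c.nData1; }

// True fraction of the mc1 (QCD) component in data: N1 / (N0 + N1).
double trueFraction(const ToyConfig& c) {
    return static_cast<double>(c.nData1) / static_cast<double>(dataEntries(c));
}
```

With the arguments `1000 100000 10000 10000 10000 0` this gives 110000 data entries and a true fraction of 10000/110000 ≈ 0.0909091, matching the printed values.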

At the end a canvas should pop up. The upper pad shows the pull distribution, defined as (true fraction - fitted fraction)/(error on the fraction); the lower pad shows the data as black dots and mc0, mc1 stacked.
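The pull definition above and the two numbers read off the pull distribution (mean and width) can be written out as follows. This is a plain C++ sketch, not the code in the toy program; the width is computed as the RMS about the mean with 1/n normalisation, and at least one fit result is assumed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pull of a single fit: (true fraction - fitted fraction) / (error on the fraction).
double pull(double trueFrac, double fittedFrac, double error) {
    return (trueFrac - fittedFrac) / error;
}

struct PullSummary { double mean; double width; };

// Mean and RMS width of the pull distribution over many toy fits.
// Unbiased fit with correctly estimated errors: mean ~ 0, width ~ 1.
// Precondition: fitted and errors are non-empty and of equal length.
PullSummary summarisePulls(double trueFrac, const std::vector<double>& fitted,
                           const std::vector<double>& errors) {
    const std::size_t n = fitted.size();
    std::vector<double> p(n);
    for (std::size_t i = 0; i < n; ++i) p[i] = pull(trueFrac, fitted[i], errors[i]);
    double mean = 0.0;
    for (double x : p) mean += x;
    mean /= static_cast<double>(n);
    double var = 0.0;
    for (double x : p) var += (x - mean) * (x - mean);
    var /= static_cast<double>(n);  // 1/n normalisation
    return {mean, std::sqrt(var)};
}
```

For example, with a true fraction of 0.5, fitted values {0.4, 0.6} and errors {0.1, 0.1}, the pulls are {+1, -1}, giving mean 0 and width 1; doubling the quoted errors halves the width to 0.5.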

The pull distribution shows no bias (its mean is zero), but its width is ~0.7 instead of 1. I interpret this as the error on the fraction being too large.
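The reasoning behind that interpretation, assuming the quoted error σ_q is roughly the same in every toy and the fitted fractions scatter around the true value with an actual spread σ_t:

```latex
\text{pull} = \frac{f_\text{true} - \hat{f}}{\sigma_q},
\qquad
\operatorname{RMS}(\text{pull}) \approx \frac{\sigma_t}{\sigma_q} = 0.7
\;\Rightarrow\;
\sigma_q \approx \frac{\sigma_t}{0.7} \approx 1.43\,\sigma_t .
```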

If you play with the parameters you’ll find that the width of the pull distribution is always smaller than 1, and that it gets smaller the “easier” the fit is, e.g. the more entries there are in the histograms.

To summarise my problem: should I expect a width of 1 for the pull distribution? If not, what is the correct statistical interpretation of the error TFractionFitter returns?
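For comparison, here is a sketch of where such an error comes from in a plain binned Poisson likelihood fit of the fraction. This is not TFractionFitter: it treats the templates as exact shapes and ignores their fluctuations (which TFractionFitter's Barlow-Beeston likelihood includes), it uses a simple golden-section search instead of MINUIT, and the function names are hypothetical. The error is taken from the curvature of -lnL at the minimum, σ = 1/sqrt(d²(-lnL)/df²), the same kind of quantity MINUIT's HESSE step estimates.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// -lnL (up to f-independent constants) for Poisson data d_i with expectation
// nu_i(f) = N_D * [(1 - f) * a0_i / N0 + f * a1_i / N1], templates treated as exact.
double nll(double f, const std::vector<double>& d, const std::vector<double>& a0,
           const std::vector<double>& a1) {
    double nD = 0.0, n0 = 0.0, n1 = 0.0;
    for (std::size_t i = 0; i < d.size(); ++i) { nD += d[i]; n0 += a0[i]; n1 += a1[i]; }
    double sum = 0.0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        const double nu = nD * ((1.0 - f) * a0[i] / n0 + f * a1[i] / n1);
        if (nu <= 0.0) continue;  // sketch simplification: skip bins with no prediction
        sum += nu - d[i] * std::log(nu);
    }
    return sum;
}

struct FitResult { double f; double error; };

// Minimise -lnL in f with a golden-section search (-lnL is convex in f here),
// then estimate the error from the curvature with a central finite difference.
FitResult fitFraction(const std::vector<double>& d, const std::vector<double>& a0,
                      const std::vector<double>& a1) {
    auto L = [&](double f) { return nll(f, d, a0, a1); };
    const double g = (std::sqrt(5.0) - 1.0) / 2.0;  // inverse golden ratio
    double lo = 1e-6, hi = 1.0 - 1e-6;
    double x1 = hi - g * (hi - lo), x2 = lo + g * (hi - lo);
    double f1 = L(x1), f2 = L(x2);
    for (int it = 0; it < 200 && hi - lo > 1e-12; ++it) {
        if (f1 < f2) { hi = x2; x2 = x1; f2 = f1; x1 = hi - g * (hi - lo); f1 = L(x1); }
        else         { lo = x1; x1 = x2; f1 = f2; x2 = lo + g * (hi - lo); f2 = L(x2); }
    }
    const double fhat = 0.5 * (lo + hi);
    const double h = 1e-4;  // step for the second derivative
    const double curv = (L(fhat + h) - 2.0 * L(fhat) + L(fhat - h)) / (h * h);
    const double err = curv > 0.0 ? 1.0 / std::sqrt(curv)
                                  : std::numeric_limits<double>::quiet_NaN();
    return {fhat, err};
}
```

If the likelihood model were exactly right, errors obtained this way would give pulls of unit width in repeated toys; here the model deliberately neglects the template fluctuations, so this sketch says nothing definitive about TFractionFitter's own error.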

Thanks in advance,

Sebastian Schmitt