Dear TEfficiency developers,
I commend your efforts to provide standard tools for one of the
most common problems in HEP - how to calculate efficiencies -
a problem that is often treated incorrectly. I expect your class
to quickly become a de facto standard. Thus it is worthwhile to
carefully consider how it will be used, and to review its
appropriateness.
I’ll inform Bob Cousins of this thread, as I quote him below, and
I hope he’ll find the time to review this important standard tool.
Unfortunately, I fear your class design and documentation will
lead to mistaken solutions, especially as some users trust
official ROOT solutions as gospel.
The main issue is that throughout you assume that the total number
of events is predetermined. The same assumption underlies
TGraphAsymmErrors' default constructor. In my experience,
this happens very rarely; in fact, yesterday was the first time in
several years that I saw it happen in an analysis.
When events must pass selection criteria before entering the “total”
histogram, this is really a trinomial problem: events are either
ignored, passed, or failed. It is easy to show that given a
probability I of being ignored, the correlation between the numbers
of failing and passing events is -1+I.
In particular, in an efficiency measured in hadron collider data, the
trigger system provides a high rejection rate, that is, an I of almost 1,
making the numbers of passing and failing events uncorrelated to an
extremely accurate approximation.
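This is easy to check with a toy simulation. The sketch below is
plain stand-alone Python, not ROOT code, and the event counts and
the even pass/fail split are made-up illustration values:

```python
import random

def pass_fail_correlation(n_events, p_ignore, n_experiments, seed=1):
    """Toy trinomial experiments: each event is ignored, passes, or
    fails; the non-ignored probability is split evenly (an arbitrary
    choice) between pass and fail.  Returns the sample correlation
    between the numbers of passing and failing events."""
    random.seed(seed)
    p_pass = (1.0 - p_ignore) / 2.0
    passes, fails = [], []
    for _ in range(n_experiments):
        n_pass = n_fail = 0
        for _ in range(n_events):
            r = random.random()
            if r < p_pass:
                n_pass += 1
            elif r < 2.0 * p_pass:
                n_fail += 1
        passes.append(n_pass)
        fails.append(n_fail)
    m = float(n_experiments)
    mp, mf = sum(passes) / m, sum(fails) / m
    cov = sum((a - mp) * (b - mf) for a, b in zip(passes, fails)) / m
    vp = sum((a - mp) ** 2 for a in passes) / m
    vf = sum((b - mf) ** 2 for b in fails) / m
    return cov / (vp * vf) ** 0.5

print(pass_fail_correlation(1000, 0.99, 1000))  # near 0: ~uncorrelated
print(pass_fail_correlation(1000, 0.0, 1000))   # -1: fully anti-correlated
```

With I = 0.99 the sample correlation comes out consistent with zero,
while with I = 0 the pass and fail counts sum to a fixed total and
are exactly anti-correlated.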
A similar situation usually arises in the corresponding MC-efficiency
measurements, though exceptions can occur.
Assuming, as you do, that the total number of events is predetermined
is equivalent to assuming I=0, and hence total anti-correlation and
larger uncertainties.
The proper solution is simply to use independent random variables for
the passing and failing counts, and do standard error propagation. If
no weights are used, these are Poisson, and the result is the “usual”
but IMHO misnamed “binomial formula”. One can solve precisely the case
of intermediate values of I, but practically all cases have I > 0.999
or I = 0, so I never bothered studying that further.
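To spell that out, here is a stand-alone sketch (plain Python, not
ROOT; the counts 80 and 20 are made-up numbers) showing that error
propagation with independent Poisson pass/fail counts reproduces the
“binomial formula” algebraically:

```python
from math import sqrt

def eff_and_error(n_pass, n_fail):
    """Efficiency and its uncertainty from independent, unweighted
    Poisson counts, via standard error propagation:
        eps       = p / (p + f)
        d eps/dp  =  f / (p + f)^2
        d eps/df  = -p / (p + f)^2
        sigma_p^2 = p,  sigma_f^2 = f   (Poisson variances)
    """
    p, f = float(n_pass), float(n_fail)
    n = p + f
    eps = p / n
    sigma = sqrt((f / n**2) ** 2 * p + (p / n**2) ** 2 * f)
    return eps, sigma

eps, sigma = eff_and_error(80, 20)
print(eps, sigma)
# the propagated sigma equals sqrt(eps * (1 - eps) / n) exactly
print(sqrt(eps * (1.0 - eps) / 100.0))
```

The algebra collapses to sigma^2 = p f / (p+f)^3 = eps (1-eps) / n,
which is precisely the familiar formula.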
A secondary issue is the choice of Clopper-Pearson error bars as the
default. Theoretically, this is a reasonable default choice for
quoting final results (see e.g. arXiv:0905.3831). However, my
experience is that efficiencies are almost always immediately
propagated into further steps of the analysis that assume Gaussian
uncertainties with the quoted width - usually a fitter used to
parametrize the efficiency as a function of some parameter.
But, as you know and have documented, Clopper-Pearson systematically
over-covers, so the widths fed to the fitter are biased high. You go
one step better than most by using binomial uncertainties, but suffer
the same bias.
From your documentation, the Wilson intervals and the Bayesian
intervals you implemented are all free of this problem. I strongly
suggest you change the default to one of these.
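For reference, the Wilson interval is cheap to compute. A stand-alone
sketch (plain Python, not the TEfficiency API; 80 passing out of 100
is a made-up example):

```python
from math import sqrt

def wilson_interval(n_pass, n_total, z=1.96):
    """Wilson score interval for a binomial proportion.
    z is the Gaussian quantile (1.96 for ~95% central coverage)."""
    phat = n_pass / float(n_total)
    z2n = z * z / n_total
    center = (phat + z2n / 2.0) / (1.0 + z2n)
    halfwidth = (z / (1.0 + z2n)) * sqrt(
        phat * (1.0 - phat) / n_total + z * z / (4.0 * n_total ** 2))
    return center - halfwidth, center + halfwidth

lo, hi = wilson_interval(80, 100)
print(lo, hi)   # roughly (0.711, 0.867)
```

Note the interval is asymmetric about phat = 0.8, its midpoint being
pulled toward 0.5, and it behaves sensibly even at phat = 0 or 1.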
In April we had a similar discussion for specific trigger efficiencies
in CMS, and Bob Cousins concluded with:
if you really have this constraint
(not sure why), then the generalized Agresti-Coull interval (which uses
the midpoint of the Wilson interval as the “measured value”) would be
the thing I would try. It is designed for the Gaussian approximation,
and as explained in our paper, the Wilson interval is the correct way to
construct a binomial confidence interval in the Gaussian approximation.
The “constraint” in question was that “the uncertainty enter a fitter
that uses it to determine a Gaussian PDF”, but as you see, Bob also
covered your case, where the fitter uses the proper binomial C.I.s.
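For concreteness, a stand-alone sketch of the Agresti-Coull
prescription as I understand it (plain Python, not ROOT; the counts
are made-up): add z^2/2 to both the pass and fail counts, quote the
resulting proportion - which coincides with the Wilson-interval
midpoint - as the measured value, and use the plain Gaussian width
on the enlarged sample:

```python
from math import sqrt

def agresti_coull(n_pass, n_total, z=1.96):
    """Agresti-Coull: add z^2/2 successes and z^2/2 failures, then
    apply the naive Gaussian formula to the enlarged sample.
    The returned center is exactly the Wilson-interval midpoint:
        (n_pass + z^2/2) / (n_total + z^2)."""
    n_tilde = n_total + z * z
    p_tilde = (n_pass + z * z / 2.0) / n_tilde
    width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde, width

center, width = agresti_coull(80, 100)
print(center, width)
```

The symmetric center +/- width form is what makes it a natural match
for a fitter that assumes Gaussian uncertainties.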
Cheers,
Amnon Harel
University of Rochester