Dear TEfficiency developers,
I commend your efforts to provide standard tools for one of the
most common problems in HEP - how to calculate efficiencies -
a problem that is often treated incorrectly. I expect your class
to quickly become a de facto standard. Thus it is worthwhile to
carefully consider how it will be used, and to review its
appropriateness.
I’ll inform Bob Cousins of this thread, as I quote him below, and
I hope he’ll find the time to review this important standard tool.
Unfortunately, I fear your class design and documentation will
lead to mistaken solutions, especially as some users trust
official ROOT solutions as gospel.
The main issue is that throughout you assume that the total number
of events is predetermined. The same assumption underlies
TGraphAsymmErrors' default constructor. In my experience,
this happens very rarely; in fact, yesterday was the first time in
several years that I saw it happen in an analysis.
When events must pass selection criteria before entering the “total”
histogram, this is really a trinomial problem: events are either
ignored, passed, or failed. It is easy to show that given a
probability I of being ignored, the correlation between the numbers
of failing and passing events is -1+I.
In particular, in an efficiency measured in hadron collider data, the
trigger system provides a high rejection rate, that is, an I of almost 1,
making the numbers of passing and failing events uncorrelated to an
extremely accurate approximation.
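This is easy to check with a toy simulation. The sketch below is
plain stand-alone Python, not ROOT code, and the event counts and
the even pass/fail split are made-up illustration values:

```python
import random

def pass_fail_correlation(n_events, p_ignore, n_experiments, seed=1):
    """Toy trinomial experiments: each event is ignored, passes, or
    fails; the non-ignored probability is split evenly (an arbitrary
    choice) between pass and fail.  Returns the sample correlation
    between the numbers of passing and failing events."""
    random.seed(seed)
    p_pass = (1.0 - p_ignore) / 2.0
    passes, fails = [], []
    for _ in range(n_experiments):
        n_pass = n_fail = 0
        for _ in range(n_events):
            r = random.random()
            if r < p_pass:
                n_pass += 1
            elif r < 2.0 * p_pass:
                n_fail += 1
        passes.append(n_pass)
        fails.append(n_fail)
    m = float(n_experiments)
    mp, mf = sum(passes) / m, sum(fails) / m
    cov = sum((a - mp) * (b - mf) for a, b in zip(passes, fails)) / m
    vp = sum((a - mp) ** 2 for a in passes) / m
    vf = sum((b - mf) ** 2 for b in fails) / m
    return cov / (vp * vf) ** 0.5

print(pass_fail_correlation(1000, 0.99, 1000))  # near 0: ~uncorrelated
print(pass_fail_correlation(1000, 0.0, 1000))   # -1: fully anti-correlated
```

With I = 0.99 the sample correlation comes out consistent with zero,
while with I = 0 the pass and fail counts sum to a fixed total and
are exactly anti-correlated.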
A similar situation usually arises in the corresponding MC-efficiency
measurements, though exceptions can occur.
Assuming, as you do, that the total number of events is predetermined
is equivalent to assuming I=0, and hence total anti-correlation and
larger uncertainties.
The proper solution is simply to use independent random variables for
the passing and failing counts, and do standard error propagation. If
no weights are used, these are Poisson, and the result is the “usual”
but IMHO misnamed “binomial formula”. One can solve precisely the case
of intermediate values of I, but practically all cases have I > 0.999
or I = 0, so I never bothered studying that further.
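To spell that out, here is a stand-alone sketch (plain Python, not
ROOT; the counts 80 and 20 are made-up numbers) showing that error
propagation with independent Poisson pass/fail counts reproduces the
“binomial formula” algebraically:

```python
from math import sqrt

def eff_and_error(n_pass, n_fail):
    """Efficiency and its uncertainty from independent, unweighted
    Poisson counts, via standard error propagation:
        eps       = p / (p + f)
        d eps/dp  =  f / (p + f)^2
        d eps/df  = -p / (p + f)^2
        sigma_p^2 = p,  sigma_f^2 = f   (Poisson variances)
    """
    p, f = float(n_pass), float(n_fail)
    n = p + f
    eps = p / n
    sigma = sqrt((f / n**2) ** 2 * p + (p / n**2) ** 2 * f)
    return eps, sigma

eps, sigma = eff_and_error(80, 20)
print(eps, sigma)
# the propagated sigma equals sqrt(eps * (1 - eps) / n) exactly
print(sqrt(eps * (1.0 - eps) / 100.0))
```

The algebra collapses to sigma^2 = p f / (p+f)^3 = eps (1-eps) / n,
which is precisely the familiar formula.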
A secondary issue is the choice of Clopper-Pearson error bars as the
default. Theoretically, this is a reasonable default choice for
quoting final results (see e.g. arXiv:0905.3831). However, my
experience is that efficiencies are almost always immediately
propagated into further steps of the analysis that assume Gaussian
uncertainties with the quoted width - usually a fitter used to
parametrize the efficiency as a function of some parameter.
But, as you know and have documented, Clopper-Pearson systematically
over-covers, so the widths fed to the fitter are biased high. You go
one step better than most by using binomial uncertainties, but suffer
the same bias.
From your documentation, the Wilson intervals and the Bayesian
intervals you implemented are all free of this problem. I strongly
suggest you change the default to one of these.
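For reference, the Wilson interval is cheap to compute. A stand-alone
sketch (plain Python, not the TEfficiency API; 80 passing out of 100
is a made-up example):

```python
from math import sqrt

def wilson_interval(n_pass, n_total, z=1.96):
    """Wilson score interval for a binomial proportion.
    z is the Gaussian quantile (1.96 for ~95% central coverage)."""
    phat = n_pass / float(n_total)
    z2n = z * z / n_total
    center = (phat + z2n / 2.0) / (1.0 + z2n)
    halfwidth = (z / (1.0 + z2n)) * sqrt(
        phat * (1.0 - phat) / n_total + z * z / (4.0 * n_total ** 2))
    return center - halfwidth, center + halfwidth

lo, hi = wilson_interval(80, 100)
print(lo, hi)   # roughly (0.711, 0.867)
```

Note the interval is asymmetric about phat = 0.8, its midpoint being
pulled toward 0.5, and it behaves sensibly even at phat = 0 or 1.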
In April we had a similar discussion for specific trigger efficiencies
in CMS, and Bob Cousins concluded with:
if you really have this constraint
(not sure why), then the generalized Agresti-Coull interval (which uses
the midpoint of the Wilson interval as the “measured value”) would be
the thing I would try. It is designed for the Gaussian approximation,
and as explained in our paper, the Wilson interval is the correct way to
construct a binomial confidence interval in the Gaussian approximation.
The “constraint” in question was that “the uncertainty enter a fitter
that uses it to determine a Gaussian PDF”, but as you see, Bob also
covered your case, where the fitter uses the proper binomial C.I.s.
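For concreteness, a stand-alone sketch of the Agresti-Coull
prescription as I understand it (plain Python, not ROOT; the counts
are made-up): add z^2/2 to both the pass and fail counts, quote the
resulting proportion - which coincides with the Wilson-interval
midpoint - as the measured value, and use the plain Gaussian width
on the enlarged sample:

```python
from math import sqrt

def agresti_coull(n_pass, n_total, z=1.96):
    """Agresti-Coull: add z^2/2 successes and z^2/2 failures, then
    apply the naive Gaussian formula to the enlarged sample.
    The returned center is exactly the Wilson-interval midpoint:
        (n_pass + z^2/2) / (n_total + z^2)."""
    n_tilde = n_total + z * z
    p_tilde = (n_pass + z * z / 2.0) / n_tilde
    width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde, width

center, width = agresti_coull(80, 100)
print(center, width)
```

The symmetric center +/- width form is what makes it a natural match
for a fitter that assumes Gaussian uncertainties.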
Cheers,
Amnon Harel
University of Rochester