Please explain the difference between "pulls" and "residuals"

Looking at the source, it seems that the value and errors of pullHist are normalized to the “high error” (or “low error”, depending on whether the value is positive or negative), whereas residHist is not.

Are these “high” and “low” errors just the upper and lower error bars of the bin (usually the Poisson error)?

More importantly, it seems that in the pulls the error is normalized to itself. With standard Poisson errors, the “hi” and “low” error bars are the same, and since they are normalized with respect to the “hi” or “low” error you get a normalized value of one. Can someone explain, or point to a reference for, why this would be the case?

Thank you very much.
James

you can view the source that I am referring to here:

root.cern.ch/root/html512/src/Ro … 2321938828

Anyone?

Why, statistically, would the error bar of a ‘pull’ histogram have a value of 1? From the code above, the error is normalized to itself.

Thanks
james

Hi James,

The residual r +/- dr is (‘curve’ - histogram) +/- err_histogram, as the curve has no error.

To make the pull, the entire expression is divided by err_histogram. If you assume the error itself has no error, this simply scales both the value and the uncertainty of that residual by 1/err_histogram.

In the case of symmetric errors, the error on the pull indeed ends up as one. I admit I did not give this a great deal of thought when I wrote the code, but it’s not clear to me that this is wrong.
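A minimal sketch of the scaling described above, for a single bin. This is not the RooFit source: the function name is hypothetical, the residual is taken as data minus curve (the opposite convention only flips the sign), and the choice of which error bar to normalize to is an assumption based on the description earlier in the thread.

```python
def residual_and_pull(y, err_lo, err_hi, curve):
    """Return (resid, resid_lo, resid_hi, pull, pull_lo, pull_hi) for one bin.

    Hypothetical helper for illustration only.
    """
    resid = y - curve  # assumed sign convention: data minus curve
    # Assumed choice: normalize to the error bar on the side facing the curve.
    norm = err_lo if resid > 0 else err_hi
    if norm == 0:
        raise ValueError("cannot form a pull for a bin with zero error")
    # The curve has no error, so the residual keeps the bin's own error bars.
    # Dividing everything by norm rescales the value and both error bars.
    return resid, err_lo, err_hi, resid / norm, err_lo / norm, err_hi / norm


# Symmetric sqrt(N) error: 100 entries with error 10, curve predicts 90.
print(residual_and_pull(100.0, 10.0, 10.0, 90.0))
# -> (10.0, 10.0, 10.0, 1.0, 1.0, 1.0)
```

With symmetric errors both pull error bars come out as exactly 1, which is the behaviour asked about. With asymmetric errors only the bar used for the normalization becomes 1.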

What do you propose the error on the residual should be?

Wouter

Hello Wouter,

Thank you for the reply. I don’t have an alternative. It does seem to make sense; I guess I’m just having a hard time understanding why you would normalize the histogram. I guess it would be arbitrary, or at the very least done to make the pull histogram a bit smaller?

I was wondering if this was based on some statistics text, but textbooks on these things leave much to be desired. You are the only one who seems to make a distinction between ‘pulls’ and ‘residuals’ (with the normalization), and I wish a textbook would do something similar somewhere.

Thanks.

james

Hi,

I think “pull” is just HEP-community jargon for a normalized residual, i.e. a residual which will have an asymptotic Normal(0,1) distribution. In statistics they are also called Studentized residuals.
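The Normal(0,1) behaviour can be checked with a small toy. The sketch below is only an illustration under stated assumptions: the bin expectations (50, 100, 200) are made up, the error on each bin is taken as sqrt(n) of the observed count, and the Poisson sampler is a plain textbook loop rather than anything from ROOT.

```python
import math
import random
import statistics


def poisson(rng, mu):
    """Draw one Poisson(mu) count by multiplying uniforms (fine for moderate mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(seed=1, n_toys=2000, expectations=(50.0, 100.0, 200.0)):
    """Return (width of residuals, width of pulls) over many toy bins."""
    rng = random.Random(seed)
    residuals, pulls = [], []
    for mu in expectations:  # hypothetical true bin contents
        for _ in range(n_toys):
            n = poisson(rng, mu)
            if n == 0:
                continue  # sqrt(n) error undefined; essentially never happens here
            resid = n - mu
            residuals.append(resid)
            pulls.append(resid / math.sqrt(n))
    return statistics.stdev(residuals), statistics.stdev(pulls)


resid_width, pull_width = simulate()
print(f"residual width = {resid_width:.2f}, pull width = {pull_width:.2f}")
```

The residual width depends on how full the bins are, while the pull width should come out close to 1 regardless, which is what makes pulls comparable across bins.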

Lorenzo

Yes, I agree the root distinction is:

pull - normalized
residual - unnormalized

Either way, I’m not aware of an HEP stats text that talks about these things. Seems like knowledge like this is just passed along :wink:.

james

Well, according to Wikipedia, which seems to have the only non-technical explanation, it just wouldn’t make sense not to Studentize (normalize) the residuals, since each bin has a different error. As far as I can tell, this is done so that the residual can easily be compared to the Poisson error in that bin, which is what one cares about most when looking at a graph of pulls. I suppose this makes sense, and it seems more and more obvious the more I think about it. If anyone has anything else to add, please let me know. (I am still curious why anyone would ever use residHist(), since pullHist is all you ever really need. Or is it?)
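To make the "each bin has a different error" point concrete, here is a small sketch with made-up numbers. The bin contents and curve values are hypothetical, the error is assumed to be a symmetric sqrt(n), and the residual is taken as data minus curve.

```python
import math

# Hypothetical (observed count, curve value) pairs, all with the same residual.
bins = [(100, 90.0), (910, 900.0), (25, 15.0)]


def compare(bins):
    """Return a list of (residual, pull) per bin, assuming sqrt(n) errors."""
    rows = []
    for n, curve in bins:
        resid = n - curve            # in units of events
        pull = resid / math.sqrt(n)  # in units of the bin's own error
        rows.append((resid, pull))
    return rows


rows = compare(bins)
for (n, curve), (resid, pull) in zip(bins, rows):
    print(f"n={n:4d} curve={curve:6.1f} residual={resid:5.1f} pull={pull:5.2f}")

# Sum of squared pulls: a chi-square-like figure of merit under these assumptions.
chi2 = sum(pull ** 2 for _, pull in rows)
print(f"sum of pull^2 = {chi2:.3f}")
```

All three bins are off by 10 events, but that is a 1 sigma, roughly 0.3 sigma, and 2 sigma deviation respectively. The residual still carries information the pull drops, namely the size of the discrepancy in events, which may be one reason to look at both.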