Calculating p-values?

KAM · August 6, 2020, 6:48am

I have two sets of data: known event times, and event times recorded by an instrument. I want to determine whether the instrument is measuring these events. I’ve paired the closest elements from each dataset and plotted the difference between each pair on a histogram. There’s a peak at 0, and I’ve fitted a Gaussian curve to the histogram. I now want to determine the likelihood that this is a “random” occurrence. I believe what I want to do is calculate the p-value or “sigma threshold”. I understand ROOT has facilities to do this; however, I’m new to ROOT and self-taught in statistics, so I’m not exactly sure what to do here. How would I go about calculating the p-value and/or “sigma threshold”?

StephanH · August 6, 2020, 7:44am

When you are looking for p-values, the first thing you need is two models:

The null model: What would the outcome of the experiment look like if there’s no effect?
The signal model: What would it look like if there is some effect?

I don’t fully understand your setup, but I suppose the null model in your case could be that the integral of the Gaussian is zero, i.e. that there’s no peak at zero, and the signal model is that there is a peak. If that’s the case, just use the fit parameter that corresponds to the integral of the Gaussian.

If you are OK with crude poor-man’s statistics, estimate the significance by dividing the integral by its uncertainty. For example, 15 ± 5 is three sigma away from zero, so your measured outcome would exclude the “there’s no peak” hypothesis at three sigma. If you want a p-value, look up the tail probability of the Gaussian distribution that corresponds to three sigma:
https://root.cern/doc/master/namespaceROOT_1_1Math.html#a31473bbf4531f10b3f4bf54dfeb6d450
That’s about 0.0013 (one-sided) in this case. (Looking this up with a Gaussian assumes, without checking, that the measurement of this integral parameter is normally distributed. That’s probably not exactly true, but hopefully close enough to give a good approximation of the “true” p-value.)
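
As a rough illustration of the arithmetic above, here is a minimal sketch in plain Python (standard library only, so it runs without ROOT). The function names `significance` and `one_sided_p_value` are made up for this example, and the numbers 15 and 5 are just the illustrative values from the text, not anything from the actual fit. It assumes the fitted integral is normally distributed, which is the same unchecked assumption as above.

```python
import math

def significance(value, uncertainty):
    """Number of standard deviations separating `value` from zero."""
    return value / uncertainty

def one_sided_p_value(z):
    """Upper-tail probability of a standard normal distribution beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Illustrative numbers only: a fitted integral of 15 with uncertainty 5.
z = significance(15.0, 5.0)
p = one_sided_p_value(z)
print(f"Z = {z:.1f} sigma, one-sided p = {p:.5f}")
```

In ROOT itself, `ROOT::Math::normal_cdf_c(3.0)` should give the same upper-tail probability.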

If you need more elaborate statistics (i.e. you want a more accurate p-value), I would advise performing a hypothesis test using the HypoTestCalculator from RooStats. That requires writing down your fit model in RooFit, though.
https://root.cern.ch/doc/master/StandardHypoTestDemo_8C.html
