PyROOT ratio of 2-D histograms code implementing significance

ssussman · April 19, 2016, 12:48am

I am interested in finding the significance of the difference in bin entries between two plots, defined as

“X / uncertainty on X”

where X is the difference in the number of entries. In ROOT, I think that I should compute something like this:

(BinContent_a - BinContent_b) / sqrt(BinError_a*BinError_a + BinError_b*BinError_b)

and then use SetBinContent on the comparison histogram to overwrite its contents with the significance, instead of the ratio.

I am looking for a way to implement this in my own code which makes a ratio of 2-D histograms. Here is the code in PyROOT:

import ROOT
ROOT.gROOT.SetBatch()

#open first root file
rootfile1 = ROOT.TFile.Open("HIST.07511504._000001.pool.root.1")

#get histo 1
etaphi_284285 = rootfile1.Get("run_284285/MuonPhysics/Muons/CBMuons/Muons_CBMuons_eta_phi")

#open second root file
rootfile2 = ROOT.TFile.Open("HIST.07363316._000001.pool.root.1")

#get histo 2
etaphi_283429 = rootfile2.Get("run_283429/MuonPhysics/Muons/CBMuons/Muons_CBMuons_eta_phi")

#n bins merge into one bin in both the x and the y directions
rebin_eta = 4
rebin_phi = 4

#begin loop over both histograms
for histogram in [etaphi_284285,
                  etaphi_283429,
                  ]:

    print("histogram=", histogram)
    #rebin
    histogram.Rebin2D(rebin_eta, rebin_phi)
    print("histogram.Integral=", histogram.Integral())

    #normalize to get ratio of the events in bin/total events in each histogram
    histogram.Scale(1.0 / histogram.Integral())

#divide the histograms
ratio = etaphi_284285.Clone()
ratio.Reset()
ratio.SetName("ratio_284285_283429")
ratio.Divide(etaphi_284285, etaphi_283429)

(then I create a root file containing this histogram).
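
A minimal sketch of the significance idea described above, assuming the two input histograms have identical binning and that their bin errors are meaningful (for example, Poisson errors that were propagated through the normalization). The function name `fill_significance` and its arguments are hypothetical; the function only relies on the `GetNbinsX`, `GetNbinsY`, `GetBinContent`, `GetBinError`, and `SetBinContent` methods of the histograms passed in. Setting bins with zero combined error to 0 is an arbitrary choice made for this illustration.

```python
import math

def fill_significance(hist_a, hist_b, hist_out):
    """Overwrite hist_out with (a - b) / sqrt(err_a^2 + err_b^2) per bin.

    Hypothetical helper: all three arguments are assumed to be 2-D
    histograms (e.g. ROOT TH2 objects) with the same binning.
    """
    for ix in range(1, hist_a.GetNbinsX() + 1):      # ROOT bin indices start at 1
        for iy in range(1, hist_a.GetNbinsY() + 1):
            diff = hist_a.GetBinContent(ix, iy) - hist_b.GetBinContent(ix, iy)
            err_a = hist_a.GetBinError(ix, iy)
            err_b = hist_b.GetBinError(ix, iy)
            sigma = math.sqrt(err_a * err_a + err_b * err_b)
            # Assumption: bins with no uncertainty at all get significance 0.
            value = diff / sigma if sigma > 0 else 0.0
            hist_out.SetBinContent(ix, iy, value)
    return hist_out
```

With the histograms from the post, this could be called as `fill_significance(etaphi_284285, etaphi_283429, ratio)` in place of the `ratio.Divide(...)` call. Whether the bin errors left after `Scale` are the right ones to use depends on how the histograms were filled.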

Danilo · April 19, 2016, 5:21am

Hi,

I am not sure I fully get your question. Could you elaborate?

D

ssussman · April 19, 2016, 10:44am

Hi,

My code currently creates a new “ratio histogram” that is the ratio of two 2-D histograms that it gets. (It creates a ratio of the spatial distribution of muons in a particle detector between two runs.) The ultimate goal is to be able to get one “good” run and then take its ratio with various new runs, to see if there are any data quality problems.

Some of the bins in these 2D histograms only have 2 entries in them, while other bins have hundreds. But the code currently treats a ratio of 2 vs. 1 entries identically to a ratio of 400 vs. 200 entries, even though the first case is not statistically significant.

I want to incorporate the “significance” of the difference in bin entries between runs, so that the code overwrites the contents of the “ratio histogram” with the significance instead of the ratio.

I listed the few ideas I have had so far above.

Does this help?

Danilo · April 19, 2016, 11:04am

I am still not sure I get it. Why aren't you considering the uncertainty on this ratio?

ssussman · April 19, 2016, 12:23pm

Put another way: do you have ideas on how I could weight the bins of this “ratio histogram” based on how many entries were in the corresponding bins of the two original histograms that were divided?

In other words, how could I implement a mechanism so that a ratio of 2:1 entries in a bin shows up as less significant than a ratio of 400:200 entries in a bin?
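
As a rough illustration with these numbers, the formula from the first post already separates the two cases. This sketch assumes that raw, unnormalized counts are compared and that each count N carries a Poisson uncertainty of sqrt(N); neither assumption is stated in the thread. The function name `count_significance` is hypothetical.

```python
import math

def count_significance(n_a, n_b):
    """(n_a - n_b) / sqrt(n_a + n_b), assuming Poisson errors sqrt(N)."""
    total = n_a + n_b
    if total == 0:
        return 0.0  # assumption: two empty bins carry no significance
    return (n_a - n_b) / math.sqrt(total)

low_stats = count_significance(2, 1)       # ratio 2:1 with few entries
high_stats = count_significance(400, 200)  # same ratio, many entries
print("2 vs 1:     %.2f" % low_stats)
print("400 vs 200: %.2f" % high_stats)
```

Under these assumptions the 2 vs. 1 bin comes out at about 0.58 standard deviations and the 400 vs. 200 bin at about 8.16, even though both have the same ratio.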

ssussman · April 19, 2016, 12:25pm

I am not sure what you mean by “uncertainty on the ratio”.

Danilo · April 19, 2016, 4:04pm

Hi,

If the quantities in the numerator and the denominator come from some experimental measurement (or a simulation thereof), they are affected by a statistical uncertainty. Their ratio will be too.

Cheers,
D
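
The point about the ratio carrying its own uncertainty can be sketched with first-order error propagation, assuming the numerator and denominator are independent. The function name `ratio_with_uncertainty` is hypothetical, and the Poisson errors sqrt(N) used in the example calls are an assumption for illustration only.

```python
import math

def ratio_with_uncertainty(a, err_a, b, err_b):
    """Return (a / b, uncertainty), assuming a and b are independent."""
    if b == 0:
        raise ValueError("denominator is zero")
    r = a / b
    # First-order propagation: sigma_r^2 = (err_a / b)^2 + (a * err_b / b^2)^2
    err_r = math.sqrt((err_a / b) ** 2 + (a * err_b / b ** 2) ** 2)
    return r, err_r

# Same ratio of 2, very different uncertainties (Poisson errors assumed).
few = ratio_with_uncertainty(2.0, math.sqrt(2.0), 1.0, 1.0)
many = ratio_with_uncertainty(400.0, 20.0, 200.0, math.sqrt(200.0))
print("2 / 1     = %.2f +/- %.2f" % few)
print("400 / 200 = %.2f +/- %.2f" % many)
```

In this sketch the ratio is 2 in both cases, but its uncertainty is roughly 2.45 for the low-statistics bin and roughly 0.17 for the high-statistics bin.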