Calculating interval with x% of events

Tanvi_Wamorkar · October 1, 2019, 3:28pm

Dear experts,

Given a distribution, I want to find the smallest interval containing 68% of events.
I start with creating a Cumulative distribution from the existing TH1.

inFile = 'newCatalog_30Sep2019/hadd_tree/signal_m_60.root'
chain = ROOT.TChain('h4gCandidateDumper/trees/SUSYGluGluToHToAA_AToGG_M_60_TuneCUETP8M1_13TeV_pythia8_13TeV_4photons')
chain.Add(inFile)

h = ROOT.TH1F("h",";Avg. Diphoton Mass[GeV];# of events",100, 0,100 )
chain.Draw('avg_dp_mass >> h')
h_cumulative = h.GetCumulative()

Each bin in h_cumulative is a sum of the number of entries in that bin and all the previous bins. I was thinking that by looping over all bins, I can find the difference between bin contents of consecutive bins until the difference is close to 0.68. But I think there is some fallacy in this logic. Could some one please tell me how I can proceed to solve this problem?

Thanks in advance,
Tanvi

Dilicus · October 1, 2019, 3:57pm

Hi,
the problem of your method is that you you took the difference between two consecutive bins
If you want to have a precise estimation of 68% CL, you need a very large number of bins, but in this way this difference between two consecutive bins will approach to zero.

For what you need the thing is more complex, for each bin i of the cumulative histogram you have to compute the difference between the bin i and all the next bins, starting from the bin 0. In this way you will find, for each bin of the cumulative distribution, which is the index jassociate to iso that the the area between bin_ i and bin j is the 68%.
Then you should find the minimum between all the differences and this is what you want.

I hope you understand it.

Cheers,
Stefano

PS
Maybe if you want to boost the program, instead of a simple loop, is better to start the search where the steepness of the cumulative distribution is larger.

Tanvi_Wamorkar · October 1, 2019, 4:27pm

Hi Stefano,

Thanks for the suggestion.
Do you mean implementing something like this:

inFile = '/newCatalog_30Sep2019/hadd_tree/signal_m_60.root'
chain = ROOT.TChain('h4gCandidateDumper/trees/SUSYGluGluToHToAA_AToGG_M_60_TuneCUETP8M1_13TeV_pythia8_13TeV_4photons')
chain.Add(inFile)

h = ROOT.TH1F("h",";Avg. Diphoton Mass[GeV];# of events",100, 0,100 )
chain.Draw('avg_dp_mass >> h')

h_cumulative = h.GetCumulative()

n_bins = h_cumulative.GetNbinsX()
diff_values = []
for i in range(0,n_bins):
    for j in range(i,n_bins):
        diff =  h_cumulative.GetBinContent(i)-h_cumulative.GetBinContent(j)
        if (abs(diff) < 0.68):
            diff_values.append(abs(diff))

and then finding the min of diff_values and finding the corresponding x-values?

Dilicus · October 1, 2019, 5:22pm

Yes the idea is this, everything seems more or less correct to me.
Pay attention to value of i close n_bins there every interval is less than 0.68, and the risk is to take an interval with a probability way smaller than 0.68 and a very small interval.

S

system · October 15, 2019, 5:22pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.