Given a distribution, I want to find the smallest interval containing 68% of events.
I start with creating a Cumulative distribution from the existing TH1.
Each bin in h_cumulative is a sum of the number of entries in that bin and all the previous bins. I was thinking that by looping over all bins, I can find the difference between bin contents of consecutive bins until the difference is close to 0.68. But I think there is some fallacy in this logic. Could some one please tell me how I can proceed to solve this problem?
Hi,
the problem of your method is that you you took the difference between two consecutive bins
If you want to have a precise estimation of 68% CL, you need a very large number of bins, but in this way this difference between two consecutive bins will approach to zero.
For what you need the thing is more complex, for each bin i of the cumulative histogram you have to compute the difference between the bin i and all the next bins, starting from the bin 0. In this way you will find, for each bin of the cumulative distribution, which is the index jassociate to iso that the the area between bin_ i and bin j is the 68%.
Then you should find the minimum between all the differences and this is what you want.
I hope you understand it.
Cheers,
Stefano
PS
Maybe if you want to boost the program, instead of a simple loop, is better to start the search where the steepness of the cumulative distribution is larger.
Yes the idea is this, everything seems more or less correct to me.
Pay attention to value of i close n_bins there every interval is less than 0.68, and the risk is to take an interval with a probability way smaller than 0.68 and a very small interval.