Idea / Feature proposal: Automatic choice of the number of bins in TH1
Consider the following scenario: You want to histogram a lot of data without
knowing how many values there are, in which interval they lie, or which distribution
they follow. At the moment ROOT offers the possibility to have the axis range
chosen automatically. It would be nice to have an automatic choice of the number of bins as well.
At the moment I’m filling the data into a histogram, then I derive some
properties from the histogram (e.g. the standard deviation) to calculate the
optimal number of bins with some formula. After that I create a new histogram
with the correct binning and fill the data again. I would be very happy if
this could be done by ROOT automatically.
Formulas / Implementation
At the moment I’m using the formula by David W. Scott [Biometrika, Vol. 66,
No. 3 (Dec., 1979), pp. 605-610].
According to Scott, the optimal bin width h_n for a data set of n values x_i is given by
    h_n = 3.49 * s * n^(-1/3)
where s is the estimator of the standard deviation.
Dividing the data range by h_n, the number of bins N_B would therefore be:
    N_B = (max(x_i) - min(x_i)) * n^(1/3) / (3.49 * s)
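To make the formula above concrete, here is a minimal sketch in plain C++ (standard library only, no ROOT calls). The names ScottBinning and scottBinning are hypothetical helpers made up for this post, not part of ROOT. It assumes that s is the sample standard deviation with n - 1 in the denominator, that N_B is rounded up to the next integer, and that degenerate inputs (fewer than two values, or zero spread) fall back to a single bin.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of ROOT): result of Scott's rule.
struct ScottBinning {
    double binWidth;  // h_n = 3.49 * s * n^(-1/3); 0 if it cannot be computed
    int    nBins;     // N_B = (max - min) / h_n, rounded up, at least 1
    double min;       // smallest value in the data
    double max;       // largest value in the data
};

// Hypothetical helper (not part of ROOT): apply Scott's rule to a data set.
ScottBinning scottBinning(const std::vector<double>& x) {
    ScottBinning r{0.0, 1, 0.0, 0.0};
    const std::size_t n = x.size();
    if (n == 0) return r;  // no data: keep the single-bin fallback

    const auto mm = std::minmax_element(x.begin(), x.end());
    r.min = *mm.first;
    r.max = *mm.second;
    if (n < 2) return r;   // one value: no spread can be estimated

    // Assumption: s is the sample standard deviation (n - 1 denominator).
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);

    double sumSq = 0.0;
    for (double v : x) sumSq += (v - mean) * (v - mean);
    const double s = std::sqrt(sumSq / static_cast<double>(n - 1));
    if (s <= 0.0) return r;  // all values identical: single bin

    r.binWidth = 3.49 * s * std::pow(static_cast<double>(n), -1.0 / 3.0);
    // Assumption: round up so the bins cover the full data range.
    r.nBins = std::max(1, static_cast<int>(std::ceil((r.max - r.min) / r.binWidth)));
    return r;
}
```

The result could then be handed to the usual histogram constructor, for example TH1D h("h", "data", b.nBins, b.min, b.max). Since the upper edge of a ROOT axis is exclusive, the maximum would probably need to be padded slightly so that the largest value does not end up in the overflow bin.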
But there are also other ways to estimate the number of bins.
See the Wikipedia article here
Of course there is no universally optimal number of bins. Usually it will still be necessary
to adjust the number of bins by hand and to check that the binning is not hiding
some “features” of the data. But to get a first impression without seeing a
barcode, it would be a nice feature.
Sorry I posted it accidentally before checking again. I couldn’t figure out how to format the formulas nicely.