Idea / Feature proposal: Automatic choice of bin numbers in TH1
Introduction
Consider the following scenario: you want to histogram a lot of data without
knowing how many values there are, in which interval they lie, or which distribution
they follow. At the moment ROOT offers the possibility of having the axis range
chosen automatically. It would be nice to have an automatic choice of the number of bins as well.
At the moment I fill the data into a histogram, then derive some
properties from that histogram (e.g. the standard deviation) to calculate the
optimal number of bins with some formula. After that I create a new histogram
with the correct binning and fill the data again. I would be very happy if
this could be done by ROOT automatically.
Formulas / Implementation
At the moment I’m using the formula by David W. Scott [Biometrika, Vol. 66,
No. 3 (Dec., 1979), pp. 605-610].
According to Scott, the optimal bin size h_n for a data set with n values x_i is given by:
h_n = 3.49 * s * n^(-1/3)
Here s is the estimator of the standard deviation. Therefore the number of bins N_B would be:
N_B = (max(x_i) - min(x_i)) * n^(1/3) / (3.49 * s)
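To make the formula concrete, here is a minimal sketch in plain C++ (no ROOT needed). The function name `scottBinCount` is hypothetical and not part of ROOT. Using the sample standard deviation with the n - 1 denominator, rounding N_B up, and falling back to a single bin when the rule cannot be applied are all my own assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a ROOT function): number of bins from Scott's rule.
// Returns 1 when the rule cannot be applied (fewer than 2 values or zero spread).
std::size_t scottBinCount(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    if (n < 2) return 1;

    // Range of the data: max(x_i) - min(x_i).
    const auto mm = std::minmax_element(x.begin(), x.end());
    const double range = *mm.second - *mm.first;

    // Sample standard deviation s (assumption: n - 1 in the denominator).
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);
    double sumSq = 0.0;
    for (double v : x) sumSq += (v - mean) * (v - mean);
    const double s = std::sqrt(sumSq / static_cast<double>(n - 1));
    if (s <= 0.0) return 1;

    // Bin width h_n = 3.49 * s * n^(-1/3); N_B = range / h_n, rounded up.
    const double h = 3.49 * s * std::pow(static_cast<double>(n), -1.0 / 3.0);
    return static_cast<std::size_t>(std::ceil(range / h));
}
```

The result could then be passed as the bin count when the second histogram is created, together with the minimum and maximum found in the first pass.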
But there are also other ways to estimate the number of bins;
see the Wikipedia article here.
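For comparison, here is a small sketch of three commonly cited alternatives that depend only on the number of values n: the square-root choice, Sturges' formula and the Rice rule. That these are the rules the linked article covers is an assumption on my part. The function names are hypothetical, and returning 1 for n = 0 is my own choice.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helpers (not ROOT functions); n is the number of values.

// Square-root choice: N_B = ceil(sqrt(n)).
std::size_t sqrtBinCount(std::size_t n)
{
    if (n == 0) return 1;
    return static_cast<std::size_t>(std::ceil(std::sqrt(static_cast<double>(n))));
}

// Sturges' formula: N_B = ceil(log2(n)) + 1.
std::size_t sturgesBinCount(std::size_t n)
{
    if (n == 0) return 1;
    return static_cast<std::size_t>(std::ceil(std::log2(static_cast<double>(n)))) + 1;
}

// Rice rule: N_B = ceil(2 * n^(1/3)).
std::size_t riceBinCount(std::size_t n)
{
    if (n == 0) return 1;
    return static_cast<std::size_t>(std::ceil(2.0 * std::cbrt(static_cast<double>(n))));
}
```

Unlike Scott's formula, none of these uses the spread of the data, so they need only the entry count and the axis range.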
Of course there is no universally optimal number of bins, and usually it will be necessary
to adjust the bin number by hand and to check that the binning is not hiding
some “features” of the data. But to get a first impression, and to avoid seeing a
barcode, it would be a nice feature.
Edit:
Sorry, I posted this accidentally before checking it again. I couldn’t figure out how to format the formulas nicely.