# Idea / Feature proposal: Automatic choice of the number of bins in TH1

## Introduction

Consider the following scenario: you want to histogram a lot of data without
knowing how many values there are, in which interval they lie, or which
distribution they follow. At the moment ROOT offers the possibility to have the
axis range chosen automatically. It would be nice to have an automatic choice
of the number of bins as well.

At the moment I’m filling the data into a histogram, then I derive some
properties from the histogram (e.g. the standard deviation) to calculate the
optimal number of bins with some formula. After that I create a new histogram
with the correct binning and fill the data again. I would be very happy if
this could be done by ROOT automatically.

## Formulas / Implementation

At the moment I’m using the formula by David W. Scott [Biometrika, Vol. 66,
No. 3 (Dec., 1979), pp. 605-610].

According to Scott, the optimal bin width `h_n` for a data set with `n` values `x_i` is given by:

`h_n = 3.49 * s * n^(-1/3)`

Here `s` is the estimator of the standard deviation.

Therefore the number of bins `N_B` would be:

`N_B = (max(x_i) - min(x_i) ) * n^(1/3) / ( 3.49 * s )`
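The formula for `N_B` could be sketched in plain C++ roughly as follows. This is only a minimal illustration and not existing ROOT functionality: the function name `ScottBinCount` is made up, the data is assumed to be available in a `std::vector<double>`, the sample standard deviation (with `n - 1` in the denominator) is used for `s`, and the result is rounded up to an integer.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ROOT): number of bins suggested by
// Scott's rule, N_B = (max(x_i) - min(x_i)) * n^(1/3) / (3.49 * s).
int ScottBinCount(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    if (n < 2) return 1;  // not enough values to estimate a spread

    // Mean of the data.
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / static_cast<double>(n);

    // Sample standard deviation s (assumption: n - 1 in the denominator).
    double sq = 0.0;
    for (double v : x) sq += (v - mean) * (v - mean);
    const double s = std::sqrt(sq / static_cast<double>(n - 1));
    if (s <= 0.0) return 1;  // all values identical, one bin is enough

    // Bin width h_n = 3.49 * s * n^(-1/3).
    const double h = 3.49 * s * std::pow(static_cast<double>(n), -1.0 / 3.0);

    // Range of the data, max(x_i) - min(x_i).
    const auto mm = std::minmax_element(x.begin(), x.end());
    const double range = *mm.second - *mm.first;

    // Round up so that the bins cover the whole range.
    return std::max(1, static_cast<int>(std::ceil(range / h)));
}
```

The returned value could then be passed as the number of bins when creating the second histogram, with `min(x_i)` and `max(x_i)` as the axis limits.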

But there are also other ways to estimate the number of bins.
See the Wikipedia article here
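As one example of such an alternative, Sturges' formula `N_B = ceil(log2(n)) + 1` uses only the number of values `n` and ignores the spread of the data. A small sketch, again with a made-up function name and not taken from ROOT:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of ROOT): Sturges' formula,
// N_B = ceil(log2(n)) + 1, which depends only on the number of values n.
int SturgesBinCount(std::size_t n)
{
    if (n == 0) return 1;  // no data: fall back to a single bin
    return static_cast<int>(std::ceil(std::log2(static_cast<double>(n)))) + 1;
}
```

Because it does not look at the data itself, this kind of rule needs no first pass over the values, but it also cannot adapt to a very wide or very narrow distribution.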

Of course there is no universally optimal number of bins. Usually it will still
be necessary to adjust the number of bins by hand and to check that the binning
is not hiding some “features” of the data. But to get a first impression, and
to avoid seeing a barcode, it would be a nice feature.

Edit:

Sorry I posted it accidentally before checking again. I couldn’t figure out how to format the formulas nicely.