Randomly filling a histogram with variable bin width

berder · June 5, 2009, 2:18pm

Hi,

This is not really a question, but rather some observations that I think are worth sharing for the record, to spare others the same issues.

I need to fill a histogram with a 1/x distribution, say within the [1e3;1e10] range. Unfortunately, things do not work as expected.

Here is my function:

TF1* f=new TF1("f","1./x",1e3,1e10);

Now I define three histograms over the same range, with 1000 bins of variable width chosen so that they have equal width on a log scale:

const int nbins = 1000;
double scale[nbins+1];   // bin edges, equally spaced in log10(x)
const double xmin = 1e3;
const double xmax = 1e10;
const double step = (log10(xmax)-log10(xmin))/nbins;
// compute each edge directly from xmin so that rounding errors do not accumulate
for (int i=0; i<=nbins; i++) scale[i] = xmin*pow(10.,i*step);
TH1D* h1 = new TH1D("h1","h1",nbins,scale);
TH1D* h2 = new TH1D("h2","h2",nbins,scale);
TH1D* h3 = new TH1D("h3","h3",nbins,scale);
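
As a reference for what a correct fill should look like: for a density proportional to 1/x, each of these log-spaced bins should carry the same probability, so the histograms are expected to be flat apart from statistical fluctuations. A short derivation, where x_i denotes the bin edges stored in scale[] and n_bins the number of bins (notation introduced here for illustration):

```latex
P_i = \frac{\int_{x_i}^{x_{i+1}} \frac{dx}{x}}{\int_{x_{\min}}^{x_{\max}} \frac{dx}{x}}
    = \frac{\ln\left(x_{i+1}/x_i\right)}{\ln\left(x_{\max}/x_{\min}\right)}
    = \frac{1}{n_{\text{bins}}}
```

With 1e7 entries and 1000 bins this gives about 1e4 entries per bin, so Poisson fluctuations of roughly 1% are expected; anything much larger than that points to the sampling method.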

Now let’s fill these histograms: first with the FillRandom method, then with GetRandom, and then with GetRandom again after setting the number of points of f to its maximum, that is 100000:

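h1->FillRandom("f",1e7);                              // method 1: TH1::FillRandom
for (int i=0; i<1e7; i++) h2->Fill(f->GetRandom());   // method 2: TF1::GetRandom, default Npx
f->SetNpx(100000);                                    // finer internal grid for f
for (int i=0; i<1e7; i++) h3->Fill(f->GetRandom());   // method 3: TF1::GetRandom, Npx = 100000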

The distributions are shown in the first attached plot (randomissue.png). I know this issue is known (see for instance here), but can’t anything be done about it? It is not necessarily obvious when you fill histograms with a smaller number of bins, but it can significantly change the results…

Moreover, I read “TH1::Fillrandom evaluates the function in the center of the histogram bin, that is less precise than the method in TF1::GetRandom” (here), which suggests that neither of these methods is very precise… Is there a workaround?
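
For this particular function, one possible workaround is to bypass the tabulated integral altogether: the cumulative distribution of 1/x can be inverted analytically, so a uniform random number u maps directly to x = xmin*(xmax/xmin)^u. The following is a minimal sketch in plain C++, not ROOT's implementation; the names sampleInverseX and countInRange are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draw one value from a density proportional to 1/x on [xmin, xmax]
// by inverting the analytic CDF F(x) = ln(x/xmin) / ln(xmax/xmin).
// u is expected to be uniform in [0, 1).
double sampleInverseX(double u, double xmin, double xmax) {
    assert(xmin > 0.0 && xmax > xmin);
    return xmin * std::pow(xmax / xmin, u);
}

// Hypothetical check helper: generate n samples with a seeded generator
// and count how many fall in [lo, hi).
long countInRange(long n, double lo, double hi,
                  double xmin, double xmax, unsigned seed) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    long count = 0;
    for (long i = 0; i < n; ++i) {
        const double x = sampleInverseX(uni(gen), xmin, xmax);
        if (x >= lo && x < hi) ++count;
    }
    return count;
}
```

In a ROOT macro the same idea could be used as h2->Fill(sampleInverseX(gRandom->Rndm(), 1e3, 1e10)). This only works because 1/x has a closed-form inverse CDF; for a general TF1 the tabulated approach is still needed.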

Now, let’s say I want to cross-check against a histogram with fixed bin width. Again, two methods:

TH1D* hh1 = new TH1D("hh1","hh1",10000,1e3,1e10);
TH1D* hh2 = new TH1D("hh2","hh2",10000,1e3,1e10);
hh1->FillRandom("f",1e7);
for (int i=0; i<1e7; i++) hh2->Fill(f->GetRandom());

And again, two different results, see second attachment (randomissue2.png).

Then I fill a new histogram (fixed bin size) with the content of the first one (variable bin size). I consider only the one obtained with FillRandom, as “by eye” it seems to be the only usable one:

TH1D* hh3 = new TH1D("hh3","hh3",10000,1e3,1e10);
for (int i=0; i<1000; i++) hh3->Fill(h1->GetBinCenter(i+1),h1->GetBinContent(i+1));

According to this new histogram, it seems that the correct fixed-bin histogram is the one filled with GetRandom (see third plot, randomissue3.png)…

So it seems that GetRandom should be avoided when filling a histogram with variable bin size, and FillRandom avoided when filling one with fixed bin size. On reflection, this looks consistent with the fixed binning that TF1 uses internally.

Any thoughts?

brun · June 5, 2009, 2:48pm

A problem in FillRandom was fixed a few weeks ago, and support for variable bin size was introduced at the same time. I suggest moving to either 5.23/04 or the trunk.
Your method of filling fixed-bin-size histograms with 10^4 bins when your dynamic range is 10^7 cannot work; this has nothing to do with GetRandom.
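
One way to read this remark: with 10^4 equal-width bins over [1e3, 1e10], each bin is about 1e6 wide, so the first bin alone covers three of the seven decades. The sketch below (plain C++, with a hypothetical helper name fixedBinFraction) computes the fraction of a 1/x distribution expected in a given fixed-width bin. It gives roughly 43% for the first bin and about 4% for the second, so the fixed-bin histogram cannot resolve the shape below about 1e6, whichever random generator fills it.

```cpp
#include <cassert>
#include <cmath>

// Fraction of a 1/x distribution on [xmin, xmax] expected in the
// 0-based bin `bin` of a histogram with `nbins` equal-width bins.
double fixedBinFraction(int bin, int nbins, double xmin, double xmax) {
    assert(xmin > 0.0 && xmax > xmin && bin >= 0 && bin < nbins);
    const double width = (xmax - xmin) / nbins;  // constant bin width
    const double lo = xmin + bin * width;        // lower edge of the bin
    const double hi = lo + width;                // upper edge of the bin
    // integral of 1/x over the bin, normalised to the full range
    return std::log(hi / lo) / std::log(xmax / xmin);
}
```

For example, fixedBinFraction(0, 10000, 1e3, 1e10) is the expected share of entries in the first bin of hh1, hh2 or hh3.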

Rene

berder · June 5, 2009, 3:05pm

OK, this is with 5.20/00.

I am not sure I understand what you mean. What cannot work? What would work?

Thanks.

brun · June 5, 2009, 4:34pm

Use FillRandom from version 5.23/04 or newer.
Concerning TF1::GetRandom, an important improvement could be obtained by systematically computing the function integral at points equally spaced on a log scale when log(xmax/xmin) > 2 or 3, or more precisely when log(xmax/xmin) > log(npx), where npx is the parameter set with TF1::SetNpx.
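
To illustrate why the spacing of the integration points matters, here is a minimal sketch in plain C++. It is not the ROOT code: it tabulates the cumulative integral of 1/x on either a linear or a logarithmic grid and inverts it by simple linear interpolation inside each cell, which is a simplification chosen for brevity. The names CdfTable, buildTable and quantile are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tabulated cumulative distribution for a density proportional to 1/x.
struct CdfTable {
    std::vector<double> x;    // grid points, npx + 1 entries
    std::vector<double> cdf;  // normalised cumulative integral at each point
};

// Build the table with npx cells, spaced linearly or logarithmically.
CdfTable buildTable(int npx, double xmin, double xmax, bool logGrid) {
    assert(npx > 0 && xmin > 0.0 && xmax > xmin);
    CdfTable t;
    const double norm = std::log(xmax / xmin);
    for (int i = 0; i <= npx; ++i) {
        const double f = static_cast<double>(i) / npx;
        const double xi = logGrid ? xmin * std::pow(xmax / xmin, f)
                                  : xmin + f * (xmax - xmin);
        t.x.push_back(xi);
        t.cdf.push_back(std::log(xi / xmin) / norm);  // exact integral of 1/x
    }
    return t;
}

// Map a uniform u in [0, 1) to x: find the cell whose CDF range
// contains u, then interpolate linearly in x inside that cell.
double quantile(const CdfTable& t, double u) {
    std::size_t i = 0;
    while (i + 2 < t.x.size() && t.cdf[i + 1] <= u) ++i;
    const double frac = (u - t.cdf[i]) / (t.cdf[i + 1] - t.cdf[i]);
    return t.x[i] + frac * (t.x[i + 1] - t.x[i]);
}
```

With npx = 100 on the range [1e3, 1e10], the first cell of the linear grid already contains most of the probability, and the interpolated median comes out more than an order of magnitude too high. On the log grid the same number of points reproduces the analytic quantiles to within about a percent.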

Rene

brun · June 5, 2009, 8:01pm

In the SVN trunk, I have implemented log-scale binning for the computation of the function integral. Could you try it and let me know?

Rene

berder · June 5, 2009, 8:27pm

Thanks. Unfortunately I won’t have time to install and test it until next weekend. I’ll keep you informed, unless someone else tests it first.