I am trying to generate data from a lognormal distribution. Please see the attached working example.
In the example, I make a TF1, fill a histogram with generated data points, fit it, and compare the fitted TF1 to the original TF1. They are very different, far more than can naively be explained by statistical fluctuations. If I use linear rather than logarithmic binning on the x axis, the effect is smaller and of opposite sign, but still present. Changing the lognormal distribution to a normal distribution makes the effect go away. Is this behaviour expected? Did I do something silly? I am using ROOT version 5.34/14.
Any help that could make my generated data points follow the expected distribution is much appreciated.
lognormal.C (1.33 KB)
First of all, I think you have not defined the log-normal distribution correctly. Do not implement it yourself using TFormula; use the function provided by the library, ROOT::Math::lognormal_pdf.
Just create the TF1 like this (using your parameterisation):
TF1 * f = new TF1("f","*ROOT::Math::lognormal_pdf(x, log(), log() )",0,1000);
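For reference, here is a plain C++ sketch (no ROOT needed) of the density that, to my understanding, ROOT::Math::lognormal_pdf(x, m, s, x0) evaluates: m and s are the mean and standard deviation of log(x - x0), which is why the call above passes logarithms. If, as those log() calls suggest, the parameterisation is a median and a geometric standard deviation, then m = log(median) and s = log(geometric standard deviation); that mapping is an assumption. The names lognormal_pdf_sketch and integrate_lognormal are hypothetical helpers, not ROOT functions.

```cpp
#include <cmath>

// Log-normal density, assumed to match ROOT::Math::lognormal_pdf(x, m, s, x0):
// m and s are the mean and standard deviation of log(x - x0).
// Returns 0 for x <= x0, where the density is not defined.
double lognormal_pdf_sketch(double x, double m, double s, double x0 = 0.0) {
    const double y = x - x0;
    if (y <= 0.0) return 0.0;
    const double pi = std::acos(-1.0);
    const double z = (std::log(y) - m) / s;
    return std::exp(-0.5 * z * z) / (y * std::fabs(s) * std::sqrt(2.0 * pi));
}

// Hypothetical helper: integrate the density (with x0 = 0) over [a, b], a > 0,
// using the trapezoidal rule in t = log(x). Since dx = x dt, the integrand
// becomes f(x) * x, which is a smooth Gaussian in t.
double integrate_lognormal(double a, double b, double m, double s, int n = 100000) {
    const double ta = std::log(a), tb = std::log(b);
    const double h = (tb - ta) / n;
    double sum = 0.0;
    for (int i = 0; i <= n; ++i) {
        const double x = std::exp(ta + i * h);
        const double w = (i == 0 || i == n) ? 0.5 : 1.0;  // trapezoid end weights
        sum += w * lognormal_pdf_sketch(x, m, s) * x;
    }
    return sum * h;
}
```

As a sanity check, with m = log(100) and s = log(2), integrating over a wide range such as [1e-3, 1e6] should give a value very close to 1, and integrating up to x = 100 should give about 0.5, since 100 is the median.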
The second problem is the non-uniform binning of the histogram. A histogram collects the number of counts in each bin; if you want it to represent a probability density function, you need to divide each count by the bin width.
So before plotting the histogram, divide each bin content by its bin width. You can do this easily by calling TH1::Scale(1, "width").
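To illustrate why this matters with logarithmic bins, here is a minimal standalone sketch that uses a hypothetical VarHist struct in place of a ROOT histogram; its scale_width method mimics what TH1::Scale(c, "width") does (multiply each bin by c divided by its width). The sketch also divides by the number of entries N so the result can be compared directly with a unit-normalised pdf; in a fit, a normalisation parameter in the function can absorb that factor instead. The binning (40 log-spaced bins from 1 to 10^4) and the parameters are arbitrary choices for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Minimal stand-in for a histogram with variable-width bins (no ROOT needed).
struct VarHist {
    std::vector<double> edges;   // n+1 increasing bin edges
    std::vector<double> counts;  // n bin contents
    explicit VarHist(std::vector<double> e)
        : edges(std::move(e)), counts(edges.size() - 1, 0.0) {}
    void fill(double x) {
        for (std::size_t i = 0; i + 1 < edges.size(); ++i)
            if (x >= edges[i] && x < edges[i + 1]) { counts[i] += 1.0; return; }
        // Under/overflow entries are simply dropped in this sketch.
    }
    // Analogue of TH1::Scale(c, "width"): multiply each bin by c / width.
    void scale_width(double c) {
        for (std::size_t i = 0; i < counts.size(); ++i)
            counts[i] *= c / (edges[i + 1] - edges[i]);
    }
};

// n logarithmically spaced bins between lo and hi.
std::vector<double> log_edges(double lo, double hi, int n) {
    std::vector<double> e(n + 1);
    for (int i = 0; i <= n; ++i)
        e[i] = lo * std::pow(hi / lo, static_cast<double>(i) / n);
    return e;
}

// Fill a log-binned histogram with N lognormal samples. If per_width is set,
// turn counts into a density estimate by dividing by N * bin width.
VarHist sample_lognormal(int N, double m, double s, bool per_width, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::lognormal_distribution<double> dist(m, s);  // m, s: mean/sigma of log(x)
    VarHist h(log_edges(1.0, 10000.0, 40));
    for (int i = 0; i < N; ++i) h.fill(dist(gen));
    if (per_width) h.scale_width(1.0 / N);
    return h;
}
```

Without the width division, the raw counts in log-spaced bins follow f(x) * x rather than f(x), so the histogram is symmetric in log(x) around the median and its apparent peak sits well to the right of the true mode at exp(m - s^2). After dividing by N times the bin width, each bin agrees with the pdf evaluated near the bin centre, up to statistical fluctuations.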
I have attached the corrected macro, which uses more entries to fill the histogram and a better binning.
lognormal.C (1.71 KB)
One more thing: a chi2 fit also introduces a bias, which becomes more significant when the statistics are low, as in your case. A likelihood fit is much better, and you should use one here.
Since you have rescaled the histogram, you should use the fit option "WL".
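The source of the chi2 bias can be seen in a toy model (not ROOT's fitter): fit a constant mu to Poisson-distributed bin counts n_i. A chi2 that uses the observed count as the variance (Neyman chi2) minimises the sum of (n_i - mu)^2 / n_i over non-empty bins; setting the derivative to zero gives mu = k / sum(1/n_i), the harmonic mean of the k non-empty counts, which is systematically low. The Poisson likelihood maximises the sum of (n_i ln mu - mu), whose solution is the plain arithmetic mean. Skipping empty bins mirrors, to my understanding, ROOT's default chi2 fit. The toy uses unweighted counts, so the plain likelihood ("L") is the relevant analogue; "WL" is the variant for weighted or rescaled histograms. The names fit_constant and poisson_counts are hypothetical.

```cpp
#include <random>
#include <vector>

// Toy model: estimate a constant mu from Poisson bin counts in two ways.
struct ConstFit { double neyman_chi2; double poisson_ml; };

ConstFit fit_constant(const std::vector<int>& counts) {
    double inv_sum = 0.0;  // sum of 1/n_i over non-empty bins (Neyman chi2)
    int used = 0;          // number of non-empty bins
    double sum = 0.0;      // sum of n_i over all bins (Poisson likelihood)
    for (int n : counts) {
        sum += n;
        if (n > 0) { inv_sum += 1.0 / n; ++used; }
    }
    ConstFit f;
    // d/dmu sum (n_i - mu)^2 / n_i = 0  =>  mu = used / sum(1/n_i) (harmonic mean)
    f.neyman_chi2 = used > 0 ? used / inv_sum : 0.0;
    // d/dmu sum (n_i ln mu - mu) = 0  =>  mu = mean(n_i)
    f.poisson_ml = counts.empty() ? 0.0 : sum / counts.size();
    return f;
}

// Generate nbins independent Poisson(mu) counts with a fixed seed.
std::vector<int> poisson_counts(int nbins, double mu, unsigned seed = 1) {
    std::mt19937 gen(seed);
    std::poisson_distribution<int> dist(mu);
    std::vector<int> c(nbins);
    for (int& n : c) n = dist(gen);
    return c;
}
```

With a true mean of 5 counts per bin, the likelihood estimate lands close to 5, while the Neyman chi2 estimate comes out clearly lower; the gap shrinks as the counts per bin grow, which is why the bias matters most at low statistics.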
Thanks very much for your help! The key things I missed were indeed TH1::Scale(1, "width") and the "WL" fit option: without the first, my logarithmic bins produced a large deviation, and without the second, the chi2 fit with uniform bins was biased.
There are several variants of the lognormal distribution, and I agree that the one I am using is not standard. I should have called it pseudo-lognormal or something; sorry.