Fast generation of random numbers from PDF defined with non-uniformly distributed data

What is the fastest way to generate random numbers from a PDF defined by non-uniformly distributed data? In my case, the PDF is discrete and its points are non-uniformly spaced on the x-axis. This is for a very high statistics Monte Carlo, so it needs to be as fast as possible. For example, in pseudo-code, my non-normalized PDF may look like:

x_pdf = {1.0, 2.0, 4.0, 7.0, 8.0, 9.4, 12.2};
y_pdf = {3.1, 4.5, 6.0, 2.1, 5.0, 1.0, 1.0};

and I’d like to draw continuous random numbers between x_low = 1 and x_high = 12.2.

Clearly I need to interpolate somehow. One option would be to create a TF1 from fitting the data, but that may be hard in some cases. Is my best option to create a user-defined TF1 with these discrete values, and return some kind of interpolation between the x-values as needed?

I will be running this MC on a large cluster with very high statistics requirements, hence the question about efficiency.




One possibility is to estimate the PDF with a kernel density estimator and then generate the random numbers from that estimate.
The Unuran package, which is available in ROOT when ROOT is configured with -Dunuran=On, has a method for doing this.
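To illustrate the kernel-density idea only (this is not the Unuran API; the function name and the hand-picked bandwidth are my own), a weighted "smoothed bootstrap" draws one of the x_i with probability proportional to y_i and smears it with a Gaussian kernel:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Toy sketch of kernel-density-style sampling (a weighted "smoothed
// bootstrap"), NOT the Unuran implementation: pick one of the x_i with
// probability proportional to y_i, then add Gaussian noise of bandwidth h.
// A real KDE method chooses h from the data and handles the boundaries.
double sample_kde(const std::vector<double>& x, const std::vector<double>& y,
                  double h, std::mt19937& rng) {
    std::discrete_distribution<std::size_t> pick(y.begin(), y.end());
    std::normal_distribution<double> kernel(0.0, h);
    return x[pick(rng)] + kernel(rng);
}
```

Note that this naive version can occasionally return values slightly outside [x_low, x_high]; Unuran's empirical-distribution methods deal with bandwidth selection and boundaries properly.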

If instead you have (x,y) points representing a function, you can use an interpolation method to define a function to sample from.
I would first build a TGraph from the points, wrap TGraph::Eval (which interpolates between the points) in a TF1, and then generate the random numbers with TF1::GetRandom.
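If the per-call overhead of a generic TF1 matters on a cluster, the same piecewise-linear sampling can also be done directly by inverse-transform sampling, which is roughly what TF1::GetRandom does internally: precompute the cumulative integral of the trapezoids once, then each draw costs a binary search plus the analytic inversion of one linear segment. A self-contained sketch (plain C++, no ROOT, all names mine; assumes y >= 0):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Inverse-transform sampler for a piecewise-linear PDF given by points
// (x_i, y_i). The CDF is piecewise quadratic, so each segment can be
// inverted analytically.
class LinearPdfSampler {
public:
    LinearPdfSampler(std::vector<double> x, std::vector<double> y)
        : x_(std::move(x)), y_(std::move(y)), cdf_(x_.size(), 0.0) {
        for (std::size_t i = 1; i < x_.size(); ++i)
            // cumulative trapezoid area up to x_i
            cdf_[i] = cdf_[i - 1] + 0.5 * (y_[i - 1] + y_[i]) * (x_[i] - x_[i - 1]);
    }

    template <class Rng>
    double operator()(Rng& rng) const {
        std::uniform_real_distribution<double> u(0.0, cdf_.back());
        const double t = u(rng);
        // locate the segment whose cumulative range contains t
        std::size_t i = std::upper_bound(cdf_.begin(), cdf_.end(), t)
                        - cdf_.begin() - 1;
        i = std::min(i, x_.size() - 2);  // guard the t == cdf_.back() edge case
        const double dt = t - cdf_[i];
        const double dx = x_[i + 1] - x_[i];
        const double b = y_[i];                            // pdf at segment start
        const double a = (y_[i + 1] - y_[i]) / (2.0 * dx); // half the pdf slope
        // solve a*s^2 + b*s = dt for s in [0, dx]; this form of the
        // quadratic root is numerically stable and valid for a < 0 too
        const double s = (std::abs(a) < 1e-12 * b)
                             ? dt / b
                             : 2.0 * dt / (b + std::sqrt(b * b + 4.0 * a * dt));
        return x_[i] + s;
    }

private:
    std::vector<double> x_, y_, cdf_;
};
```

The precomputation is O(n) and each draw is O(log n), with no virtual-call or parsing overhead, so it should scale well to very high statistics.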

Best Regards