
In a few cases RooAbsPdf::generate() fails to create correct distributions

Dear Rooters,

I have a model with 3 variables

  • variable mass (Gaussian)
  • variable Vgg (Gaussian)
  • variable Vexex (exponential)

and a pdf sketched as

  • Nsig*Gauss(mass)*Gauss(Vgg)*Exponential(Vexex)

I am generating data sets using RooAbsPdf::generate() with the Extended flag set. I do this ~1e5 times and check the generation by calculating the mean values of mass and Vexex (I forgot to include the Vgg mean, but I can add it on request). I get the following plot with a logarithmic y axis, showing some outliers in the mean value of mass:


A similar plot is obtained for the mean values of Vexex distributions:

A correlation plot of the Vexex and mass mean values shows that the outliers of the two are not correlated:

Looking into one data set with a mean among the mass outliers, one can see the following correlation plot of generated mass and vgg events:

(Who stole the piece of the pie? :wink: )

  • Instead of TRandom3, which is the default in RooFit, I also tried TRandomMixMax. The same problem was found.
  • Using a smaller value (0.0125 instead of 0.1) for vexexSigLam, the lambda of the signal exponential in Vexex, does not cause the problem. So the problem is somehow sensitive to this parameter.
  • Am I doing something wrong or running into a known problem?
  • As a workaround I will simply generate the data myself, but somehow it would be convenient to use the generate method.
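
A minimal sketch of what such a hand-rolled generation could look like with the C++ standard library, independent of RooFit. Everything here is an assumption for illustration: the `Event` and `SignalParams` types, the function name `generateSignal`, and all parameter values are hypothetical and not taken from the attached macros. The exponential is assumed to be a decaying one with rate lambda on [0, ∞), and the observables are not truncated to any range, unlike RooFit observables.

```cpp
#include <random>
#include <vector>

// One generated event of the three-observable signal model.
struct Event {
    double mass;
    double vgg;
    double vexex;
};

// Hypothetical parameter values, chosen only for illustration.
struct SignalParams {
    double nSig = 1000.0;      // expected yield (Poisson mean, "extended" generation)
    double massMean = 5.0;
    double massSigma = 0.1;
    double vggMean = 0.0;
    double vggSigma = 1.0;
    double vexexLam = 0.1;     // rate of the decaying exponential (assumed convention)
};

// Draw one pseudo-experiment from Gauss(mass) * Gauss(Vgg) * Exponential(Vexex).
// The three observables are sampled independently because the pdf factorizes.
std::vector<Event> generateSignal(const SignalParams& p, std::mt19937_64& rng) {
    std::poisson_distribution<long> yield(p.nSig);
    std::normal_distribution<double> mass(p.massMean, p.massSigma);
    std::normal_distribution<double> vgg(p.vggMean, p.vggSigma);
    std::exponential_distribution<double> vexex(p.vexexLam);

    const long n = yield(rng);  // Poisson-fluctuated number of events
    std::vector<Event> events;
    events.reserve(static_cast<std::size_t>(n));
    for (long i = 0; i < n; ++i) {
        events.push_back({mass(rng), vgg(rng), vexex(rng)});
    }
    return events;
}
```

Calling `generateSignal(params, rng)` once per pseudo experiment with a seeded `std::mt19937_64` yields one toy data set. Restricting the observables to finite ranges would need an extra accept/reject step or inverse-CDF sampling on the truncated interval.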

On the Example

  • Basically the same as I used in previous posts
  • The Configurator class holds the parameter values. The model is then built via the ModelBuilder class. The main macro is checkRooAbsPdfGenerate.C
    checkRooAbsPdfGenerate.C (3.2 KB) Config.C (3.9 KB) modelBuilder.C (13.8 KB)
  • If an outlier in mean of mass is found the RooDataSet is saved
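
One possible outlier criterion for the mean check, sketched in plain C++. This is not the criterion used in the attached macro; the function names `sampleMean` and `isMeanOutlier` and the default 5-standard-error threshold are hypothetical. It assumes the expected mean and width of the observable are known, and for an observable truncated to a finite range these are the moments of the truncated distribution, not the nominal pdf parameters.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of the values; NaN for an empty sample.
double sampleMean(const std::vector<double>& values) {
    if (values.empty()) return std::nan("");
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}

// Flag a pseudo experiment whose sample mean deviates from the expected mean
// by more than nSigma standard errors, where the standard error of the mean
// is expectedSigma / sqrt(N). An empty sample is always flagged.
bool isMeanOutlier(const std::vector<double>& values,
                   double expectedMean,
                   double expectedSigma,
                   double nSigma = 5.0) {
    if (values.empty()) return true;
    const double stdErr =
        expectedSigma / std::sqrt(static_cast<double>(values.size()));
    return std::abs(sampleMean(values) - expectedMean) > nSigma * stdErr;
}
```

Because the standard error shrinks with the number of events, the same threshold in units of standard errors can be applied to data sets of different size, which matters when the yield fluctuates in extended generation.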

How to reproduce

  • Either run the macro checkRooAbsPdfGenerate.C on a batch system with ~10^5 pseudo experiments
  • Or run root -q "checkRooAbsPdfGenerate.C(1,kTRUE,\"test.root\", 109238706, kFALSE)", which is the command I used to produce the pie plot above
  • If you want to look at a strange data set, you can find the data set to generate the pie plot above here: test.root (443.1 KB)

ROOT version / platform

  • Tested with ROOT 6.20.06 (gcc-4.9) and 6.20.04 (gcc-4.9) on Debian 8 (jessie)
  • Ubuntu 18.04 ROOT 6.20.04 (gcc-7.5)

I guess @moneta and/or @StephanH can help you.

Hmm, this is tricky. I could imagine that TFoam, which is supposed to sample the function, gets thrown off somehow.
@moneta, have you seen this before?

Hi,
Yes, I think I have seen that before, and if I remember correctly it was a problem with the number of cells (nCell3D) or the number of samples (nSample) of the Foam generator.
Have you tried increasing the number of cells (nCell3D) or the number of samples (nSample) for the Foam generator?

See the thread "A bug ?, extract a 2-D histogram based on Roofit"

Lorenzo

Hi Lorenzo,

Thanks a lot for your comment. I will try nSample=1e4 as you suggested in the post cited. After that I will try nCell3D=2e4, or would you suggest another value?

When I have the results I will report here again.

Cheers
Tim

Hi,

I think those values should be fine. If you still see the effect, you can try increasing them further. The drawback of increasing them too much is that the generation becomes slower.
If you still obtain a wrong result, we will investigate further. An alternative could be to try the UNURAN package for sampling the distribution.

Cheers

Lorenzo

Hi Lorenzo,

Using nSample=1e4 solved the problem. Thanks a lot for that suggestion. For the sake of completeness, I attach the updated macro checkRooAbsPdfGenerate.C (4.5 KB)

But let me ask a few questions:

  • How do I know that I have to increase nSample or nCell*D? Is there a rule of thumb?
  • Will the failure always be as drastic as the one I demonstrated, or can slight shape distortions also occur?

I would suggest adding a note similar to the following to the RooAbsPdf::generate documentation.

\note Depending on your model, it may be necessary to change the generator settings. For the default generator, which is RooFoamGenerator, the number of samples or cells can be increased with, e.g.,
   RooAbsPdf::defaultGeneratorConfig()->getConfigSection("RooFoamGenerator").setRealValue("nSample",1e4);

At the very least, it would be good to point users to the possibility of changing the settings of the underlying generator. That would reduce the risk of treating the generate member as a magic black box that does nice things for me :wink: .

Cheers Tim

Coming soon:

:slight_smile:

Perfect ! Thanks a lot :grinning:

I thought this underlying generator was not documented anywhere. But as you pointed out in the updated text, there is the tutorial: https://root.cern.ch/doc/master/rf902__numgenconfig_8C.html