RooHistPdf Generate method not working as expected

This is most likely related to my post from yesterday (Help Generating Dataset from Nested RooSimultaneous) but is a completely separate question.

In the following sample code I create a RooHistPdf and attempt to use the generate(const RooArgSet &whatVars, Int_t nEvents) method inherited from RooAbsPdf to create a RooDataSet*.

When I do this, the resulting data set is always just a list of the distinct values used to fill the initial RooDataHist from which the RooHistPdf was built (not the events in that set, but one entry per unique value), rather than the expected data set of nEvents events sampled from the pdf.

The following code

int main(){
  // variables
  RooRealVar x("x", "x", -1, 1);

  RooArgSet myVars(x);
  // Create RooHistPdf
  RooDataSet createDS("createDS", "createDS", myVars);
  for(int i = 0; i < 3; i++){createDS.add(myVars);}
  for(int i = 0; i < 4; i++){createDS.add(myVars);}

  RooDataHist createDH("createDH", "createDH", myVars, createDS);

  RooHistPdf theHistPdf("theHistPdf", "theHistPdf", myVars, createDH);

  // generate events

  RooDataSet* genDS = theHistPdf.generate(myVars, NumEvents(10), Verbose(1));

  // look at the generated dataset
  cout << "nEntries: " << genDS->numEntries() << endl;

  for(int i = 0; i < genDS->numEntries(); i++){
    const RooArgSet* tmpSet = genDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x") << endl;
  }

  return 0;
}

yields the result

nEntries: 2
generated x: -0.5
generated x: 0.5

Any idea what’s going on or how to actually generate a data set from a RooHistPdf?



By contrast, when I follow exactly the same procedure with a RooGaussian

// try with a gaussian just to see if different
  RooConstVar gausMean("gausMean", "gausMean", 5);
  RooConstVar gausStd("gausMean", "gausMean", 1);
  RooGaussian theGausPdf("theGausPdf", "theGausPdf", x, gausMean, gausStd);

  RooDataSet* gausDS = theGausPdf.generate(myVars, NumEvents(10), Verbose(1));

  // look at the generated dataset
  cout << "nEntries: " << gausDS->numEntries() << endl;

  for(int i = 0; i < gausDS->numEntries(); i++){
    const RooArgSet* tmpSet = gausDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x") << endl;
  }

I get exactly the output I expect:

 --- RooGenContext ---
Using PDF RooGaussian::theGausPdf[ x=x mean=gausMean sigma=gausMean ]
Use PDF generator for (x)
Use MC sampling generator <none> for ()
nEntries: 10
generated x: -0.801468
generated x: -0.901076
generated x: -0.952138
generated x: -0.461695
generated x: 0.569974
generated x: 0.526766
generated x: -0.118208
generated x: 0.458434
generated x: -0.197818
generated x: 0.850341

For anyone interested, I achieved the expected behavior by adding the RooCmdArg AutoBinned(0) to my generate command.

That is, I replaced

RooDataSet* genDS = theHistPdf.generate(myVars, NumEvents(10), Verbose(1));

with

RooDataSet* genDS = theHistPdf.generate(myVars, AutoBinned(0), NumEvents(10), Verbose(1));

I am still unsure why the previous call gave me the results it did. Perhaps it returned a RooDataHist, and the bin centers were located in the same spot in memory where the point values would have been for a RooHistPdf? Regardless, turning AutoBinned off gave me what I wanted.

Hi @SAlsum,

it looks like the RooHistPdf is taking a shortcut here. If you check the documentation of RooAbsPdf::generate, you will see: “Datasets that are generated in binned mode are returned as weighted unbinned datasets”. So instead of generating, say, 10 events when you ask for 10, it generates one event per bin (note that you have only two bins!) but sets the weights such that the total sum of weights is equal to 10 (+/- Poisson fluctuations). Statistically speaking, you get the same result (two bins, sum of weights = 10), but it’s faster.

With the Gaussian, I assume you didn’t ask for a binned dataset, and the Gaussian distribution is continuous. That’s why you get 10 different, unweighted values of x.
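
To make the difference between the two modes concrete, here is a small standalone sketch in plain C++ (no ROOT needed). It is not RooFit's actual implementation, only an illustration of the behaviour described above. The names Entry, binCenter, generateBinned and generateUnbinned are made up for this example, and the Poisson weighting simply follows the description in this reply.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// One generated entry: a value of x and its weight.
struct Entry {
    double x;
    double weight;
};

// Hypothetical helper: centre of bin i for nBins equal-width bins on [lo, hi].
double binCenter(std::size_t i, std::size_t nBins, double lo, double hi) {
    const double width = (hi - lo) / static_cast<double>(nBins);
    return lo + (static_cast<double>(i) + 0.5) * width;
}

// "Binned" mode (assumption: mirrors the description above, not RooFit code):
// one entry per non-empty bin, placed at the bin centre, with a
// Poisson-fluctuated weight of mean nEvents * (bin content / total content).
std::vector<Entry> generateBinned(const std::vector<double>& contents,
                                  double lo, double hi, int nEvents,
                                  std::mt19937& rng) {
    double total = 0.0;
    for (double c : contents) total += c;

    std::vector<Entry> out;
    for (std::size_t i = 0; i < contents.size(); ++i) {
        if (contents[i] <= 0.0) continue;  // empty bins produce no entry
        std::poisson_distribution<int> pois(nEvents * contents[i] / total);
        out.push_back({binCenter(i, contents.size(), lo, hi),
                       static_cast<double>(pois(rng))});
    }
    return out;
}

// "Unbinned" mode: nEvents entries of weight 1, each drawn by picking a bin
// in proportion to its content and then a uniform position inside that bin.
std::vector<Entry> generateUnbinned(const std::vector<double>& contents,
                                    double lo, double hi, int nEvents,
                                    std::mt19937& rng) {
    std::discrete_distribution<int> pickBin(contents.begin(), contents.end());
    const double width = (hi - lo) / static_cast<double>(contents.size());
    std::uniform_real_distribution<double> inBin(0.0, width);

    std::vector<Entry> out;
    for (int n = 0; n < nEvents; ++n) {
        const int i = pickBin(rng);
        out.push_back({lo + static_cast<double>(i) * width + inBin(rng), 1.0});
    }
    return out;
}
```

Assuming two bins on [-1, 1] with contents {3, 4} (the loop counts from the question), generateBinned always returns exactly two entries, at x = -0.5 and x = 0.5, with weights fluctuating around 10·3/7 and 10·4/7. generateUnbinned returns 10 entries of weight 1 spread across the range, which is the kind of output AutoBinned(0) gave.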
