Help Generating Dataset from Nested RooSimultaneous

Thank you for your reply.

Since posting I have indeed used both of the methods you mentioned to generate a data set. However, I would like to ask about the second method, because it is what led me to dig into this issue in the first place.

I have since modified my example to make all four component PDFs extended via

RooExtendPdf xNegYNegExt("xNegYNegExt", "xNegYNegExt", xNegYNegPdf,
                         RooConst(1));
w->import(xNegYNegExt);
RooExtendPdf xNegYPosExt("xNegYPosExt", "xNegYPosExt", xNegYPosPdf,
                         RooConst(2));
w->import(xNegYPosExt);
RooExtendPdf xPosYNegExt("xPosYNegExt", "xPosYNegExt", xPosYNegPdf,
                         RooConst(3));
w->import(xPosYNegExt);
RooExtendPdf xPosYPosExt("xPosYPosExt", "xPosYPosExt", xPosYPosPdf,
                         RooConst(4));
w->import(xPosYPosExt);

and

// create the two inner RooSimultaneous PDFs, each indexed by cat2
RooSimultaneous simulXNegPdf("simulXNegPdf", "simulXNegPdf",
                             *w->cat("cat2"));
simulXNegPdf.addPdf(*w->pdf("xNegYNegExt"), "cat2Negative");
simulXNegPdf.addPdf(*w->pdf("xNegYPosExt"), "cat2Positive");
w->import(simulXNegPdf);

RooSimultaneous simulXPosPdf("simulXPosPdf", "simulXPosPdf",
                             *w->cat("cat2"));
simulXPosPdf.addPdf(*w->pdf("xPosYNegExt"), "cat2Negative");
simulXPosPdf.addPdf(*w->pdf("xPosYPosExt"), "cat2Positive");
w->import(simulXPosPdf);

// use the factory command for the outer level (indexed by cat1),
// because I can't add a RooAbsPdf to a RooAbsPdf directly
w->factory("SIMUL::simulPdf(cat1, cat1Negative=simulXNegPdf, cat1Positive=simulXPosPdf)");

so that all of the element PDFs are extendable.

I can now ask for events generated from the non-nested RooSimultaneous simulXNegPdf in two different ways:

  1. Generated events do not contain the category that indexes the RooSimultaneous.
w->cat("cat1")->setLabel("cat1Negative");
w->cat("cat2")->setLabel("cat2Negative");
RooDataSet* cat1NegDS =
    w->pdf("simulXNegPdf")->generate(*w->set("vars"), Extended());

This method appears to return events generated solely from the component PDF selected by the current state of the index category (cat2Negative), that is, from xNegYNegExt.
Output from the above:

generated x: -0.858381 generated y: -0.26699
generated x: -0.729474 generated y: -0.784387
generated x: -0.915259 generated y: -0.689318
generated x: -0.0609316 generated y: -0.0226334
generated x: -0.351653 generated y: -0.496254
generated x: -0.551279 generated y: -0.325996
generated x: -0.207434 generated y: -0.617259
generated x: -0.928306 generated y: -0.787035
generated x: -0.661467 generated y: -0.548748
generated x: -0.0806384 generated y: -0.0753311

  2. Generated events contain the category that indexes the RooSimultaneous.
RooDataSet* cat1NegDS2 =
    w->pdf("simulXNegPdf")->generate(*w->set("VarsAndCats"), NumEvents(10));

This method appears to return events generated from all of the component PDFs, with the share coming from each component (presumably) proportional to its expected number of events from the extended term.
Output from this method:

generated x: -0.688646 generated y: -0.81131 generated cat2: cat2Negative
generated x: -0.395074 generated y: -0.244434 generated cat2: cat2Negative
generated x: -0.365669 generated y: -0.038206 generated cat2: cat2Negative
generated x: -0.358871 generated y: -0.958033 generated cat2: cat2Negative
generated x: -0.275645 generated y: -0.450404 generated cat2: cat2Negative
generated x: -0.850337 generated y: -0.59978 generated cat2: cat2Negative
generated x: -0.939029 generated y: -0.209658 generated cat2: cat2Negative
generated x: -0.474427 generated y: -0.91237 generated cat2: cat2Negative
generated x: -0.0425157 generated y: 0.779136 generated cat2: cat2Positive
generated x: -0.840567 generated y: 0.570187 generated cat2: cat2Positive
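To make that reading of method 2 concrete, here is a toy stand-in in plain C++. It is not RooFit code and not a claim about how RooFit implements generation: for a fixed number of events, each event's cat2 state is drawn with probability proportional to the component's expected yield (1 for xNegYNegExt, 2 for xNegYPosExt), and (x, y) is then drawn from that component. The function name `generateXNeg` and the assumption that each component is flat on its quadrant of [-1, 1]^2 are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// One generated event: observables plus the cat2 state it was drawn from.
struct Event {
  double x;
  double y;
  std::string cat2;
};

// Toy model (NOT RooFit) of the apparent method-2 behaviour for simulXNegPdf.
// Assumption: each component PDF is flat on its quadrant of [-1, 1]^2.
std::vector<Event> generateXNeg(int nEvents, unsigned seed) {
  assert(nEvents >= 0);
  std::mt19937 rng(seed);
  // Weights = expected yields of xNegYNegExt (1) and xNegYPosExt (2).
  std::discrete_distribution<int> pickCat2{1.0, 2.0};
  std::uniform_real_distribution<double> negative(-1.0, 0.0);  // [-1, 0)
  std::uniform_real_distribution<double> positive(0.0, 1.0);   // [0, 1)

  std::vector<Event> events;
  events.reserve(nEvents);
  for (int i = 0; i < nEvents; ++i) {
    const bool yPositive = pickCat2(rng) == 1;  // index 1 -> cat2Positive
    const double x = negative(rng);  // both components live at x < 0
    const double y = yPositive ? positive(rng) : negative(rng);
    events.push_back({x, y, yPositive ? "cat2Positive" : "cat2Negative"});
  }
  return events;
}
```

Generating a large sample with the real PDF and comparing the fraction of cat2Positive events with the 2/3 that this toy predicts would be one way to test the "presumably proportional" reading.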

The nested RooSimultaneous does not behave the same way.
Using method 1 from above:

RooDataSet* varsDS =
    w->pdf("simulPdf")->generate(*w->set("vars"), AutoBinned(0), NumEvents(10));

I appear to get events generated from all of the component PDFs instead of just the one currently selected. This is odd, because the non-nested RooSimultaneous generated only from the selected component.
The output:

event: 0
generated x: 0.569484 generated cat1:
generated y: -0.625338 generated cat2:
event: 1
generated x: -0.512809 generated cat1:
generated y: -0.571389 generated cat2:
event: 2
generated x: 0.392053 generated cat1:
generated y: 0.87093 generated cat2:
event: 3
generated x: -0.406055 generated cat1:
generated y: 0.963375 generated cat2:
event: 4
generated x: -0.476313 generated cat1:
generated y: -0.944435 generated cat2:
event: 5
generated x: -0.8927 generated cat1:
generated y: -0.503686 generated cat2:
event: 6
generated x: -0.747976 generated cat1:
generated y: -0.42416 generated cat2:
event: 7
generated x: 0.131791 generated cat1:
generated y: -0.444115 generated cat2:
event: 8
generated x: 0.413081 generated cat1:
generated y: 0.562298 generated cat2:
event: 9
generated x: 0.299256 generated cat1:
generated y: 0.709301 generated cat2:

(Recall that each component PDF is non-zero in only one quadrant.)
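A quick way to back up the claim that these events come from all four components is to sort the printed points by quadrant. The small helper below (names hypothetical, plain C++ rather than RooFit) does this for the ten (x, y) pairs listed above, using the model's convention that x < 0 belongs to cat1Negative and y < 0 to cat2Negative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Name of the component whose quadrant contains (x, y), following the model:
// x < 0 -> "xNeg..." (cat1Negative), y < 0 -> "...YNeg" (cat2Negative).
std::string quadrantOf(double x, double y) {
  const std::string xPart = x < 0.0 ? "xNeg" : "xPos";
  const std::string yPart = y < 0.0 ? "YNeg" : "YPos";
  return xPart + yPart + "Ext";
}

// Count how many points fall into each component's quadrant.
std::map<std::string, int> countByQuadrant(
    const std::vector<std::pair<double, double>>& points) {
  std::map<std::string, int> counts;
  for (const auto& p : points) ++counts[quadrantOf(p.first, p.second)];
  return counts;
}

// The ten (x, y) pairs printed above for simulPdf generated with method 1.
const std::vector<std::pair<double, double>> kMethod1Nested = {
    {0.569484, -0.625338}, {-0.512809, -0.571389}, {0.392053, 0.87093},
    {-0.406055, 0.963375}, {-0.476313, -0.944435}, {-0.8927, -0.503686},
    {-0.747976, -0.42416}, {0.131791, -0.444115},  {0.413081, 0.562298},
    {0.299256, 0.709301}};
```

For the printed sample this gives 4, 1, 2 and 3 events in the xNegYNeg, xNegYPos, xPosYNeg and xPosYPos quadrants respectively, so every component contributes.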

Likewise, if I use method 2 from above:

w->cat("cat1")->setLabel("cat1Negative");
w->cat("cat2")->setLabel("cat2Positive");
RooDataSet* VarsAndCatsDS =
    w->pdf("simulPdf")->generate(*w->set("VarsAndCats"), AutoBinned(0),
                                 NumEvents(10));

Again I get events from all of the component PDFs, but they are not tagged with the categories they actually came from. Instead, every event simply carries the category labels that were set in the argset before generation.
Printing the events with

for (int i = 0; i < VarsAndCatsDS->numEntries(); i++) {
  const RooArgSet* tmpSet = VarsAndCatsDS->get(i);

  cout << "event: " << i << endl
       << "generated x: " << tmpSet->getRealValue("x") << " "
       << "generated cat1: " << tmpSet->getCatLabel("cat1") << endl
       << "generated y: " << tmpSet->getRealValue("y") << " "
       << "generated cat2: " << tmpSet->getCatLabel("cat2") << endl;
}

gives:
event: 0
generated x: 0.137008 generated cat1: cat1Negative
generated y: 0.121574 generated cat2: cat2Positive
event: 1
generated x: -0.828715 generated cat1: cat1Negative
generated y: -0.0830955 generated cat2: cat2Positive
event: 2
generated x: 0.678632 generated cat1: cat1Negative
generated y: -0.0751703 generated cat2: cat2Positive
event: 3
generated x: 0.660149 generated cat1: cat1Negative
generated y: 0.568897 generated cat2: cat2Positive
event: 4
generated x: -0.40702 generated cat1: cat1Negative
generated y: 0.571721 generated cat2: cat2Positive
event: 5
generated x: -0.181896 generated cat1: cat1Negative
generated y: -0.240481 generated cat2: cat2Positive
event: 6
generated x: -0.659247 generated cat1: cat1Negative
generated y: -0.0540779 generated cat2: cat2Positive
event: 7
generated x: -0.404749 generated cat1: cat1Negative
generated y: 0.554353 generated cat2: cat2Positive
event: 8
generated x: -0.550924 generated cat1: cat1Negative
generated y: 0.350916 generated cat2: cat2Positive
event: 9
generated x: -0.132489 generated cat1: cat1Negative
generated y: -0.525622 generated cat2: cat2Positive
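The same kind of check makes the tagging problem explicit: compare each event's (cat1, cat2) labels with the quadrant that actually contains its (x, y). In this hypothetical helper (plain C++, not RooFit) the coordinates are copied from the output above, and every event is given the labels it reports there, cat1Negative and cat2Positive.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct TaggedEvent {
  double x, y;
  std::string cat1, cat2;
};

// True if the (cat1, cat2) tag names the quadrant that really contains (x, y),
// with x < 0 <-> cat1Negative and y < 0 <-> cat2Negative as in the model.
bool tagMatchesQuadrant(const TaggedEvent& e) {
  const std::string c1 = e.x < 0.0 ? "cat1Negative" : "cat1Positive";
  const std::string c2 = e.y < 0.0 ? "cat2Negative" : "cat2Positive";
  return e.cat1 == c1 && e.cat2 == c2;
}

int countMatches(const std::vector<TaggedEvent>& events) {
  int n = 0;
  for (const auto& e : events) n += tagMatchesQuadrant(e) ? 1 : 0;
  return n;
}

// The ten events printed above, all reporting cat1Negative / cat2Positive.
std::vector<TaggedEvent> method2NestedOutput() {
  const double xy[10][2] = {
      {0.137008, 0.121574},   {-0.828715, -0.0830955}, {0.678632, -0.0751703},
      {0.660149, 0.568897},   {-0.40702, 0.571721},    {-0.181896, -0.240481},
      {-0.659247, -0.0540779}, {-0.404749, 0.554353},  {-0.550924, 0.350916},
      {-0.132489, -0.525622}};
  std::vector<TaggedEvent> out;
  for (const auto& p : xy)
    out.push_back({p[0], p[1], "cat1Negative", "cat2Positive"});
  return out;
}
```

Only three of the ten events (events 4, 7 and 8) lie in the x < 0, y > 0 quadrant that their labels claim.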

This is not altogether surprising: after all, this RooSimultaneous is not really indexed by the two categories cat1 and cat2, but by the super category simulPdf_index. However, I cannot generate a data set containing this super category. If I try this

w->defineSet("varsAndSuperCat", "x,y,simulPdf_index");
RooDataSet* VarsAndCatsDS =
    w->pdf("simulPdf")->generate(*w->set("varsAndSuperCat"), AutoBinned(0),
                                 NumEvents(10));

or this

RooArgSet genSet(x, y, *(RooAbsCategory*)w->obj("simulPdf_index"));
RooDataSet* VarsAndCatsDS =
    w->pdf("simulPdf")->generate(genSet, AutoBinned(0),
                                 NumEvents(10));

I get the error

[#0] ERROR:Generation -- RooGenContext::ctor(): cannot generate values for derived "simulPdf_index"

I would appreciate some insight into why method 1 gives different results in the nested and non-nested cases, and into how one might achieve the analogue of method 2 for the nested case (generating events in proportion to the extended terms while also recording which category each event came from).
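One possible workaround for the second question, sketched under assumptions rather than tested against RooFit: since method 2 behaves as expected for a non-nested RooSimultaneous, the two levels could be flattened into a single four-state category (one state per leaf component), and each generated state decoded back into its cat1 and cat2 labels afterwards. The toy below models only that bookkeeping in plain C++. The state table, `FlatState`, `Tagged` and `generateTags` are hypothetical names, the 1:2:3:4 weights mirror the RooConst yields above, and nothing here is RooFit API.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical flat index: one state per leaf component, carrying the
// (cat1, cat2) labels it stands for and its expected yield.
struct FlatState {
  std::string cat1;
  std::string cat2;
  double expectedYield;
};

const std::vector<FlatState> kStates = {
    {"cat1Negative", "cat2Negative", 1.0},  // xNegYNegExt
    {"cat1Negative", "cat2Positive", 2.0},  // xNegYPosExt
    {"cat1Positive", "cat2Negative", 3.0},  // xPosYNegExt
    {"cat1Positive", "cat2Positive", 4.0},  // xPosYPosExt
};

struct Tagged {
  std::string cat1, cat2;
};

// Draw nEvents flat-index states with probability proportional to expected
// yield, then decode each one into the (cat1, cat2) pair it represents.
std::vector<Tagged> generateTags(int nEvents, unsigned seed) {
  assert(nEvents >= 0);
  std::vector<double> weights;
  for (const auto& s : kStates) weights.push_back(s.expectedYield);
  std::discrete_distribution<int> pick(weights.begin(), weights.end());
  std::mt19937 rng(seed);

  std::vector<Tagged> out;
  out.reserve(nEvents);
  for (int i = 0; i < nEvents; ++i) {
    const FlatState& s = kStates[static_cast<std::size_t>(pick(rng))];
    out.push_back({s.cat1, s.cat2});
  }
  return out;
}
```

The design choice is simply that a single fundamental index can be generated directly, whereas a derived one (like simulPdf_index in the error above) apparently cannot; the two original labels are then recovered from a lookup table rather than generated.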

I already presented a workable solution in my previous reply, so this is pretty much academic now, but understanding better how this works can only be a good thing.

Thanks,

Shaun