Help Generating Dataset from Nested RooSimultaneous

I have a framework which creates a Pdf that is a RooSimultaneous that contains additional RooSimultaneous Pdfs.

While checking that things work correctly, I wanted to generate a dataset from this Pdf. However, when I use the generate command (myPdf->generate(myArgSet, Extended())), the generated dataset seems to correctly use the pdf selected by the top-level RooSimultaneous, but it samples from all pdfs in the inner RooSimultaneous instead of only the one corresponding to the current category label.

I have created a simple example of a nested RooSimultaneous structure (code included below) to reproduce the problem I’m having. This example has 2 categories with 2 labels each, and each label pair has a pdf which is a uniform distribution over a single quadrant in the x-y plane, to make it easy to see what is being sampled.
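The following plain C++ toy model of the four leaf pdfs can serve as a reference when reading the generated points. It uses only the standard library, not RooFit, and `Quadrant`, `quadrantFor` and `sampleQuadrant` are hypothetical names for illustration. It assumes, as described above, that cat1 selects the sign of x, cat2 selects the sign of y, and each leaf pdf is flat over its quadrant of [-1, 1] x [-1, 1]. This holds for a RooHistPdf with a single filled bin on a 2 x 2 grid and zero interpolation order.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <utility>

// Toy model of the example: each (cat1, cat2) label pair owns one quadrant.
struct Quadrant {
  double xMin, xMax, yMin, yMax;
};

// cat1 selects the sign of x, cat2 selects the sign of y.
Quadrant quadrantFor(const std::string& cat1, const std::string& cat2) {
  const bool xPos = (cat1 == "cat1Positive");
  const bool yPos = (cat2 == "cat2Positive");
  return {xPos ? 0.0 : -1.0, xPos ? 1.0 : 0.0,
          yPos ? 0.0 : -1.0, yPos ? 1.0 : 0.0};
}

// Draw one (x, y) point uniformly inside the quadrant of the given labels.
std::pair<double, double> sampleQuadrant(const std::string& cat1,
                                         const std::string& cat2,
                                         std::mt19937& rng) {
  const Quadrant q = quadrantFor(cat1, cat2);
  std::uniform_real_distribution<double> ux(q.xMin, q.xMax);
  std::uniform_real_distribution<double> uy(q.yMin, q.yMax);
  return {ux(rng), uy(rng)};
}
```

Any generated point whose signs do not match its (cat1, cat2) labels therefore came from a different leaf pdf than the selected one.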

In this example I run into a different error. I use the command myRooSimultaneous->generate(myArgSet, nEvents) to generate a dataset. This works fine for a single-layered RooSimultaneous and creates nEvents events as expected, but when I do the same on a nested RooSimultaneous, I get the error:

[#0] ERROR:Generation -- RooSimGenContext::ctor(simulPdf) ERROR: Need either extended mode or prototype data to calculate number of events per category
[#0] ERROR:Generation -- RooAbsPdf::generate(simulPdf) cannot create a valid context

I had naively viewed a RooSimultaneous as a container that points to the proper contained pdf, so that running the generate command on it would simply call the same command on the relevant pdf. Based on the above, this doesn’t seem to be correct, which may also explain my initial problem.

I had also believed that a nested RooSimultaneous was internally re-expressed as a single-layered container, indexed by a new category built from the combination of the categories used for nesting. But the above error has also called this into question.

My question, then, is: how do nested RooSimultaneous pdfs work, and how can I generate a dataset from one that either properly isolates just one of the contained pdfs (past all of the RooSimultaneous layers) or properly maintains the ratio of expected events from each sub-pdf?

Here is the example code that works as expected for a single RooSimultaneous but gives me problems when using a nested RooSimultaneous:

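#include <iostream>

#include "RooCategory.h"
#include "RooDataHist.h"
#include "RooDataSet.h"
#include "RooGlobalFunc.h"
#include "RooHistPdf.h"
#include "RooRealVar.h"
#include "RooSimultaneous.h"
#include "RooWorkspace.h"

using namespace RooFit;
using std::cout;
using std::endl;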



int main(){

  // workspace
  RooWorkspace* w = new RooWorkspace("Workspace");

  cout << "workspace created" << endl;
  
  // variables
  RooRealVar x("x", "x", -1, 1);
  x.setBins(2);
  w->import(x);

  RooRealVar y("y", "y", -1, 1);
  y.setBins(2);
  w->import(y);

  w->defineSet("vars","x,y");
  
  // categories
  RooCategory cat1("cat1", "cat1");
  cat1.defineType("cat1Negative", 0);
  cat1.defineType("cat1Positive", 1);
  w->import(cat1);
  
  RooCategory cat2("cat2", "cat2");
  cat2.defineType("cat2Negative", 0);
  cat2.defineType("cat2Positive", 1);
  w->import(cat2);
  
  w->defineSet("VarsAndCats","x,y,cat1,cat2");

  cout << "variables set" << endl;


  //-------------- Create Pdfs ------------------
  // create negative x negative y pdf
  RooDataSet xNegYNegDS("xNegYNegDS", "xNegYNegDS", *w->set("vars"));
  w->var("x")->setVal(-0.5);
  w->var("y")->setVal(-0.5);
  for(int i = 0; i < 1; i++){xNegYNegDS.add(*w->set("vars"));}
  RooDataHist xNegYNegDH("xNegYNegDH", "xNegYNegDH", *w->set("vars"),
                         xNegYNegDS);
  w->import(xNegYNegDH);
  RooHistPdf xNegYNegPdf("xNegYNegPdf", "xNegYNegPdf", *w->set("vars"),
                         xNegYNegDH);
  w->import(xNegYNegPdf);

  // create negative x positive y pdf
  RooDataSet xNegYPosDS("xNegYPosDS", "xNegYPosDS", *w->set("vars"));
  w->var("x")->setVal(-0.5);
  w->var("y")->setVal(0.5);
  for(int i = 0; i < 2; i++){xNegYPosDS.add(*w->set("vars"));}
  RooDataHist xNegYPosDH("xNegYPosDH", "xNegYPosDH", *w->set("vars"),
                         xNegYPosDS);
  w->import(xNegYPosDH);
  RooHistPdf xNegYPosPdf("xNegYPosPdf", "xNegYPosPdf", *w->set("vars"),
                         xNegYPosDH);
  w->import(xNegYPosPdf);

  // create positive x negative y pdf
  RooDataSet xPosYNegDS("xPosYNegDS", "xPosYNegDS", *w->set("vars"));
  w->var("x")->setVal(0.5);
  w->var("y")->setVal(-0.5);
  for(int i = 0; i < 3; i++){xPosYNegDS.add(*w->set("vars"));}
  RooDataHist xPosYNegDH("xPosYNegDH", "xPosYNegDH", *w->set("vars"),
                         xPosYNegDS);
  w->import(xPosYNegDH);
  RooHistPdf xPosYNegPdf("xPosYNegPdf", "xPosYNegPdf", *w->set("vars"),
                         xPosYNegDH);
  w->import(xPosYNegPdf);

  // create positive x positive y pdf
  RooDataSet xPosYPosDS("xPosYPosDS", "xPosYPosDS", *w->set("vars"));
  w->var("x")->setVal(0.5);
  w->var("y")->setVal(0.5);
  for(int i = 0; i < 4; i++){xPosYPosDS.add(*w->set("vars"));}
  RooDataHist xPosYPosDH("xPosYPosDH", "xPosYPosDH", *w->set("vars"),
                         xPosYPosDS);
  w->import(xPosYPosDH);
  RooHistPdf xPosYPosPdf("xPosYPosPdf", "xPosYPosPdf", *w->set("vars"),
                         xPosYPosDH);
  w->import(xPosYPosPdf);

  cout << "sub pdfs created" << endl;



  //--------------- Combine Pdfs into RooSimultaneous ----------------
  // create RooSimultaneous
  RooSimultaneous simulXNegPdf("simulXNegPdf", "simulXNegPdf",
                               *w->cat("cat2"));
  simulXNegPdf.addPdf(*w->pdf("xNegYNegPdf"), "cat2Negative");
  simulXNegPdf.addPdf(*w->pdf("xNegYPosPdf"), "cat2Positive");
  w->import(simulXNegPdf);

  RooSimultaneous simulXPosPdf("simulXPosPdf", "simulXPosPdf",
                               *w->cat("cat2"));
  simulXPosPdf.addPdf(*w->pdf("xPosYNegPdf"), "cat2Negative");
  simulXPosPdf.addPdf(*w->pdf("xPosYPosPdf"), "cat2Positive");
  w->import(simulXPosPdf);

  //use factory cmd for the second because can't add roopdf to roopdf
  w->factory("SIMUL::simulPdf(cat1, cat1Negative=simulXNegPdf, cat1Positive=simulXPosPdf)");
  
  // check that the evaluation works
  w->cat("cat1")->setLabel("cat1Negative");
  w->cat("cat2")->setLabel("cat2Negative");
  for(int i = 0; i < 10; i++){
    w->var("x")->setVal(.2*i - 1);
    w->var("y")->setVal((.15*i -1));
    cout << "cat1 negative, cat2 negative: x=" << w->var("x")->getVal()
         << " y=" << w->var("y")->getVal()
         << ":  "
         << w->pdf("simulPdf")->getVal() << endl;
  }


  //------------------ Generate Events --------------------------

  // single simul first
  w->cat("cat1")->setLabel("cat1Negative");
  w->cat("cat2")->setLabel("cat2Negative");
  RooDataSet* cat1NegDS =
    w->pdf("simulXNegPdf")->generate(*w->set("vars"), 5);

  // look at the events
  for(int i = 0; i < cat1NegDS->numEntries(); i++){
    const RooArgSet* tmpSet = cat1NegDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x") << " " 
          << "generated y: " << tmpSet->getRealValue("y") << endl;
  }

  // try positive y
  w->cat("cat2")->setLabel("cat2Positive");
  RooDataSet* cat1PosDS =
    w->pdf("simulXNegPdf")->generate(*w->set("vars"), 5);

  // look at the events
  for(int i = 0; i < cat1PosDS->numEntries(); i++){
    const RooArgSet* tmpSet = cat1PosDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x") << " " 
          << "generated y: " << tmpSet->getRealValue("y") << endl;
  }

  
  // try nested simul
  w->cat("cat1")->setLabel("cat1Negative");
  w->cat("cat2")->setLabel("cat2Negative");
  RooDataSet* genNegDS = w->pdf("simulPdf")->generate(*w->set("vars"), 10);

  // look at events
  for(int i = 0; i < genNegDS->numEntries(); i++){
    const RooArgSet* tmpSet = genNegDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x")
         << endl;
    cout << "generated y: " << tmpSet->getRealValue("y")
         << endl;
  }
  
}

I appreciate any input.

Thanks,

Shaun

In an effort to bypass the generate call on the top-layer RooSimultaneous, I tried to use the RooSuperCategory created by this RooSimultaneous to access the stored pdf directly and use it to generate the events. In the code from the post above, I replaced everything below the line

"// try nested simul"

with the following:

  // try nested simul
  w->cat("cat1")->setLabel("cat1Negative");
  w->cat("cat2")->setLabel("cat2Negative");


  cout << "top layer pdf RooSuperCategory named: "
       << ((RooSimultaneous*)w->pdf("simulPdf"))->indexCat().GetName()
       << endl;
  cout << "super category label being used: "
       << ((RooSuperCategory*)w->obj("simulPdf_index"))->getLabel() << endl;

  // generate the actual data set
  RooDataSet* genNegDS =
    ((RooSimultaneous*)w->pdf("simulPdf"))
    ->getPdf(((RooSuperCategory*)w->obj("simulPdf_index"))->getLabel())
    ->generate(*w->set("vars"), 10);

  
  cout << "data generated from pdf: "
       << ((RooSimultaneous*)w->pdf("simulPdf"))
    ->getPdf(((RooSuperCategory*)w->obj("simulPdf_index"))->getLabel())
    ->GetName() << endl;
  
  // look at events
  for(int i = 0; i < genNegDS->numEntries(); i++){
    const RooArgSet* tmpSet = genNegDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x") << "  "
         << "generated y: " << tmpSet->getRealValue("y")
         << endl;
  }

This yielded the following output:

cat1 negative, cat2 negative: x=-1 y=-1: 1
cat1 negative, cat2 negative: x=-0.8 y=-0.85: 1
cat1 negative, cat2 negative: x=-0.6 y=-0.7: 1
cat1 negative, cat2 negative: x=-0.4 y=-0.55: 1
cat1 negative, cat2 negative: x=-0.2 y=-0.4: 1
cat1 negative, cat2 negative: x=0 y=-0.25: 0
cat1 negative, cat2 negative: x=0.2 y=-0.1: 0
cat1 negative, cat2 negative: x=0.4 y=0.05: 0
cat1 negative, cat2 negative: x=0.6 y=0.2: 0
cat1 negative, cat2 negative: x=0.8 y=0.35: 0
generated x: -0.858381 generated y: -0.26699
generated x: -0.729474 generated y: -0.784387
generated x: -0.915259 generated y: -0.689318
generated x: -0.0609316 generated y: -0.0226334
generated x: -0.351653 generated y: -0.496254
generated x: -0.219934 generated y: 0.958981
generated x: -0.847732 generated y: 0.208973
generated x: -0.117388 generated y: 0.827671
generated x: -0.965437 generated y: 0.87946
generated x: -0.161526 generated y: 0.324706
above events generated from PDF: xNegYPosPdf
top layer pdf RooSuperCategory named: simulPdf_index
super category label being used: {cat1Negative;cat2Negative}
data generated from pdf: xNegYNegPdf
generated x: -0.5 generated y: -0.5
generated x: -0.5 generated y: 0.5
generated x: 0.5 generated y: -0.5
generated x: 0.5 generated y: 0.5

Everything works as expected until I look at the generated events from the pdf I accessed.
First, there are 4 events instead of the 10 requested.
Second, they appear to be the values stored in the 4 histograms that were used to create the RooHistPdfs earlier, not events generated from the single accessed RooHistPdf.

I am pretty confused as to what is going on here.

Later edit:

It turns out this problem (generating only 4 strange events rather than the number requested) was related to my RooHistPdfs, not to the RooSimultaneous. I needed to deactivate AutoBinned (see RooHistPdf Generate method not working as expected - #3 by SAlsum).

The following changes then worked to generate events from the stored pdfs:

  // get the name of the top layer RooSimultaneous (Super)Category
  std::string superCatName =
    ((RooSimultaneous*)w->pdf("simulPdf"))->indexCat().GetName();
  cout << "top layer pdf RooSuperCategory named: "
       << superCatName << endl;
  std::string thisSuperCatLabel;
  
  w->cat("cat1")->setLabel("cat1Negative");
  w->cat("cat2")->setLabel("cat2Positive");

  // generate the actual data set
  thisSuperCatLabel =
    ((RooAbsCategory*)w->obj(superCatName.c_str()))->getLabel();
  cout << "super cat label is: " << thisSuperCatLabel << endl;
  RooDataSet* genNegDS =
    ((RooSimultaneous*)w->pdf("simulPdf"))
    ->getPdf(thisSuperCatLabel.c_str())
    ->generate(*w->set("vars"), AutoBinned(0), NumEvents(5), Verbose(1));

  // look at events
  for(int i = 0; i < genNegDS->numEntries(); i++){
    const RooArgSet* tmpSet = genNegDS->get(i);

    cout << "generated x: " << tmpSet->getRealValue("x") << "  "
         << "generated y: " << tmpSet->getRealValue("y")
         << endl;
  }

Hi @SAlsum,

1. Weighted vs. Unweighted Events

indeed, that’s unrelated to the first problem. Just for completeness (if somebody else reads this thread):
If you ask for 10 events on a binned dataset, you will get one entry per bin (four in your case), but the sum of weights will be 10. More details can be found in the post you linked.
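Here is a toy version of this in plain C++ (not RooFit; `BinnedEntry`, `generateBinned` and `sumWeights` are hypothetical names). A binned sample on the 2 x 2 grid used here stores one entry per bin centre, each with a weight. It therefore always has 4 entries, while the weights sum to the requested number of events. For simplicity the sketch gives each bin its expected weight N * p; RooFit's binned generator may fluctuate these weights. If all probability sits in the (-0.5, -0.5) bin, as for xNegYNegPdf, the four entries are just the four bin centres. That is consistent with the four points printed earlier.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One entry of a binned dataset: bin centre plus weight.
struct BinnedEntry {
  double x, y, weight;
};

// Toy binned "generation" on a 2x2 grid over [-1,1]x[-1,1]:
// every bin becomes one entry whose weight is nEvents * (bin probability).
std::vector<BinnedEntry> generateBinned(const std::vector<double>& binProb,
                                        double nEvents) {
  assert(binProb.size() == 4);
  const double centres[2] = {-0.5, 0.5};
  std::vector<BinnedEntry> entries;
  for (int ix = 0; ix < 2; ++ix) {
    for (int iy = 0; iy < 2; ++iy) {
      const double p = binProb[2 * ix + iy];
      entries.push_back({centres[ix], centres[iy], nEvents * p});
    }
  }
  return entries;
}

// Sum of weights, the analogue of RooDataSet::sumEntries().
double sumWeights(const std::vector<BinnedEntry>& entries) {
  double s = 0.0;
  for (const auto& e : entries) s += e.weight;
  return s;
}
```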

2. Number of Events in Nested Categories
Could you just add a short answer on how you got around the following problem?

I can at least tell you why it doesn’t work when you just ask the top-level PDF for a specific number of events. The problem is that RooFit cannot know how many events to put in which category if the top-level PDF is not extended. Let’s say you want 5 events, and you have two categories. Should the split be {3,2} or {2.5,2.5}, or something completely different? In your example it’s even a bit more complicated because the categories are nested, so if you join A and B, which each have a category 1 and 2, there are actually four categories (A1 B1, A2 B1, A1 B2, A2 B2).

There are two options I can see to solve this:

  • The SuperCategory. It creates one category for each combination of sub-categories, so it joins them. With this, you can select each combination of categories in A and B directly. That’s why the generation succeeds, because you can say for each category how many events you want.
  • You need to extend the PDFs when making the categories. When you do this, (let’s say the original A1 has 5 events and the original A2 has 10), it’s clear that the ratio of A1 to A2 is 0.5. Then, if you ask the top-level PDF for 9 events from A, there should be 3 in A1 and 6 in A2.
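The arithmetic of the second option can be written as a plain C++ helper (`expectedSplit` is a hypothetical name, not a RooFit function). Each category gets the requested total times its share of the summed expected yields. The sketch only computes the expected split. It makes no claim about whether RooFit fixes the counts to these values or draws them randomly around them.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Expected number of events per category when nTotal events are requested
// and the categories have extended yields `yields`.
std::vector<double> expectedSplit(const std::vector<double>& yields,
                                  double nTotal) {
  const double sum = std::accumulate(yields.begin(), yields.end(), 0.0);
  assert(sum > 0.0);
  std::vector<double> out;
  out.reserve(yields.size());
  for (double y : yields) out.push_back(nTotal * y / sum);
  return out;
}
```

With yields {5, 10} and 9 requested events this returns {3, 6}, the A1/A2 split from the example above.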

Thank you for your reply.

Since posting I have indeed used both of the methods you mentioned to generate a data set. I would like to ask about the second method, however, because this is what led to me digging into this issue in the first place.

I have since modified my example to extend the 4 pdfs via

  RooExtendPdf xNegYNegExt("xNegYNegExt", "xNegYNegExt", xNegYNegPdf,
                           RooConst(1));
  w->import(xNegYNegExt);
  RooExtendPdf xNegYPosExt("xNegYPosExt", "xNegYPosExt", xNegYPosPdf,
                           RooConst(2));
  w->import(xNegYPosExt);
  RooExtendPdf xPosYNegExt("xPosYNegExt", "xPosYNegExt", xPosYNegPdf,
                           RooConst(3));
  w->import(xPosYNegExt);
  RooExtendPdf xPosYPosExt("xPosYPosExt", "xPosYPosExt", xPosYPosPdf,
                           RooConst(4));
  w->import(xPosYPosExt);

and

  // create RooSimultaneous
  RooSimultaneous simulXNegPdf("simulXNegPdf", "simulXNegPdf",
                               *w->cat("cat2"));
  simulXNegPdf.addPdf(*w->pdf("xNegYNegExt"), "cat2Negative");
  simulXNegPdf.addPdf(*w->pdf("xNegYPosExt"), "cat2Positive");
  w->import(simulXNegPdf);

  RooSimultaneous simulXPosPdf("simulXPosPdf", "simulXPosPdf",
                               *w->cat("cat2"));
  simulXPosPdf.addPdf(*w->pdf("xPosYNegExt"), "cat2Negative");
  simulXPosPdf.addPdf(*w->pdf("xPosYPosExt"), "cat2Positive");
  w->import(simulXPosPdf);

  //use factory cmd for the second because can't add roopdf to roopdf
  w->factory("SIMUL::simulPdf(cat1, cat1Negative=simulXNegPdf, cat1Positive=simulXPosPdf)");

so that all of the element PDFs are extendable.

I can now ask for events generated from the non-nested RooSimultaneous simulXNegPdf in two different ways:

  1. Generated events do not contain the category that indexes the RooSimultaneous.
w->cat("cat1")->setLabel("cat1Negative");
w->cat("cat2")->setLabel("cat2Negative");
RooDataSet* cat1NegDS =
    w->pdf("simulXNegPdf")->generate(*w->set("vars"), Extended());

This method appears to return data points generated solely from the PDF currently pointed to by the RooSimultaneous. That is, from xNegYNegExt.
Output from the above:

generated x: -0.858381 generated y: -0.26699
generated x: -0.729474 generated y: -0.784387
generated x: -0.915259 generated y: -0.689318
generated x: -0.0609316 generated y: -0.0226334
generated x: -0.351653 generated y: -0.496254
generated x: -0.551279 generated y: -0.325996
generated x: -0.207434 generated y: -0.617259
generated x: -0.928306 generated y: -0.787035
generated x: -0.661467 generated y: -0.548748
generated x: -0.0806384 generated y: -0.0753311
  2. Generated events contain the category that indexes the RooSimultaneous.
RooDataSet* cat1NegDS2 =
    w->pdf("simulXNegPdf")->generate(*w->set("VarsAndCats"), NumEvents(10));

This method appears to return data points generated from the component PDFs (presumably) weighted by their expected number of events from being extended.
Output from this method:

generated x: -0.688646 generated y: -0.81131 generated cat2: cat2Negative
generated x: -0.395074 generated y: -0.244434 generated cat2: cat2Negative
generated x: -0.365669 generated y: -0.038206 generated cat2: cat2Negative
generated x: -0.358871 generated y: -0.958033 generated cat2: cat2Negative
generated x: -0.275645 generated y: -0.450404 generated cat2: cat2Negative
generated x: -0.850337 generated y: -0.59978 generated cat2: cat2Negative
generated x: -0.939029 generated y: -0.209658 generated cat2: cat2Negative
generated x: -0.474427 generated y: -0.91237 generated cat2: cat2Negative
generated x: -0.0425157 generated y: 0.779136 generated cat2: cat2Positive
generated x: -0.840567 generated y: 0.570187 generated cat2: cat2Positive
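If the presumption above is right, method 2 amounts to drawing each event's cat2 state with probability proportional to the extended yields (1 for xNegYNegExt, 2 for xNegYPosExt), then sampling x and y from the chosen component. Below is a plain C++ sketch of just that category draw, using `std::discrete_distribution`. The function name `drawCategories` is hypothetical, and the sketch models the presumed behaviour rather than RooFit's implementation.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw a category index (0 = cat2Negative, 1 = cat2Positive) for each event,
// with probability proportional to the per-category expected yields.
std::vector<int> drawCategories(const std::vector<double>& yields,
                                int nEvents, unsigned seed) {
  std::mt19937 rng(seed);
  std::discrete_distribution<int> pick(yields.begin(), yields.end());
  std::vector<int> cats;
  cats.reserve(nEvents);
  for (int i = 0; i < nEvents; ++i) cats.push_back(pick(rng));
  return cats;
}
```

With only 10 events the observed split can differ considerably from the expected 1 : 2 ratio. The fraction of category 0 settles near 1/3 only for large samples.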

This behavior does not mirror that of the nested RooSimultaneous.
When using method 1 from above:

RooDataSet* varsDS =
    w->pdf("simulPdf")->generate(*w->set("vars"), AutoBinned(0), NumEvents(10));

I appear to get data points generated from all element pdfs, instead of just the one currently selected (odd, because for the non-nested RooSimultaneous only the selected pdf was sampled).
The output:

event: 0
generated x: 0.569484 generated cat1:
generated y: -0.625338 generated cat2:
event: 1
generated x: -0.512809 generated cat1:
generated y: -0.571389 generated cat2:
event: 2
generated x: 0.392053 generated cat1:
generated y: 0.87093 generated cat2:
event: 3
generated x: -0.406055 generated cat1:
generated y: 0.963375 generated cat2:
event: 4
generated x: -0.476313 generated cat1:
generated y: -0.944435 generated cat2:
event: 5
generated x: -0.8927 generated cat1:
generated y: -0.503686 generated cat2:
event: 6
generated x: -0.747976 generated cat1:
generated y: -0.42416 generated cat2:
event: 7
generated x: 0.131791 generated cat1:
generated y: -0.444115 generated cat2:
event: 8
generated x: 0.413081 generated cat1:
generated y: 0.562298 generated cat2:
event: 9
generated x: 0.299256 generated cat1:
generated y: 0.709301 generated cat2:

(recall that each element pdf was only non-zero in one quadrant)

Likewise, if I use method 2 from above:

w->cat("cat1")->setLabel("cat1Negative");
w->cat("cat2")->setLabel("cat2Positive");
RooDataSet* VarsAndCatsDS =
    w->pdf("simulPdf")->generate(*w->set("VarsAndCats"), AutoBinned(0),
                                 NumEvents(10));

I again get data points from each of the element pdfs, but the categories they came from are not tagged correctly. Instead, every event is simply labelled with the category states that were set in the argset the data was generated from.
The output from:

  for(int i = 0; i < VarsAndCatsDS->numEntries(); i++){
    const RooArgSet* tmpSet = VarsAndCatsDS->get(i);

    cout << "event: " << i << endl
         << "generated x: " << tmpSet->getRealValue("x") << " " 
         << "generated cat1: " << tmpSet->getCatLabel("cat1") << endl
         << "generated y: " << tmpSet->getRealValue("y") << " "
         << "generated cat2: " << tmpSet->getCatLabel("cat2") << endl;
  }
event: 0
generated x: 0.137008 generated cat1: cat1Negative
generated y: 0.121574 generated cat2: cat2Positive
event: 1
generated x: -0.828715 generated cat1: cat1Negative
generated y: -0.0830955 generated cat2: cat2Positive
event: 2
generated x: 0.678632 generated cat1: cat1Negative
generated y: -0.0751703 generated cat2: cat2Positive
event: 3
generated x: 0.660149 generated cat1: cat1Negative
generated y: 0.568897 generated cat2: cat2Positive
event: 4
generated x: -0.40702 generated cat1: cat1Negative
generated y: 0.571721 generated cat2: cat2Positive
event: 5
generated x: -0.181896 generated cat1: cat1Negative
generated y: -0.240481 generated cat2: cat2Positive
event: 6
generated x: -0.659247 generated cat1: cat1Negative
generated y: -0.0540779 generated cat2: cat2Positive
event: 7
generated x: -0.404749 generated cat1: cat1Negative
generated y: 0.554353 generated cat2: cat2Positive
event: 8
generated x: -0.550924 generated cat1: cat1Negative
generated y: 0.350916 generated cat2: cat2Positive
event: 9
generated x: -0.132489 generated cat1: cat1Negative
generated y: -0.525622 generated cat2: cat2Positive

This is not altogether surprising, because after all this RooSimultaneous is not truly indexed over these two categories (cat1 and cat2), but over the super category simulPdf_index. However, I cannot generate a dataset containing this super category, because if I try this

w->defineSet("varsAndSuperCat", "x,y,simulPdf_index");
RooDataSet* VarsAndCatsDS =
    w->pdf("simulPdf")->generate(*w->set("varsAndSuperCat"), AutoBinned(0),
                                 NumEvents(10));

or this

RooArgSet genSet(x, y, *(RooAbsCategory*)w->obj("simulPdf_index"));
RooDataSet* VarsAndCatsDS =
    w->pdf("simulPdf")->generate(genSet, AutoBinned(0),
                                 NumEvents(10));

I get the error

[#0] ERROR:Generation -- RooGenContext::ctor(): cannot generate values for derived "simulPdf_index"

I would appreciate some insight as to why method 1 seems to differ in results between the nested and non-nested cases, and how one might achieve the analogous results of method 2 (generate events weighted based on extended term, but also record which category they came from) for the nested case.

I already presented a workable solution in my previous reply, so this is pretty much academic now, but understanding better how this works can only be a good thing.

Thanks,

Shaun

Hi @SAlsum,

A little side note: You should be able to do this:

RooSimultaneous simulPdf("simulPdf", "simulPdf",
    RooArgList(simulXNegPdf, simulXPosPdf),
    <cat1 object>);

Yes, that’s what’s happening. By setting cat1, you select the PDF to generate from. This by the way has the same effect on getVal().

You can actually check that. Something like

genNegDS->weight()

should give you the weight of the last entry you loaded. genNegDS->sumEntries() should yield the sum of weights.

Do you mean that when directly generating from the nested simultaneous, you only get events from one category? That’s expected, I guess. Here, you are bypassing any super category, and directly drilling down to the lowest PDF. The fact that the top simultaneous PDF behaves differently is probably because of the super category that is created by joining the original categories. Did you check the printouts during the setup of the top simultaneous? Is it printing something like:

RooSimultaneous::initialize(...) INFO: one or more input component of simultaneous p.d.f.s are simultaneous p.d.f.s themselves, rewriting composite expressions as one-level simultaneous p.d.f. in terms of final constituents and extended index category

and

InputArguments ... RooSimultaneous::initialize(...) assigning pdf ... to super label ...

?
The RooSuperCategory documentation suggests that the super category and the sub-categories are connected. That is, if you select a super category, this will also select the appropriate categories in the sub categories. This might reset the categories at any time if you set the state of the sub categories manually. Probably the confusion / mixed results are created due to this.
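The rewriting announced by that INFO message can be pictured with a small plain C++ model. The names are hypothetical, and this is not how RooSimultaneous is implemented internally. The nested map cat1 label -> (cat2 label -> leaf pdf) becomes a single-level map keyed by a super label built from both labels, in the `{cat1Negative;cat2Negative}` format printed earlier in the thread. Because the super label is computed from the two fundamental labels, it is a derived quantity. That is consistent with the "cannot generate values for derived" error above.

```cpp
#include <cassert>
#include <map>
#include <string>

using NestedMap = std::map<std::string, std::map<std::string, std::string>>;
using FlatMap = std::map<std::string, std::string>;

// Build a super label from the outer and inner category labels.
std::string superLabel(const std::string& outer, const std::string& inner) {
  return "{" + outer + ";" + inner + "}";
}

// Rewrite outer label -> (inner label -> leaf pdf) as super label -> leaf pdf.
FlatMap flatten(const NestedMap& nested) {
  FlatMap flat;
  for (const auto& outer : nested)
    for (const auto& inner : outer.second)
      flat[superLabel(outer.first, inner.first)] = inner.second;
  return flat;
}
```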

  • Why method 1 differs was probably explained above, the difference between directly accessing a sub PDF and using a SuperCategory on the top PDF.
  • How to achieve the same using method 2? I don’t see an easy way because you cannot store a SuperCategory in a dataset (that’s the error about not being fundamental).
    Probably, you would have to do it manually, that is, select a supercategory, and generate a number of events according to its fraction of the total data, select the next supercategory, repeat. You could add the category labels by creating a dataset that only has the category labels, and join the event and category data.
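The manual recipe from the last bullet is sketched below as a plain C++ toy (not RooFit; `Event` and `generateTagged` are hypothetical names). It loops over every (cat1, cat2) combination and generates a number of events proportional to that combination's expected yield. The yields are 1, 2, 3 and 4, as in the RooExtendPdf setup earlier in the thread. Both labels are stored next to x and y, so each event records where it came from. Rounding to the nearest integer is a simplification; a toy study would more typically draw a Poisson-distributed count per combination.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <string>
#include <vector>

struct Event {
  double x, y;
  std::string cat1, cat2;
};

// For each (cat1, cat2) combination, generate round(nTotal * yield / sumYields)
// events uniformly in that combination's quadrant and tag them with both labels.
std::vector<Event> generateTagged(double nTotal, unsigned seed) {
  const std::string cat1Labels[2] = {"cat1Negative", "cat1Positive"};
  const std::string cat2Labels[2] = {"cat2Negative", "cat2Positive"};
  // expected yields, mirroring RooConst(1..4) in the thread
  const double yields[2][2] = {{1.0, 2.0}, {3.0, 4.0}};
  const double sumYields = 10.0;

  std::mt19937 rng(seed);
  std::vector<Event> events;
  for (int i1 = 0; i1 < 2; ++i1) {
    for (int i2 = 0; i2 < 2; ++i2) {
      const long n = std::lround(nTotal * yields[i1][i2] / sumYields);
      // cat1 fixes the sign of x, cat2 fixes the sign of y.
      std::uniform_real_distribution<double> ux(i1 ? 0.0 : -1.0, i1 ? 1.0 : 0.0);
      std::uniform_real_distribution<double> uy(i2 ? 0.0 : -1.0, i2 ? 1.0 : 0.0);
      for (long k = 0; k < n; ++k)
        events.push_back({ux(rng), uy(rng), cat1Labels[i1], cat2Labels[i2]});
    }
  }
  return events;
}
```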

Do you mean that when directly generating from the nested simultaneous, you only get events from one category? That’s expected, I guess. Here, you are bypassing any super category, and directly drilling down to the lowest PDF.

I mean precisely the opposite.
When I use method 1 on the non-nested RooSimultaneous (simulXNegPdf), I get events from one category. I “drill down” as you say to the lowest level pdf (in this case, there is of course only 1 level).
But when I use method 1 on the nested* RooSimultaneous, I get events from all categories. You can see this in the output I pasted after the w->pdf("simulPdf")->generate(*w->set("vars"), AutoBinned(0), NumEvents(10)) command (though I don’t blame anyone for not carefully looking through all of that).

As for method 2, I agree, I can’t find a way to make it work other than how you describe.

Thanks again for your help

*not technically nested because, as you say, instead of actually containing other RooSimultaneous, it just assigns each component pdf a RooSuperCategory label.

I found the solution.

  1. To have the categories in the dataset, you indeed need to ask for generation of {x, y, cat1, cat2}. I took your example, stripped it down a bit, and implemented the generation.
    It also shows how to easily print ArgSets, variables and PDFs, so you have to write less code. (Print("V") prints verbose, and Print("T") prints a tree, such that you can see all the sub pdfs.)
    ROOT_10093.C (attached)

With your permission, I will add this to the RooFit tutorials to show how it’s done.

  2. The categories don’t change properly because there was a bug. I already fixed it. It just needs to be merged into ROOT. You can see the progress here:
    https://sft.its.cern.ch/jira/browse/ROOT-10093

A little demo. Generating with the fix now yields:

The first 20 events:
0:
  1) RooRealVar::    x = 0.958632
  2) RooRealVar::    y = 0.804701
  3) RooCategory:: cat1 = cat1Positive(idx = 1)

  4) RooCategory:: cat2 = cat2Positive(idx = 1)

1:
  1) RooRealVar::    x = 0.901833
  2) RooRealVar::    y = -0.0549799
  3) RooCategory:: cat1 = cat1Positive(idx = 1)

  4) RooCategory:: cat2 = cat2Negative(idx = 0)

2:
  1) RooRealVar::    x = 0.66985
  2) RooRealVar::    y = -0.600103
  3) RooCategory:: cat1 = cat1Positive(idx = 1)

  4) RooCategory:: cat2 = cat2Negative(idx = 0)

3:
  1) RooRealVar::    x = -0.100481
  2) RooRealVar::    y = -0.668797
  3) RooCategory:: cat1 = cat1Negative(idx = 0)

  4) RooCategory:: cat2 = cat2Negative(idx = 0)


That’s fine with me, although you may want to eliminate the workspace line since it is no longer used (I was originally using it only because it was there when I encountered the original problem, and wanted to stick as close as possible).

Thanks for looking into this and helping me out.
