Generating events with RooSimultaneous - (non)empty categories

Dear Experts,

I am attempting to generate pseudodata with RooSimultaneous. The constituent pdfs are extended, so I leave the relative distribution of the generated events among the individual RooCategory states to the RooFit backend. Since I am generating a relatively small number of events, some of the categories may well end up empty in the generated dataset. This is, however, not what happens: instead of generating zero events in a given category, RooFit generates a number of events according to the extended term of the pdf associated with that category. That is perhaps the default, but for my purposes it is undesirable behaviour.

Please see the attached example, which illustrates this: the first category always contains 100 generated events, even though I would expect it to be empty in most cases, as the corresponding extended term is tiny relative to those of the other two categories.
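For illustration, the expected behaviour described above can be sketched with the C++ standard library alone: a fixed total number of events is distributed among the categories with probabilities proportional to their expected yields (multinomial sampling). This is only a sketch of the expectation, not of what RooFit does internally; the function name splitAmongCategories is hypothetical, and any yields passed to it are placeholders rather than the values in the attached example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Distribute a fixed total number of events among N categories, picking the
// category of each event with probability proportional to its expected yield.
// Hypothetical helper for illustration; not part of RooFit.
template <std::size_t N>
std::array<int, N> splitAmongCategories(const std::array<double, N>& yields,
                                        int nTotal, std::mt19937& rng) {
    assert(nTotal >= 0);
    std::discrete_distribution<std::size_t> pick(yields.begin(), yields.end());
    std::array<int, N> counts{};  // all categories start empty
    for (int i = 0; i < nTotal; ++i) {
        ++counts[pick(rng)];
    }
    return counts;
}
```

With, say, yields of {100, 1e6, 1e6} (assumed numbers) and a total of 5 events, the first category receives no events in the vast majority of trials, which is the outcome I would expect from the simultaneous generation.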
simultaneousGeneration_example.cxx (1.6 KB)

Is there already a way around this behaviour implemented in RooFit?

Many thanks,

Ondra

@jonas Do you have a suggestion?

Hello,
I can reproduce this issue. We will investigate what the problem is.
Thank you for reporting this problem.
Lorenzo

A workaround that seems to work is to use ProtoData: first generate a prototype dataset, then use it to generate the desired number of events.
You can do it as follows:

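// (assumes `using namespace RooFit;`)
// generate a prototype dataset using the expected number of events of the pdf
RooDataSet* proto = simPdf.generate(RooArgList(x, bins));
// generate the desired number of events, taking the prototype dataset as input
// (the second argument of ProtoData, true, randomizes the order of the prototype events)
RooDataSet* gen_set = simPdf.generate(RooArgList(x, bins), NumEvents(5), ProtoData(*proto, true));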

Lorenzo

Hello Lorenzo,

Thank you for your answer. It does indeed work, although it still behaves a bit surprisingly. Your suggestion does leave the first bin empty. However, when I inspect the “proto” dataset, I find that the number of entries generated in it is (most probably) a Poisson fluctuation of the extended term, e.g. 95 instead of 100. The number changes from run to run if I compile once and run many times; if I compile again, the same sequence repeats, so the seed is probably constant. I’ve tried to turn these fluctuations off by adding the RooFit::Extended() option to the call:

RooDataSet* proto = simPdf->generate(RooArgList(x, *binns), RooFit::Extended());

but nothing changed: the generated numbers were still a fluctuation around the extended term value. I’ve also tried passing false to RooFit::Extended():

RooFit::Extended(false)

This only seemed to change the generator seed: I again got a sequence of fluctuations that repeated upon recompilation, although it was different from the one I described above.
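Both observations (counts scattering around the extended term, and a sequence that repeats) are what one would see from Poisson sampling driven by a generator with a fixed seed. A minimal sketch of that mechanism using only the C++ standard library follows; it is an assumption about the kind of behaviour involved, not RooFit's actual implementation, and the name drawCounts is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw nDraws event counts, each a Poisson fluctuation around `expected`,
// from a generator initialised with the given seed.
// Hypothetical helper for illustration; not part of RooFit.
std::vector<int> drawCounts(double expected, int nDraws, unsigned seed) {
    assert(expected > 0.0 && nDraws >= 0);
    std::mt19937 rng(seed);
    std::poisson_distribution<int> poisson(expected);
    std::vector<int> counts;
    counts.reserve(static_cast<std::size_t>(nDraws));
    for (int i = 0; i < nDraws; ++i) {
        counts.push_back(poisson(rng));
    }
    return counts;
}
```

Calling drawCounts(100.0, 10, seed) twice with the same seed returns the same ten counts, while each individual count generally differs from 100. A change of option that shifts the state of the generator would then give a different, but again repeatable, sequence.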

Is this behaviour expected, and is there a way to turn these fluctuations off?
Secondly, this behaviour got me thinking: when I generate events from RooSimultaneous in the usual way, i.e. without the proto dataset, does RooFit determine the relative fractions of events generated in each category from the actual extended terms, or from their fluctuations as well?

Thank you,

Ondra
