Nested RooSimultaneous PDFs - PDF Generation

Dear experts, this query is related to the post linked below, but is slightly different.
Help Generating Dataset from Nested RooSimultaneous

I believe I am trying to do what the OP in the above post was doing: generate RooDataSets from nested simultaneous PDFs.

Now, I am trying to do the same thing but with R.RooFit.Extended() option included.

reso_ws.factory("tagflav[B0,B0bar]")
reso_ws.factory("qr[qr0,qr1,qr2,qr3,qr4,qr5,qr6]")
obs_names = ["deltat", "tagflav", "deltaterr", "r", "mod_mbc", "deltae", "csobdtmu"]
obs_td = R.RooArgSet()
for name in obs_names:
    obs_td.add(reso_ws[name])
obs_td.add(reso_ws.obj("qr"))   # <-- crucial
R.RooRandom.randomGenerator().SetSeed(23414)
# ----------------- Create inner SIMUL PDFs per tag -----------------
reso_ws.factory("""
SIMUL::simPdf_sig_btag_all_qr(qr,
    qr0=ext_sig_pdf_td_btag_qr0,
    qr1=ext_sig_pdf_td_btag_qr1,
    qr2=ext_sig_pdf_td_btag_qr2,
    qr3=ext_sig_pdf_td_btag_qr3,
    qr4=ext_sig_pdf_td_btag_qr4,
    qr5=ext_sig_pdf_td_btag_qr5,
    qr6=ext_sig_pdf_td_btag_qr6
)
""")

reso_ws.factory("""
SIMUL::simPdf_sig_bartag_all_qr(qr,
    qr0=ext_sig_pdf_td_bartag_qr0,
    qr1=ext_sig_pdf_td_bartag_qr1,
    qr2=ext_sig_pdf_td_bartag_qr2,
    qr3=ext_sig_pdf_td_bartag_qr3,
    qr4=ext_sig_pdf_td_bartag_qr4,
    qr5=ext_sig_pdf_td_bartag_qr5,
    qr6=ext_sig_pdf_td_bartag_qr6
)
""")

reso_ws.factory("""
SIMUL::simPdf_sig(tagflav,
    B0=simPdf_sig_btag_all_qr,
    B0bar=simPdf_sig_bartag_all_qr
)
""")

# Get the top-level SIMUL PDF
simul_pdf = reso_ws.pdf("simPdf_sig")
super_cat_sig = simul_pdf.indexCat()

gen_dataset = simul_pdf.generate(
    R.RooArgSet(obs_td),
    R.RooFit.AutoBinned(0),
    R.RooFit.Extended(),    # Poisson-fluctuated
    R.RooFit.Verbose()
)

All 14 ext_ PDFs are extended PDFs.

Now, the yield I am giving is 834, but the Poisson-fluctuated yield I get is 476, almost half of that.

What am I doing wrong?
PFA the output of my generator.

gen.txt (4.4 KB)

Maybe @jonas can take a look


Also, when I do this, I get an unusually large number of events: 2923 + 2960 = 5883 instead of ~456!

# --- Generate B0 tag separately ---
print("Generating B0 events...")
data_b0 = reso_ws.pdf("simPdf_sig_btag_all_qr").generate(R.RooArgSet(obs_td), R.RooFit.Extended())
print("B0 events:", data_b0.sumEntries())

# --- Generate B0bar tag separately ---

print("Generating B0bar events...")

data_bbar = reso_ws.pdf("simPdf_sig_bartag_all_qr").generate(R.RooArgSet(obs_td), R.RooFit.Extended())
print("B0bar events:", data_bbar.sumEntries())

# --- Append datasets to get final combined dataset ---

data_sig = data_b0.Clone("data_sig")

data_sig.append(data_bbar)

print("Total signal events (B0 + B0bar):", data_sig.sumEntries())

Hello @Vikas_Raj,

the expert on this would be @jonas, but I know that he’s travelling for the next two weeks. In the meantime, we can look a bit left and right; maybe we discover something that solves the problem.

I checked the documentation a bit, and we could look at a few things:

  • The documentation of generate() says:

    Generate the specified number of events or expectedEvents() if not specified.

    So what is the value of expectedEvents() in your case?

  • What are the extended terms (= the value of expectedEvents()) of the sub-PDFs? Do they happen to sum to 476?

  • When you generate events, are you correctly specifying the observables in which to generate? I see that you used obs_td, but let’s just double-check that these are the desired observables.

  • Are there binned datasets somewhere in these PDFs or are they unbinned? Remember that for binned generation RooFit generates one data point per bin with a Poisson-distributed weight, as a speed optimisation. I see that you used AutoBinned(0), but what happens if you allow auto-binning?

  • Did you try printing the tree of PDFs? Use pdf.Print("T") to see all sub-expressions and possibly Print("V") on the categories or the dataset. Just to make sure that there isn’t some subtle error in the PDF definition.


The expected sum for me is 834, before Poisson fluctuation

BTag QR0: 51.75218037592532
BBar QR0: 51.98562669417656
BTag QR1: 60.48489040134407
BBar QR1: 57.64428874784707
BTag QR2: 65.82192719880176
BBar QR2: 71.29586788807661
BTag QR3: 49.03754291191023
BBar QR3: 51.80200172559726
BTag QR4: 40.52934670887497
BBar QR4: 38.723139809998614
BTag QR5: 45.423574455618656
BBar QR5: 49.11974489728727
BTag QR6: 98.21690377886804
BBar QR6: 102.06302432179103
Total expected events: 833.9000599161175
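
As a quick cross-check of the listing above, the per-bin yields can be summed in plain Python (a sketch independent of RooFit; the variable names are only illustrative). It also gives the per-tag totals, which are useful when comparing against datasets generated from the two inner PDFs separately.

```python
# Per-bin expected yields copied from the listing above (BTag = B0, BBar = B0bar).
btag = [
    51.75218037592532, 60.48489040134407, 65.82192719880176,
    49.03754291191023, 40.52934670887497, 45.423574455618656,
    98.21690377886804,
]
bbar = [
    51.98562669417656, 57.64428874784707, 71.29586788807661,
    51.80200172559726, 38.723139809998614, 49.11974489728727,
    102.06302432179103,
]

total_b0 = sum(btag)      # expected B0-tagged events
total_b0bar = sum(bbar)   # expected B0bar-tagged events
total = total_b0 + total_b0bar

print(f"B0 total:    {total_b0:.2f}")
print(f"B0bar total: {total_b0bar:.2f}")
print(f"Sum:         {total:.2f}")
```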

Yes, I have double-checked: it includes all the variables that I want to generate, along with both RooCategory objects.

At this point I see that when I use

# Correct: only fundamental variables
gen_dataset = simul_pdf.generate(
    R.RooArgSet(obs_td),                 # obs_td = RooArgSet of your RooRealVar observables
    R.RooFit.Extended(),    # Poisson-fluctuated
)

print("Number of generated events:", gen_dataset.sumEntries())

I get 103 events, which makes me think the events come from the last PDF only (BBar QR6, expected yield 102.06) instead of from all 14 PDFs simultaneously, which would give a total close to 834!

Interestingly, if I do

gen_dataset_btag = reso_ws.pdf("simPdf_sig_btag_all_qr").generate(
    R.RooArgSet(obs_td),  # include the index category!
    R.RooFit.Extended(),
    R.RooFit.Verbose()
)

print("Total entries:", gen_dataset_btag.sumEntries())

gen_dataset_bartag = reso_ws.pdf("simPdf_sig_bartag_all_qr").generate(
    R.RooArgSet(obs_td),  # include the index category!
    R.RooFit.Extended(),
    R.RooFit.Verbose()
)

print("Total entries:", gen_dataset_bartag.sumEntries())

I get 386 + 447 = 833, so it seems to be the nested RooSimultaneous that is not working.
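
To put the numbers quoted so far side by side, here is a rough plain-Python check of how far each generated count lies from its expected Poisson mean. It uses the Gaussian approximation (standard deviation = square root of the mean), and `poisson_z` is a hypothetical helper, not a RooFit function. The per-tag means are the sums of the seven BTag and seven BBar yields listed earlier, rounded to two decimals.

```python
import math

def poisson_z(observed, mean):
    """Approximate distance, in standard deviations, of a count from its Poisson mean."""
    return (observed - mean) / math.sqrt(mean)

# Expected means: total and per-tag sums of the yields quoted in this thread.
expected_total = 833.90
expected_b0 = 411.27      # sum of the seven BTag yields
expected_b0bar = 422.63   # sum of the seven BBar yields

# (generated count, expected mean) for each attempt described in the thread.
checks = {
    "nested, with AutoBinned(0)": (476, expected_total),
    "nested, Extended() only": (103, expected_total),
    "inner B0 PDF alone": (386, expected_b0),
    "inner B0bar PDF alone": (447, expected_b0bar),
}

z_scores = {label: poisson_z(obs, mean) for label, (obs, mean) in checks.items()}
for label, z in z_scores.items():
    print(f"{label:28s} z = {z:+.1f}")
```

The two inner PDFs land within about one and a half standard deviations of their means, while both nested attempts are more than ten standard deviations low, so the shortfall is not a plausible Poisson fluctuation.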

RooExtendPdf::ext_sig_pdf_td_btag_qr6[ pdf=sig_pdf_td_btag_qr6 n=extterm_sig_td_btag_qr6 ] = 6866.93

RooFormulaVar::extterm_sig_td_btag_qr6[ actualVars=(n_sig,fractd_sig_qr6,signorm_btag_qr6,signorm_bartag_qr6) formula="n_sig*fractd_sig_qr6*(signorm_btag_qr6/(signorm_btag_qr6+signorm_bartag_qr6))" ] = 98.2169

The above is one of the PDFs that I am using, together with its extended term. In my opinion, it correctly splits the yield across both tags and the 7 qr bins!


--- RooAbsArg ---
  Value State: DIRTY
  Shape State: DIRTY
  Attributes:  [SnapShot_ExtRefClone] 
  Address: 0x55771a603b40
  Clients: 
  Servers: 
    (0x55771880cb50,V-) RooSuperCategory::simPdf_sig_index "simPdf_sig_index"
    (0x5577156b6270,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr0 "ext_sig_pdf_td_btag_qr0"
    (0x5577176650d0,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr1 "ext_sig_pdf_td_btag_qr1"
    (0x557715a06c30,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr2 "ext_sig_pdf_td_btag_qr2"
    (0x5577157fb3b0,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr3 "ext_sig_pdf_td_btag_qr3"
    (0x557715ae06c0,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr4 "ext_sig_pdf_td_btag_qr4"
    (0x5577157178b0,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr5 "ext_sig_pdf_td_btag_qr5"
    (0x55771574e2a0,V-) RooExtendPdf::ext_sig_pdf_td_btag_qr6 "ext_sig_pdf_td_btag_qr6"
    (0x5577157eef50,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr0 "ext_sig_pdf_td_bartag_qr0"
    (0x55771766d7e0,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr1 "ext_sig_pdf_td_bartag_qr1"
    (0x5577158a6030,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr2 "ext_sig_pdf_td_bartag_qr2"
    (0x55771768d8d0,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr3 "ext_sig_pdf_td_bartag_qr3"
    (0x557717663500,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr4 "ext_sig_pdf_td_bartag_qr4"
    (0x557715ae1c20,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr5 "ext_sig_pdf_td_bartag_qr5"
    (0x5577190dcde0,V-) RooExtendPdf::ext_sig_pdf_td_bartag_qr6 "ext_sig_pdf_td_bartag_qr6"
  Proxies: 
    !plotCoefNormSet -> 
    indexCat -> simPdf_sig_index
    {B0;qr0} -> ext_sig_pdf_td_btag_qr0
    {B0;qr1} -> ext_sig_pdf_td_btag_qr1
    {B0;qr2} -> ext_sig_pdf_td_btag_qr2
    {B0;qr3} -> ext_sig_pdf_td_btag_qr3
    {B0;qr4} -> ext_sig_pdf_td_btag_qr4
    {B0;qr5} -> ext_sig_pdf_td_btag_qr5
    {B0;qr6} -> ext_sig_pdf_td_btag_qr6
    {B0bar;qr0} -> ext_sig_pdf_td_bartag_qr0
    {B0bar;qr1} -> ext_sig_pdf_td_bartag_qr1
    {B0bar;qr2} -> ext_sig_pdf_td_bartag_qr2
    {B0bar;qr3} -> ext_sig_pdf_td_bartag_qr3
    {B0bar;qr4} -> ext_sig_pdf_td_bartag_qr4
    {B0bar;qr5} -> ext_sig_pdf_td_bartag_qr5
    {B0bar;qr6} -> ext_sig_pdf_td_bartag_qr6
--- RooAbsReal ---

  Plot label is "simPdf_sig"
--- RooAbsPdf ---
Cached value = 0

This is the printout for the final nested PDF. It seems fine, although I am not sure what the state DIRTY means!

I have also tried using RooSuperCategory!

prod_cat = R.RooSuperCategory("prod", "prod", R.RooArgSet(reso_ws.cat("tagflav"), reso_ws.cat("qr")))
reso_ws.Import(prod_cat)

print(prod_cat)

{ {"{B0;qr0}" , 0}, {"{B0;qr1}" , 2}, {"{B0;qr2}" , 4}, {"{B0;qr3}" , 6}, {"{B0;qr4}" , 8}, {"{B0;qr5}" , 10}, {"{B0;qr6}" , 12}, {"{B0bar;qr0}" , 1}, {"{B0bar;qr1}" , 3}, {"{B0bar;qr2}" , 5}, {"{B0bar;qr3}" , 7}, {"{B0bar;qr4}" , 9}, {"{B0bar;qr5}" , 11}, {"{B0bar;qr6}" , 13} }
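
For reference, the indices in this printout follow a simple pattern. The plain-Python sketch below reproduces them under the assumption that the underlying category indices are B0 = 0, B0bar = 1 and qrN = N (these indices are not shown in the thread, so treat them as an assumption). It also makes explicit that the state labels have the form `{B0;qr0}`.

```python
tag_states = ["B0", "B0bar"]               # assumed indices 0 and 1
qr_states = [f"qr{i}" for i in range(7)]   # assumed indices 0..6

# State label -> index, exactly as printed for the RooSuperCategory above.
printed = {
    "{B0;qr0}": 0, "{B0;qr1}": 2, "{B0;qr2}": 4, "{B0;qr3}": 6,
    "{B0;qr4}": 8, "{B0;qr5}": 10, "{B0;qr6}": 12,
    "{B0bar;qr0}": 1, "{B0bar;qr1}": 3, "{B0bar;qr2}": 5, "{B0bar;qr3}": 7,
    "{B0bar;qr4}": 9, "{B0bar;qr5}": 11, "{B0bar;qr6}": 13,
}

# Candidate rule: index = tag index + (number of tag states) * qr index.
reconstructed = {
    f"{{{tag};{qr}}}": t + len(tag_states) * q
    for t, tag in enumerate(tag_states)
    for q, qr in enumerate(qr_states)
}

print("Rule reproduces the printout:", reconstructed == printed)
```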

But when I use

reso_ws.factory("""
SIMUL::simPdf_sig_2(prod,
    {B0;qr0} = ext_sig_pdf_td_btag_qr0,
    {B0;qr1} = ext_sig_pdf_td_btag_qr1,
    {B0;qr2} = ext_sig_pdf_td_btag_qr2,
    {B0;qr3} = ext_sig_pdf_td_btag_qr3,
    {B0;qr4} = ext_sig_pdf_td_btag_qr4,
    {B0;qr5} = ext_sig_pdf_td_btag_qr5,
    {B0;qr6} = ext_sig_pdf_td_btag_qr6,
    {B0bar;qr0} = ext_sig_pdf_td_bartag_qr0,
    {B0bar;qr1} = ext_sig_pdf_td_bartag_qr1,
    {B0bar;qr2} = ext_sig_pdf_td_bartag_qr2,
    {B0bar;qr3} = ext_sig_pdf_td_bartag_qr3,
    {B0bar;qr4} = ext_sig_pdf_td_bartag_qr4,
    {B0bar;qr5} = ext_sig_pdf_td_bartag_qr5,
    {B0bar;qr6} = ext_sig_pdf_td_bartag_qr6
)
""")

or

reso_ws.factory("""
SIMUL::simPdf_sig_2(prod,
    'B0_qr0' = ext_sig_pdf_td_btag_qr0,
    'B0_qr1' = ext_sig_pdf_td_btag_qr1,
    'B0_qr2' = ext_sig_pdf_td_btag_qr2,
    'B0_qr3' = ext_sig_pdf_td_btag_qr3,
    'B0_qr4' = ext_sig_pdf_td_btag_qr4,
    'B0_qr5' = ext_sig_pdf_td_btag_qr5,
    'B0_qr6' = ext_sig_pdf_td_btag_qr6,
    'B0bar_qr0' = ext_sig_pdf_td_bartag_qr0,
    'B0bar_qr1' = ext_sig_pdf_td_bartag_qr1,
    'B0bar_qr2' = ext_sig_pdf_td_bartag_qr2,
    'B0bar_qr3' = ext_sig_pdf_td_bartag_qr3,
    'B0bar_qr4' = ext_sig_pdf_td_bartag_qr4,
    'B0bar_qr5' = ext_sig_pdf_td_bartag_qr5,
    'B0bar_qr6' = ext_sig_pdf_td_bartag_qr6
)
""")

I get

<cppyy.gbl.RooSimultaneous object at 0x55ae4991a330>

But when I try to print this or generate from this, the kernel crashes!


I think I’m starting to get a handle on this.

The reason I don’t get the correct numbers from the nested RooSimultaneous PDFs seems to be the nesting of simultaneous PDFs itself. Generating events works if we explicitly give a number of events, for example NumEvents(20), but it does not work in extended mode. Even though the inner PDFs are extended, the simultaneous PDF built from them is apparently not automatically treated as extended.
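
As an illustration of what extended generation over all 14 categories would be expected to produce, here is a toy in plain Python. It does not use RooFit: `poisson_draw` is a hypothetical helper, and the yields are the ones listed earlier in the thread. Each category gets an independent Poisson-fluctuated count, so the combined total fluctuates around 834.

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

# Expected yields per (tag, qr bin), copied from the numbers posted earlier.
yields = {
    "B0": [51.75218037592532, 60.48489040134407, 65.82192719880176,
           49.03754291191023, 40.52934670887497, 45.423574455618656,
           98.21690377886804],
    "B0bar": [51.98562669417656, 57.64428874784707, 71.29586788807661,
              51.80200172559726, 38.723139809998614, 49.11974489728727,
              102.06302432179103],
}

rng = random.Random(23414)  # same seed value as in the original script
n_toys = 200

# One toy = one Poisson draw per category, summed over all 14 categories.
totals = [
    sum(poisson_draw(mu, rng) for per_tag in yields.values() for mu in per_tag)
    for _ in range(n_toys)
]
mean_total = sum(totals) / n_toys

print(f"Mean total over {n_toys} toys: {mean_total:.1f}")
print(f"Smallest / largest toy total: {min(totals)} / {max(totals)}")
```

This is equivalent to generating each extended inner PDF separately and appending the datasets, which is the approach that gave 386 + 447 = 833 above.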
