How to study yield variation using RooMCStudy

I would like to study how well my fit model estimates signal yields. RooMCStudy seems like the obvious choice, but there does not seem to be a way to extract the signal yield used to generate the toy datasets. Details below.


Introducing a Poisson fluctuation on the number of events using Extended changes the event weights, but the stored values of the pdf yields do not change, making it impossible to determine the yield assigned to each pdf during dataset generation.

import ROOT as r

ws = r.RooWorkspace("workspace")

# declare model
x = ws.factory("x[-10, 10]")
pdf1 = ws.factory("Gaussian::pdf1(x, m[-1, 1], s1[3,5])")
pdf2 = ws.factory("Gaussian::pdf2(x, m, s2[5,7])")
pdf = ws.factory("SUM::pdf(N1[0, 100] * pdf1, N2[0, 100] * pdf2)")

# add constraint so we can access genParDataSet
s1 = ws.var("s1")
constraint1 = ws.factory("Gaussian::constraint1(s1, cm1[4], cs1[1])")
prodpdf = ws.factory("PROD::prodpdf({pdf,constraint1})")

# declare mcstudy
mcstudy = r.RooMCStudy(
    prodpdf,
    r.RooArgSet(x),
    r.RooFit.FitModel(prodpdf),
    r.RooFit.Silence(),
    r.RooFit.Binned(),  # speed up fits
    r.RooFit.Extended(),  # Poisson fluctuation on number of events
    r.RooFit.Constrain(s1),
    r.RooFit.FitOptions(
        r.RooFit.Extended(),
        r.RooFit.Save(),
        r.RooFit.SumW2Error(True),
    ),
)

# perform 1000 tests
mcstudy.generateAndFit(1000, 0, True)

# the Poisson fluctuation only changes the overall event *weights*
assert mcstudy.genData(0).numEntries() == mcstudy.genData(1).numEntries()
assert mcstudy.genData(0).sumEntries() != mcstudy.genData(1).sumEntries()
# generated pdf1 yields are always the same,
# despite Poisson fluctuation on events
gen_ds = mcstudy.genParDataSet()
assert all(
    gen_ds.get(i)["N1_gen"].getVal() == gen_ds.get(0)["N1_gen"].getVal()
    for i in range(gen_ds.numEntries())
)
# therefore, the pulls are meaningless
# since they compare the fitted value of N1 to the generated value N1_gen,
# which is not the same as the generated yield for pdf1

# strangely, though N1_gen is fixed to the initial value of N1,
# the stored value of N1 has changed;
# this happens with or without a Poisson fluctuation,
# suggesting the values of N1 and N2 are changing
# without being stored in genParDataSet
assert gen_ds.get(0)["N1_gen"].getVal() == 50
assert ws.var("N1").getVal() != 50

I think @moneta or @jonas should be able to answer when back from the Holiday’s break

Hello @mwilkins, interesting usecase for the RooMCStudy!

The Extended() option for the RooMCStudy only adds a Poisson fluctuation to the final weight sum. So it is straight forward to determine the yield during dataset generation: the fraction of pdf1 is still N1/(N1+N2) = 0.5 in your example, because the Poisson fluctuation is only added at the end, not to the signal and background components.

But that would be too easy, so I guess it’s not what you want. You want to introduce Poisson fluctuations to the background and signal yields separately – tracking the values used for the generation.

This can be done by creating Poisson constraint terms for the yield parameters, and then adding the yield Parameters to the Constrain() parameter of the RooMCStudy. That’s the correct way to vary the yield parameter for each toy.

The code would look like this:

import ROOT as r

ws = r.RooWorkspace("workspace")

# declare model
x = ws.factory("x[-10, 10]")
pdf1 = ws.factory("Gaussian::pdf1(x, m[-1, 1], s1[3,5])")
pdf2 = ws.factory("Gaussian::pdf2(x, m, s2[5,7])")
pdf = ws.factory("SUM::pdf(N1[0, 100] * pdf1, N2[0, 100] * pdf2)")

# add constraint so we can access genParDataSet
s1 = ws.var("s1")
N1 = ws.var("N1")
N2 = ws.var("N2")
constraint1 = ws.factory("Gaussian::constraint1(s1, cm1[4], cs1[1])")
constraintN1 = ws.factory("Poisson::constraint2(N1, N1_mu[50])")
constraintN2 = ws.factory("Poisson::constraintN1(N1, N1_mu[50])")
constraintN2 = ws.factory("Poisson::constraintN2(N2, N2_mu[50])")
prodpdf = ws.factory("PROD::prodpdf({pdf,constraint1,constraintN1,constraintN2})")

# declare mcstudy
mcstudy = r.RooMCStudy(
    prodpdf,
    r.RooArgSet(x),
    r.RooFit.FitModel(prodpdf),
    r.RooFit.Silence(),
    r.RooFit.Binned(),  # speed up fits
    # Poisson fluctuation on number of events, not needed because we have
    # already the Extended() fit option below, and the poisson fluctuation in
    # the generated events comes from the constrant terms.
    # r.RooFit.Extended(),
    r.RooFit.Constrain(r.RooArgSet(s1,N1,N2)),
    r.RooFit.FitOptions(
        r.RooFit.Extended(),
        r.RooFit.Save(),
        r.RooFit.SumW2Error(True),
    ),
)

# perform 1000 tests
mcstudy.generateAndFit(10, 0, True)

# print tome info on the generated datasets and parameters
gen_ds = mcstudy.genParDataSet()
for i in range(gen_ds.numEntries()):
    print(mcstudy.genData(i).numEntries(), mcstudy.genData(i).sumEntries())
    gen_ds.get(i).Print("v")
    print("")

I hope this works for you, let me know if you have any further questions!

Cheers,
Jonas

Hi, @jonas,

A Poisson fluctuation on the total yield would be fine with me; I failed to appreciate that I could work out the component yields from their fractions. Such a solution would undermine the utility of RooMCStudy, however, because features like the pull distributions are wrong, since the value of the yield parameter no longer represents the actual yield of the sub-shape. It may be worth pointing out these subtleties in the documentation, as this is not very intuitive from a user perspective.

Adding a Poisson fluctuation to each sub-shape is also acceptable; the important thing is that I be able to access the true, generated yield value for each study and compare it to the fit value. Here is a streamlined demonstrator for posterity:

import ROOT as r

ws = r.RooWorkspace("workspace")

# declare model
x = ws.factory("x[-10, 10]")
pdf1 = ws.factory("Gaussian::pdf1(x, m[-1, 1], s1[3,5])")
pdf2 = ws.factory("Gaussian::pdf2(x, m, s2[5,7])")
pdf = ws.factory("SUM::pdf(N1[0, 100] * pdf1, N2[0, 100] * pdf2)")

# add constraints
N1 = ws.var("N1")
N2 = ws.var("N2")
constraintN1 = ws.factory("Poisson::constraintN1(N1, N1_mu[50])")
constraintN2 = ws.factory("Poisson::constraintN2(N2, N2_mu[50])")
prodpdf = ws.factory("PROD::prodpdf({pdf,constraintN1,constraintN2})")

# declare mcstudy
mcstudy = r.RooMCStudy(
    prodpdf,
    r.RooArgSet(x),
    r.RooFit.FitModel(prodpdf),
    r.RooFit.Silence(),
    r.RooFit.Binned(),  # speed up fits
    # Poisson fluctuation on number of events, not needed because we have
    # already the Extended() fit option below, and the poisson fluctuation in
    # the generated events comes from the constrant terms.
    # r.RooFit.Extended(),
    r.RooFit.Constrain(r.RooArgSet(N1, N2)),
    r.RooFit.FitOptions(
        r.RooFit.Extended(),
        r.RooFit.Save(),
        r.RooFit.SumW2Error(True),
    ),
)

# perform tests
mcstudy.generateAndFit(1000, 0, True)

# check yields
gen_ds = mcstudy.genParDataSet()
assert all(
    gen_ds.get(i)["N1_gen"].getVal() + gen_ds.get(i)["N2_gen"].getVal()
    == mcstudy.genData(i).sumEntries()
    for i in range(gen_ds.numEntries())
)

Thanks for your help.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.