Problems adding a RooMappedCategory column to RooDataSet and running reduce

In order to calculate global significances, I’m trying to fit multiple models (different signal mass hypotheses) to a single set of toy datasets generated (by AsymptoticCalculator) using one of these models with signal strength set to 0.

I have some mass hypotheses that channel A is sensitive to, some that channel B is sensitive to, and some that both are sensitive to (where each of these contain two or three sub-channels). I’m trying to deal with this by generating the toy datasets using one of the models that use both channels then selecting the appropriate subset for the other two kinds of models by translating a RooCategory using a RooMappedCategory in RooDataSet::addColumn, then reducing the dataset using RooDataSet::reduce.

I’m running into two problems:

  1. The RooMappedCategory always returns the index corresponding to one particular original category name (“resolved_4b_2018”, or “NotMapped” if I don’t add that category to the map list) regardless of the original category for each “event” (really, histogram bin since this is a HistFactory analysis)
  2. Once I run reduce I lose the weights on the dataset (so in this case, the histogram bin values)

On further investigation, I think the first problem is because RooMappedCategory's _inputCat isn’t being updated for each event (which I think should be done by redirectServers).

Does anyone have any insight into these problems?

Hi @beojan ,
thank you for taking a deeper look. I think we need @moneta or @jonas for this one.

Cheers,
Enrico

Hi @beojan!

Is is possible that you provide us the code and optimally also the data to reproduce your problem? That would greatly help me to answer your question.

Thanks!
Jonas

The code is here:

    sig_type = pattern.match(str(fname)).group(1)
    mass = int(pattern.match(str(fname)).group(2))
    f = ROOT.TFile.Open(str(fname))
    w = f.Get("combWS")
    sb = w.obj("ModelConfig")
    if toy < 0:
        data = w.data("combData")
    else:
        f2 = ROOT.TFile.Open("toys.root")
        w_toys = f2.Get("toys")
        # Need to rename things to match the ModelConfig
        data_orig = w_toys.data(f"toyData_{toy}")
        data_orig.sumEntries()
        print(data_orig.weight(), data_orig.weightError())
        print("data_orig ", data_orig)
        print(f"{data_orig.isWeighted()=}")
        data_orig.Print()
        print("\n".join([varToStr(x) for x in data_orig.get()]))
        obs = sb.GetObservables()
        obs.Print()
        cat_var = "combCat"
        cat_col = obs.selectByName("combCat")
        if cat_col.getSize() != 1:
            cat_col = obs.selectByName("channelCat")
            if cat_col.getSize() != 1:
                print("Error: No category var found in ModelConfig")
                return 0
            cat_var = "channelCat"
        cat = cat_col.first()
        toy_cats = data_orig.get().selectByName("combCat")
        toy_cats.Print()
        toy_cat = toy_cats.first()
        print(varToStr(cat))
        print(varToStr(toy_cat))
        map_cat = ROOT.RooMappedCategory(cat_var, cat_var, toy_cat, "NotMapped", 1000)
        for cat, idx in cat.states():
            print(cat, idx)
            if cat[0] == "r":
                suffix = "resolved"
            else:
                suffix = "boosted"
            map_cat.map(
                f"combCat_{cat}_{suffix}".encode("ascii"), cat.encode("ascii"), idx
            )
        print(varToStr(map_cat))
        print("Mapping cats")
        map_cat.Print()
        data_orig.addColumn(map_cat)
        print("Renaming Variables")
        # # Rename combCat in the dataset so we can reuse that name
        # data_orig.changeObservableName("combCat", "oldCombCat")
        print("New data_orig", data_orig)
        data_orig.Print()
        t = data_orig.GetClonedTree()
        t.Scan()
        # data = data_orig.reduce(obs, f"{cat_var}!={cat_var}::NotMapped")
        data = data_orig.reduce(
            obs
        )  # , f"oldCombCat!=oldCombCat::combCat_resolved_4b_2018_resolved")
        print("New data", data)
        data.Print()
        return 0

Data is unfortunately more of a problem since these are workspaces from an unblinded ATLAS analysis.

EDIT: It’s not really important, but here’s varToStr:

def varToStr(var):
    if isinstance(var, ROOT.RooAbsCategory):
        states = {}
        for k, v in var:
            states[v] = k
        return f"{var.GetName()}: {states[var.getCurrentIndex()]} {states}"
    else:
        bins = var.getBinning()
        return (
            f"{var.GetName()}: {var.getValV()} ({var.getMin()}, {var.getMax()})"
            f" [|{bins.lowBound()}, {bins.highBound()}, {bins.numBins()}|]"
        )

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.