branch.GetEntries("condition")

Hello ROOTers,

I have a TFile, f, from which I can obtain a TTree (called nominal) as
t = f.Get("nominal")
and then see how many entries t has as
t.GetEntries()
or using some condition like
t.GetEntries("n_jet>4").

Now, under t there is a branch called systName. I can obtain it by doing
b = t.GetBranch("systName").
I can also do
b.GetEntries()
and obtain the number of events in that branch. However, I would like to get the number of events under that branch with some condition, like b.GetEntries("EG_SCALE_AF2__1down") — but this doesn’t work (no argument can be given to GetEntries applied on a TBranch).
I would have thought that something like
t.GetEntries("systName=='EG_SCALE_AF2__1down'")
would have worked but it doesn’t.

I have seen some posts suggesting creating a class, but this seems pretty roundabout and unclear.

I would appreciate any advice for a straightforward solution.

Cheers,

Roy

What do you get for this branch from: t.Print()

This goes up to Br 152 but just to give the feel of it, here is an example:

There is no “systName” in this list.

Here it is

In C++, try: t.GetEntries("systName==\"EG_SCALE_AF2__1down\"")
and / or: t.GetEntries("systName.EqualTo(\"EG_SCALE_AF2__1down\")")

Works! I was lacking the exact correct syntax. Thanks so much @Wile_E_Coyote!

BTW, is there an alternative syntax which would avoid using backslashes (\)? That would be useful for Python especially which is generally problematic with backslashes.

I guess, in python you could try: t.GetEntries('systName=="EG_SCALE_AF2__1down"')
and / or: t.GetEntries('systName.EqualTo("EG_SCALE_AF2__1down")')
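
A small sketch of how such a selection string could be assembled in Python without any backslashes. The helper name `string_equals` is hypothetical and not part of ROOT; it only builds the expression text, which would then be passed to t.GetEntries(...). It assumes the value itself contains no double quotes or backslashes.

```python
def string_equals(branch, value):
    """Build a selection such as systName=="EG_SCALE_AF2__1down".

    Hypothetical helper, not part of ROOT: it only assembles the text.
    """
    if '"' in value or "\\" in value:
        # Keep the sketch simple: values that would need escaping are rejected.
        raise ValueError("value must not contain double quotes or backslashes")
    # Single quotes delimit the Python string, double quotes go to ROOT as-is.
    return f'{branch}=="{value}"'


selection = string_equals("systName", "EG_SCALE_AF2__1down")
print(selection)
```

Because Python accepts either quote character as a delimiter, wrapping the whole expression in single quotes leaves the inner double quotes untouched.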

Hi @roy.brener ,

This should work

import ROOT
df = ROOT.RDataFrame("nominal", "YOUR_FILE_NAME_HERE")
# Filter events belonging to the correct systematic
df = df.Filter('systName=="EG_SCALE_AF2__1down"')
# Count how many events pass the filter
events_filtered = df.Count()
# No operation has been run so far, here you trigger it and get the result
print(events_filtered.GetValue())

Check out the RDF docs for a full list of operations you can run.

Thanks a lot @Wile_E_Coyote and @vpadulan! Just for completeness, the RDataFrame method yields 0 for some reason (even with a “simpler” filter like df.Filter('n_jet>2')).

Dear @roy.brener ,
I find that extremely strange. What version of ROOT are you using? Does at least

print(ROOT.RDataFrame("nominal", "YOUR_FILE_NAME_HERE").Count().GetValue())

report the correct number of total entries in the tree?

It does indeed.

Ok, what about

print(ROOT.RDataFrame("nominal", "YOUR_FILE_NAME_HERE").GetColumnNames())

This should print a list of the columns (branches) available in the tree. Do you see all of them, including the systName one used in the Filter above?

Yes, it works and I see systName.

Dear @roy.brener ,

This is a very simple example of filtering a TString column and getting the number of events after the filter. It works on my machine using ROOT 6.26/06

import ROOT
import numpy


def write_tree():
    f = ROOT.TFile.Open("dataset.root", "recreate")
    tree = ROOT.TTree("events", "events")
    # 32-bit integer buffer, matching the "nJets/I" leaf type declared below
    njets = numpy.array([0], dtype=numpy.int32)
    systname = ROOT.TString()
    tree.Branch("nJets", njets, "nJets/I")
    tree.Branch("systName", systname)

    for it, name in enumerate(["nominal", "up", "down"]):
        njets[0] = it+1
        systname.Replace(0, systname.Length(), name)
        tree.Fill()

    f.Write()
    f.Close()


def read_tree():
    f = ROOT.TFile.Open("dataset.root", "read")
    tree = f.Get("events")
    tree.Scan("*")


def analyze():
    df = ROOT.RDataFrame("events", "dataset.root")

    nominal_ev = df.Filter('systName=="nominal"').Count()
    down_ev = df.Filter('systName=="down"').Count()
    up_ev = df.Filter('systName=="up"').Count()

    print(f"# nominal events: {nominal_ev.GetValue()}")
    print(f"# down events: {down_ev.GetValue()}")
    print(f"# up events: {up_ev.GetValue()}")


def main():
    write_tree()
    read_tree()
    analyze()


if __name__ == "__main__":
    raise SystemExit(main())

Can you confirm this produces the expected output? i.e.

************************************
*    Row   * nJets.nJe *  systName *
************************************
*        0 *         1 *   nominal *
*        1 *         2 *        up *
*        2 *         3 *      down *
************************************
# nominal events: 1
# down events: 1
# up events: 1

If so, I guess there is something we don’t understand about either your script or your file.

Thanks for your elaborate reply! I can confirm the output matches.

Alright! Is your script now also producing the right output? Can you also describe what the issue was? This would be very valuable, thanks!

Hi, thanks. To be fully transparent, here are two examples that work and one that doesn’t. I’m not sure why the last one fails.

df = ROOT.RDataFrame("nominal", "user.ehaaland.29804058._000001.minitrees.root")
print(df.Filter('n_jet==1').Count().GetValue())
1784104
print(df.Filter('systName==""').Count().GetValue())
462803

print(df.Filter('systName=="EG_SCALE_AF2_1down"').Count().GetValue())
0

where 0 is wrong. Maybe it’s something small in the syntax? Is there a way to print which systName values are available for the RDataFrame to filter on?

Cheers,

Roy

Dear @roy.brener ,
You can retrieve the values of that column via

import ROOT
df = ROOT.RDataFrame("nominal", "user.ehaaland.29804058._000001.minitrees.root")
# This is going to be a std::vector<TString>
systnames = df.Take[ROOT.TString]('systName').GetValue()

# Convert to Python list of strings for easier printing
print(f"String values in 'systName' column: {[str(val) for val in systnames]}")
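
Once the column values are available as a Python list of strings, a quick way to see which names exist, and to spot near misses such as a single versus a double underscore, might look like the sketch below. The list `names` is made-up sample data standing in for the values read from the file, and `requested` is an assumed stand-in for the string passed to Filter; only the standard library is used.

```python
from collections import Counter
from difflib import get_close_matches

# Made-up sample standing in for [str(val) for val in systnames].
names = ["", "", "EG_SCALE_AF2__1down", "EG_SCALE_AF2__1up", "EG_SCALE_AF2__1down"]

# Number of entries per distinct systName value; repr() exposes
# empty strings and stray whitespace that plain printing would hide.
counts = Counter(names)
for name, n in sorted(counts.items()):
    print(f"{name!r}: {n}")

# If the requested name is absent, list the closest existing names.
requested = "EG_SCALE_AF2_1down"  # assumed filter value, single underscore
suggestions = []
if requested not in counts:
    candidates = [name for name in counts if name]
    suggestions = get_close_matches(requested, candidates, n=3, cutoff=0.9)
    print(f"{requested!r} not found; closest names: {suggestions}")
```

For a large tree, holding every value in a Python list may be slow, so this is meant as a diagnostic on a small file or a subset of entries.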