Hello ROOTers,
I have a TFile, f, from which I can obtain a TTree (called nominal) as
t = f.Get("nominal")
and then see how many entries t has as
t.GetEntries()
or using some condition like
t.GetEntries("n_jet>4")
Now, under t there is a branch called systName. I can obtain it by doing
b = t.GetBranch("systName")
I can also do
b.GetEntries()
and obtain the number of events in that branch. However, I would like to get the number of events under that branch with some condition, like b.GetEntries("EG_SCALE_AF2__1down"), but this doesn't work (no argument can be given to GetEntries applied to a TBranch).
I would have thought that something like
t.GetEntries("systName=='EG_SCALE_AF2__1down'")
would have worked, but it doesn't.
I have seen some posts suggesting a class creation but this seems pretty roundabout and unclear.
I would appreciate any advice for a straightforward solution.
Cheers,
Roy
What do you get for this branch from: t.Print()
This goes up to Br 152, but just to give a feel for it, here is an example:
There is no "systName" in this list.
In C++, try: t.GetEntries("systName==\"EG_SCALE_AF2__1down\"")
and / or: t.GetEntries("systName.EqualTo(\"EG_SCALE_AF2__1down\")")
Works! I was lacking the exact correct syntax. Thanks so much @Wile_E_Coyote!
BTW, is there an alternative syntax which would avoid using backslashes (\)? That would be useful especially for Python, which is generally problematic with backslashes.
I guess, in python you could try: t.GetEntries('systName=="EG_SCALE_AF2__1down"')
and / or: t.GetEntries('systName.EqualTo("EG_SCALE_AF2__1down")')
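The quoting options can be checked in plain Python, without ROOT, since only the text of the selection string matters. Below is a minimal sketch; the helper name `string_cut` is hypothetical (it is not part of ROOT) and merely assembles the expression text with the double-quoted string literal used in the answers above.

```python
def string_cut(branch, value):
    """Return a selection string comparing a string branch to a value.

    Hypothetical helper, not a ROOT function: it only builds the text.
    """
    # The literal inside the expression is wrapped in double quotes,
    # as in the selections suggested above.
    return f'{branch}=="{value}"'


# Three ways to write the same selection string in Python
escaped = "systName==\"EG_SCALE_AF2__1down\""   # double-quoted, needs backslashes
single = 'systName=="EG_SCALE_AF2__1down"'      # single-quoted, no backslashes
built = string_cut("systName", "EG_SCALE_AF2__1down")

print(escaped == single == built)
```

Any of the three strings could then be passed to t.GetEntries(...); the single-quoted form is the one that avoids backslashes.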
Hi @roy.brener ,
This should work
import ROOT
df = ROOT.RDataFrame("nominal", "YOUR_FILE_NAME_HERE")
# Filter events belonging to the correct systematic
df = df.Filter('systName=="EG_SCALE_AF2__1down"')
# Count how many events pass the filter
events_filtered = df.Count()
# No operation has been run so far, here you trigger it and get the result
print(events_filtered.GetValue())
Check out the RDF docs for a full list of operations you can run.
Thanks a lot @Wile_E_Coyote and @vpadulan! Just for completeness, the RDataFrame method yields 0 for some reason (also when using a "simpler" filter like df.Filter('n_jet>2')).
Dear @roy.brener ,
I find that extremely strange. What version of ROOT are you using? Does at least
print(ROOT.RDataFrame("nominal", "YOUR_FILE_NAME_HERE").Count().GetValue())
report the correct number of total entries in the tree?
Ok, what about
print(ROOT.RDataFrame("nominal", "YOUR_FILE_NAME_HERE").GetColumnNames())
This should print a list of the columns (branches) available in the tree. Do you see all of them, including the systName one used in the Filter above?
Yes, it works and I see systName.
Dear @roy.brener ,
This is a very simple example of filtering a TString column and getting the number of events after the filter. It works on my machine using ROOT 6.26/06:
import ROOT
import numpy


def write_tree():
    # Write a three-entry tree with an integer branch and a TString branch
    f = ROOT.TFile.Open("dataset.root", "recreate")
    tree = ROOT.TTree("events", "events")
    # 32-bit buffer, to match the "/I" (Int_t) leaf type declared below
    njets = numpy.array([0], dtype=numpy.int32)
    systname = ROOT.TString()
    tree.Branch("nJets", njets, "nJets/I")
    tree.Branch("systName", systname)
    for it, name in enumerate(["nominal", "up", "down"]):
        njets[0] = it + 1
        # Overwrite the TString in place so the branch keeps pointing at it
        systname.Replace(0, systname.Length(), name)
        tree.Fill()
    f.Write()
    f.Close()


def read_tree():
    # Print all entries of the tree as a table
    f = ROOT.TFile.Open("dataset.root", "read")
    tree = f.Get("events")
    tree.Scan("*")


def analyze():
    # Book three lazy counts on the same dataframe, one per systName value
    df = ROOT.RDataFrame("events", "dataset.root")
    nominal_ev = df.Filter('systName=="nominal"').Count()
    down_ev = df.Filter('systName=="down"').Count()
    up_ev = df.Filter('systName=="up"').Count()
    # The first GetValue() call triggers the event loop
    print(f"# nominal events: {nominal_ev.GetValue()}")
    print(f"# down events: {down_ev.GetValue()}")
    print(f"# up events: {up_ev.GetValue()}")


def main():
    write_tree()
    read_tree()
    analyze()


if __name__ == "__main__":
    raise SystemExit(main())
Can you confirm this produces the expected output? i.e.
************************************
* Row * nJets.nJe * systName *
************************************
* 0 * 1 * nominal *
* 1 * 2 * up *
* 2 * 3 * down *
************************************
# nominal events: 1
# down events: 1
# up events: 1
If so, I guess there is something we don't understand with either your script or your file.
Thanks for your elaborate reply! I can confirm the output matches.
Alright! Is your script now also producing the right output? Can you also describe what was the issue? This would be very valuable, thanks!
Hi, thanks. To be fully transparent, here are examples of two things that work and something that doesn't. I'm not sure why the latter doesn't.
df = ROOT.RDataFrame("nominal", "user.ehaaland.29804058._000001.minitrees.root")
print(df.Filter('n_jet==1').Count().GetValue())
1784104
print(df.Filter('systName==""').Count().GetValue())
462803
print(df.Filter('systName=="EG_SCALE_AF2_1down"').Count().GetValue())
0
where 0 is wrong. Maybe it's something small in the syntax...(?) Maybe there's a way to print which systName values are available for the RDataFrame to filter by?
Cheers,
Roy
Dear @roy.brener ,
You can retrieve the values of that column via
import ROOT
df = ROOT.RDataFrame("nominal", "user.ehaaland.29804058._000001.minitrees.root")
# This is going to be a std::vector<TString>
systnames = df.Take[ROOT.TString]('systName').GetValue()
# Convert to Python list of strings for easier printing
print(f"String values in 'systName' column: {[str(val) for val in systnames]}")
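Once the values are in a Python list of strings, printing every entry can be unwieldy for a large tree. A possible follow-up, sketched here with the standard library only, is to tally the distinct names and look for near-misses of the name used in the filter. The function name `summarize_names` and the short stand-in list are hypothetical; in practice the list would come from the `[str(val) for val in systnames]` conversion above.

```python
from collections import Counter
from difflib import get_close_matches


def summarize_names(names, wanted):
    """Count distinct names and suggest close matches if `wanted` is absent.

    Hypothetical helper for illustration; it is not part of ROOT.
    """
    counts = Counter(names)
    if wanted in counts:
        suggestions = []
    else:
        # Look for existing names that differ only slightly from `wanted`
        suggestions = get_close_matches(wanted, list(counts), n=3, cutoff=0.6)
    return counts, suggestions


# Hypothetical stand-in for the converted column values
names = ["", "", "EG_SCALE_AF2__1down"]
counts, suggestions = summarize_names(names, "EG_SCALE_AF2_1down")

print(dict(counts))
print(suggestions)
```

If the filtered name is missing but a close match is suggested, a spelling difference (for instance in the number of underscores) would be one thing to check.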