Different ways of accessing TTree event content by index in pyROOT

Hi all,
Imagine I would like to get the content of the 42nd event from a TTree.

I would distinguish three main approaches:

Method 1: using C++

int secret;
tree->SetBranchAddress("secret", &secret);
tree->GetEntry(42);                 // load entry 42 into `secret`
std::cout << secret << std::endl;

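Method 2: the same as in C++, but in pyROOT with the array module

from array import array
secret = array('i', [0])  # one-element int buffer that ROOT writes into
tree.SetBranchAddress("secret", secret)
tree.GetEntry(42)         # load entry 42 into the buffer
print(secret[0])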

Method 3: looping over the tree in pyROOT

for i, event in enumerate(tree):
    if i == 42:
        print(event.secret)
        break

Question 1:

Is there a more pythonic way of extracting the content, without the unnecessary loop of method 3 and without the array of method 2?
Something like:

print( tree.GetEventAtIndex(42).secret ) # or even...
print( tree[42].secret )

Question 2:

Can I read somewhere what the variable event actually is in for event in tree?
As a newbie, I would naively assume it is just an “item” in a “tree” iterable object, but it seems to be more complicated…

items = [ MyClass() ]
type(items[0]) # obviously MyClass
for event in tree:
    type(event) # not obviously, still TTree: <class cppyy.gbl.TTree at 0x58028e0>
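Regarding the output above: as far as I understand, iterating over a TTree in pyROOT loads each entry with GetEntry and then hands back the tree itself, with the branch values readable as attributes. The sketch below imitates that behaviour in plain Python so it can run without ROOT. FakeTree and its fixed return value of 4 "bytes" are hypothetical stand-ins, not part of ROOT.

```python
class FakeTree:
    """Hypothetical stand-in for a TTree with one int branch 'secret'."""

    def __init__(self, secrets):
        self._secrets = list(secrets)
        self.secret = None  # value of the currently loaded entry

    def GetEntry(self, i):
        # Mimics the TTree convention: bytes read, or 0 if the entry does not exist.
        if 0 <= i < len(self._secrets):
            self.secret = self._secrets[i]
            return 4  # assumed size of one int, for illustration only
        return 0

    def __iter__(self):
        # Load entry i, then yield the tree itself (not a separate "event" object).
        i = 0
        while self.GetEntry(i) > 0:
            yield self
            i += 1


tree = FakeTree(range(100))
for i, event in enumerate(tree):
    if i == 42:
        same_object = event is tree  # the loop variable is the tree itself
        value = event.secret
        break

print(type(event).__name__, same_object, value)
```

If this picture is right, it also explains why type(event) reports the tree class: event and tree are two names for one object whose state changes on every iteration.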

Question 3:

Am I missing some other, more aesthetic methods of extracting TTree content into variables, beyond the three listed above? Please share!


For Q1, you could use RDataFrame:

import ROOT
df = ROOT.RDataFrame('tree', 'file.root')
print( df.Filter('rdfentry_==42').Sum('secret').GetValue() )

where rdfentry_ is the entry number (a column RDataFrame defines by default, so you don’t need to create it yourself). There may be a better option, but summing a single element is fast and lets you select the branch/leaf and call GetValue.
Depending on what else you do, it may be slower or faster than your methods 2 or 3. Calling ROOT.EnableImplicitMT() may also speed it up, again depending on what else the code does.

That code still loops over the full dataset (doing nothing for every entry except the 42nd), so it might not be the fastest.

Using a TTree+TEntryList or an RDatasetSpec with the appropriate entry range will only process the desired entry.

Yes, I noticed, but I didn’t see another way (I’m only starting to look at RDataFrame, so I don’t know much yet). I suppose the gains (mainly through multi-threading, I guess) become noticeable when more work is done, unlike in this very simple example, right? At least when looping over a big TChain to fill a couple of histograms, I saw a big difference between using SetBranchAddress and a dataframe.

To clarify, the two methods I suggested for running over just one entry are meant to be used with RDataFrame.
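Without RDataFrame, something close to the tree.GetEventAtIndex(42).secret wish from Q1 can be had by calling GetEntry directly and then reading the branch as an attribute of the tree, with no loop and no array. Below is a minimal sketch of a helper doing that. read_branch and StubTree are hypothetical names made up for this example (StubTree only exists so the snippet runs without ROOT), and the sketch assumes a simple scalar branch.

```python
def read_branch(tree, index, branch):
    """Load entry `index` into the tree and return the value of `branch`."""
    nbytes = tree.GetEntry(index)  # TTree::GetEntry returns 0 if the entry does not exist
    if nbytes <= 0:
        raise IndexError(f"entry {index} could not be read")
    return getattr(tree, branch)   # in pyROOT, branches are readable as attributes


class StubTree:
    """Hypothetical stand-in for a TTree, only so the sketch runs without ROOT."""

    def __init__(self, n_entries):
        self._n = n_entries
        self.secret = None

    def GetEntry(self, i):
        if 0 <= i < self._n:
            self.secret = i * 10  # made-up branch content
            return 4              # pretend 4 bytes were read
        return 0


stub = StubTree(100)
print(read_branch(stub, 42, "secret"))
```

With a real tree the call would look the same, read_branch(tree, 42, "secret"). For non-scalar branches the returned object may still refer to the tree's internal buffer and change on the next GetEntry, so copying the value before moving on is probably the safer choice.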
