Different ways of accessing TTree event content by index in pyROOT

FoxWise · December 13, 2022, 3:42pm

Hi all,
Imagine I would like to get the content of the event with index 42 from a TTree.

I would distinguish three main approaches:

Method 1: using C++

int secret;
tree->SetBranchAddress("secret", &secret);
tree->GetEntry(42);
std::cout<<secret<<std::endl;

Method 2: a copy of the C++ approach in pyROOT, using the array module

from array import array
secret = array('i', [0])  # one-element int buffer that the branch writes into
tree.SetBranchAddress("secret", secret)
tree.GetEntry(42)
print(secret[0])

Method 3: using a pyROOT loop

for i, event in enumerate(tree):
    if i == 42:
        print(event.secret)
        break

Question 1:

Is there a more Pythonic way of extracting the data, without the unnecessary loop of method 3 and without the array of method 2?
Something like:

print( tree.GetEventAtIndex(42).secret ) # or even...
print( tree[42].secret )

Question 2:

Is it documented somewhere what the variable event actually is in "for event in tree"?
As a newbie, I would naively assume it is just an “item” in a “tree” iterable object, but it seems to be more complicated…

items = [ MyClass() ]
type(items[0]) # obviously MyClass
for event in tree:
    type(event) # not obviously, still TTree: <class cppyy.gbl.TTree at 0x58028e0>
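
The output above is consistent with the loop variable being the tree itself, with a new entry loaded before each iteration, rather than a separate per-entry object. Below is a toy model of such an iterable in plain Python. No ROOT is involved, the class name FakeTree is hypothetical, and this is only a sketch of the behaviour, not the actual pyROOT implementation:

```python
class FakeTree:
    """Hypothetical stand-in for a TTree with a single integer branch, 'secret'."""

    def __init__(self, values):
        self._values = list(values)
        self.secret = None  # value of the currently loaded entry

    def GetEntry(self, index):
        """Load entry `index`; return a positive count on success, 0 if out of range."""
        if not 0 <= index < len(self._values):
            return 0
        self.secret = self._values[index]
        return 1

    def __iter__(self):
        # Load one entry at a time and hand back the same object every time.
        index = 0
        while self.GetEntry(index) > 0:
            yield self
            index += 1


tree = FakeTree(range(100, 200))
seen_types = {type(event) for event in tree}
all_same_object = all(event is tree for event in tree)
print(seen_types, all_same_object)
```

In this model, event.secret inside the loop is just tree.secret after the matching GetEntry call, which would explain why type(event) reports the tree class.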

Question 3:

Am I missing other, more elegant ways of extracting TTree content into variables than the three methods listed above? Please share!

cheers,
Bohdan

dastudillo · December 14, 2022, 10:38am

For Q1, you could use RDataFrame:

import ROOT
df = ROOT.RDataFrame('tree', 'file.root')  # tree name, file name
print( df.Filter('rdfentry_==42').Sum('secret').GetValue() )

where rdfentry_ is the entry number (a column that exists by default, so you don’t need to create it yourself). There may be a better option, but summing a single element is fast, lets you select the branch/leaf, and lets you retrieve the result with GetValue.
Depending on what else your code does, this may be slower or faster than your methods 2 or 3. Calling ROOT.EnableImplicitMT() may also speed it up, again depending on the rest of the code.
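
For comparison, a plain-TTree option that needs neither the loop of method 3 nor the array of method 2 can be sketched as below. It assumes that pyROOT exposes branches as attributes of the tree once GetEntry has loaded an entry, and that GetEntry returns the number of bytes read (0 for a missing entry, -1 on an I/O error). The helper name read_branch is made up for illustration:

```python
def read_branch(tree, index, branch):
    """Load entry `index` of `tree` and return the current value of `branch`.

    `tree` only needs to behave like a pyROOT TTree: a GetEntry(index) method
    returning the number of bytes read, and branches readable as attributes.
    """
    if tree.GetEntry(index) <= 0:
        raise IndexError(f"could not read entry {index}")
    return getattr(tree, branch)


# With a real file this would be called as, for example:
#     f = ROOT.TFile.Open("file.root")
#     print(read_branch(f.Get("tree"), 42, "secret"))
```

This is close to the tree[42].secret spelling asked for in Q1, and only the requested entry is loaded.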

eguiraud · December 14, 2022, 11:28am

That code still loops over the full dataset (doing nothing for every entry except the 42nd), so it might not be the fastest option.

Using a TTree+TEntryList or an RDatasetSpec with the appropriate entry range will process only the desired entry.

dastudillo · December 14, 2022, 11:51am

Yes, I noticed, but I didn’t see another way (I’m only starting to look at RDataFrame, so I don’t know much yet). I suppose the gains (mainly through multi-threading, I guess) will become noticeable when more work is done than in this very simple example, right? At least when looping over a big TChain to fill a couple of histograms, I saw a big difference between using SetBranchAddress and a dataframe.

eguiraud · December 14, 2022, 2:16pm

To clarify, the two methods I suggested for running over just one entry are meant to be used with RDataFrame.
