Pyroot, loop over TTree very slow compared to .C macro

aleopold · February 23, 2018, 2:49pm

Hi! I have a general question about how to speed up my PyROOT scripts (if this is possible). A while ago I was developing something with nested loops over TTrees, and that was incredibly slow, so I switched to a .C macro, which did the job quite quickly.
Since I wondered if I was doing something wrong, I wrote this script in Python and the same in C++, doing not very much, just looping over a TTree in a ROOT file (the one I’m using is quite big, 41 GB).

Python script:


import ROOT

path = 'path/to/rootfile.root'
root_file = ROOT.TFile(path, 'read')
tree = root_file.Get("Track")

for event in tree:
  pt = event.track_Pt

root_file.Close()

.C macro:

void loopOver() {

  TString path = "path/to/rootfile.root";
  TFile *root_file = new TFile(path, "read");
  TTree *tree = (TTree*) root_file->Get("Track");
  Double_t trackpt;
  tree->SetBranchAddress("track_Pt", &trackpt);
 
  for (int i=0; i<tree->GetEntries(); i++) {
    tree->GetEntry(i);
    Double_t pt = trackpt;
  }

  root_file->Close();
}

If I time both, the .C one takes 28.422s, while the Python version takes 6m12.019s.

Is there something I can do to speed up my python code?

Thanks for your answers!

etejedor · February 23, 2018, 3:18pm

Hi @aleopold,

It is expected that your loop in C++ runs much faster than the equivalent loop in Python; this is due to the performance difference between the two languages.
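
To illustrate where the time goes, here is a minimal sketch that uses only the Python standard library, with no ROOT involved. It compares an explicit Python loop with the built-in sum(), whose loop runs in C. The function names and the stand-in data are made up for illustration, and actual timings will depend on the machine and the Python version.

```python
import time
from array import array


def sum_in_python(values):
    """Accumulate with an explicit loop: every iteration runs interpreter bytecode."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_c(values):
    """Same reduction, but the loop runs inside the C implementation of sum()."""
    return sum(values)


def timed(func, values):
    """Return (result, elapsed seconds) for a single call of func(values)."""
    start = time.perf_counter()
    result = func(values)
    return result, time.perf_counter() - start


# Stand-in data (hypothetical); a real tree would supply the values instead.
values = array("d", (0.5 * i for i in range(200_000)))

loop_result, loop_seconds = timed(sum_in_python, values)
builtin_result, builtin_seconds = timed(sum_in_c, values)
print(f"explicit loop: {loop_seconds:.4f} s, built-in sum: {builtin_seconds:.4f} s")
```

Both calls compute the same number; only the place where the loop runs differs, which is the same idea as moving a tree loop from Python into C++.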

One way to speed up your Python code when you are reading a tree is to move the loop into C++. This is precisely what the TDataFrame class does. Please have a look at:

https://root.cern.ch/doc/v612/classROOT_1_1Experimental_1_1TDataFrame.html

It proposes a declarative approach to process data in ROOT trees. I would be happy to guide you through transforming your tree processing code into a TDataFrame chain of operations, if you are interested.
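
As a rough sketch of what such a chain of operations could look like from Python, assuming ROOT v6.12 and the tree and branch names from the original post: the helper name fill_pt_histogram and the selection string are hypothetical, and the function expects an already constructed TDataFrame object.

```python
def fill_pt_histogram(tdf, selection="track_Pt > 0"):
    """Declare a filter and a histogram on tdf, then run the event loop once.

    tdf is expected to be a TDataFrame, e.g. (ROOT v6.12, names from the post):
        import ROOT
        tdf = ROOT.ROOT.Experimental.TDataFrame('Track', 'path/to/rootfile.root')
    The default selection string is only a placeholder cut for illustration.
    """
    filtered = tdf.Filter(selection)            # lazy: nothing is read yet
    hist_proxy = filtered.Histo1D("track_Pt")   # still lazy: books the histogram
    return hist_proxy.GetValue()                # triggers the C++ event loop
```

The Python side only describes what should happen; the loop over entries itself runs in C++ when the result is requested.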

Cheers,

Enric

behrenhoff · February 23, 2018, 3:20pm

This thread might cover most of this: Iteration over a tree in pyroot - performance issue

The easiest improvement is to activate only the required branches. Other than that, the linked thread contains a lot of benchmarks.
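
A minimal sketch of activating only one branch in PyROOT, assuming tree is the TTree obtained with root_file.Get("Track") as in the original script. The helper name read_single_branch is hypothetical; SetBranchStatus, GetEntries and GetEntry are the usual TTree methods.

```python
def read_single_branch(tree, branch_name):
    """Loop over tree with only branch_name enabled; return the number of entries read."""
    tree.SetBranchStatus("*", 0)          # switch every branch off
    tree.SetBranchStatus(branch_name, 1)  # re-enable just the one that is needed
    n_read = 0
    for i in range(tree.GetEntries()):
        tree.GetEntry(i)                  # only enabled branches are read
        pt = getattr(tree, branch_name)   # e.g. tree.track_Pt
        n_read += 1
    return n_read
```

With the tree from the original script this would be called as read_single_branch(tree, "track_Pt"). How much it helps depends on how many other branches the tree contains.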

I would also modernize your C++ code as follows (and run a performance test with it):

void loopOver() {
  TString path = "path/to/rootfile.root";
  TFile *root_file = TFile::Open(path, "read");
  TTreeReader reader("Track", root_file);
  TTreeReaderValue<Double_t> rvTrackPt(reader, "track_Pt");
  while (reader.Next()) {
    Double_t pt = *rvTrackPt;
    // do something with pt
  }
}

wlav · February 23, 2018, 5:14pm

Article: Optimizing python-based ROOT I/O with PyPy’s tracing just-in-time compiler

And the results from the last time I touched it (see slide 17; running at C++ speeds accomplished): ROOT User’s Workshop 2013

If use of Python in HEP ever reaches critical mass, then maybe management can be convinced to invest in such work, so that you can have your cake and eat it, too.

Until then, which version of Python do you use? In recent benchmarks, I found a sweet improvement in Python 3.6 over Python 2.7 (even more so for cppyy master).

eguiraud · February 24, 2018, 11:51am

A bit off topic, but

“If use of Python in HEP ever reaches critical mass”

Python usage in HEP is pervasive. I don’t know any PhD student who does not use Python at least as much as they use C++.

EDIT:
More on topic: if you have access to ROOT v6.12 (e.g. on lxplus7), you might be interested in timing something like this in Python:

import ROOT
tdf = ROOT.ROOT.Experimental.TDataFrame('Track', 'path/to/rootfile.root')
hist = tdf.Histo1D("track_Pt").GetValue()

The loop is run on a single thread for fairer comparison with your other macros, but with TDataFrame it’s awkward (by design) to just deserialize a variable and do nothing with it, so I’m filling a histogram instead.

aleopold · February 28, 2018, 12:58pm

Thanks a lot for all of your input! I’m going to look at the different possibilities to see what suits my needs best.

Also the TTreeReader side remark will go instantly into my code!

Best,
Alex
