Adding a branch: pyroot much slower than C++?

pandolf · September 15, 2014, 6:08am

hi all

i want to add a constant branch to a (large) TTree. i’m doing some tests with compiled C++ root vs interpreted pyroot, and it seems to me that the latter proves to be significantly slower than the former. btw i’m using root version 5.34/10

find attached two pieces of code:

addbranch.C should be compiled [*] and run with ./addbranch (from lxplus, so you can read the tree)
addbranch.py should be just run (again from lxplus)

as you can see the two programs are very simple and (as far as my eyes can see) do the exact same things. yet, the compiled C++ version seems to be significantly faster. in particular: the loop over the entries seems to be always ~instantaneous for C++, while (by eye) something like 100k/sec for python. and it becomes even slower if i add more branches to be filled in the same way.

is this expected? am i doing something wrong? is there some trick i should be aware of to achieve the same level of performance between the two?

f

[*] i’m doing:
g++ -Wall -I$ROOTSYS/include -o addbranch addbranch.C `$ROOTSYS/bin/root-config --libs --cflags`
addbranch.py (767 Bytes)
addbranch.C (868 Bytes)

wlav · September 15, 2014, 9:49pm

Hi,

not really sure what you’re after here: Python is interpreted, so yes, it’ll be much slower than compiled, optimized C++, especially for simple loops. However, the idea when using python is to call C++ functions that do the number crunching. For example, virtually all the time in your script seems to be in the CloneTree() call, whether from python or C++. In fact, I’m writing this and am still waiting for it to finish.
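One way to check where the time goes is to time each stage separately. The following is only a minimal sketch: `stage_timer`, `slow_stage` and `fill_loop` are hypothetical names, and the two stand-in functions merely imitate an expensive C++ call and a per-entry Python loop; they are not the code from addbranch.py.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slow_stage():
    # Stand-in for an expensive call into C++ (e.g. something like CloneTree()).
    time.sleep(0.05)


def fill_loop(n):
    # Stand-in for the per-entry loop executed by the Python interpreter.
    total = 0
    for _ in range(n):
        total += 1
    return total


timings = {}
with stage_timer("clone", timings):
    slow_stage()
with stage_timer("loop", timings):
    fill_loop(1000)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping the real calls the same way would show directly which stage dominates the total run time.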

I just restarted that, as I needed to comment it out, so that there’s actually something to compare. W/o the CloneTree(), I find 0.8s for C++, 13.4s for python, and 2.8s for pypy-c (assuming it’s okay with you to replace the numpy array with an array from the array module). So, 17x slower for full interpretation, 3.5x for JITed code. Yes, that’s significant in this tight loop, but on the whole that’s all irrelevant noise, as the CloneTree() is still running …

Oh, there it is: CloneTree() alone is 3mins 49s. So yeah, that extra 12.6s that python takes is measurable; that extra 2s that pypy takes probably not.
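The arithmetic behind those numbers can be spelled out as below. This is just a worked check using the timings quoted above; the variable names are made up for illustration, and the "share" figures assume the total run time is simply CloneTree() plus the loop.

```python
# Timings quoted in the post, in seconds.
cpp_loop = 0.8
cpython_loop = 13.4
pypy_loop = 2.8
clone_tree = 3 * 60 + 49  # 3 min 49 s

# Slowdown of the bare loop relative to compiled C++.
cpython_ratio = cpython_loop / cpp_loop
pypy_ratio = pypy_loop / cpp_loop

# Extra seconds spent in the loop compared with C++.
cpython_extra = cpython_loop - cpp_loop
pypy_extra = pypy_loop - cpp_loop

# Fraction of the whole job (CloneTree + loop) that the extra time represents.
cpython_share = cpython_extra / (clone_tree + cpython_loop)
pypy_share = pypy_extra / (clone_tree + pypy_loop)

print(f"CPython: {cpython_ratio:.1f}x slower, +{cpython_extra:.1f} s, "
      f"{100 * cpython_share:.1f}% of total")
print(f"PyPy:    {pypy_ratio:.1f}x slower, +{pypy_extra:.1f} s, "
      f"{100 * pypy_share:.1f}% of total")
```

So the loop itself is many times slower in the interpreter, but the extra time is only a few percent of the full job.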

For typical physics analysis codes that call into C++, whether from C++ or Python, Python is not going to be faster, with some exceptional cases where application-level knowledge can be built into the JIT. The general problem is that the JIT cannot see into the C++ code, so optimization across those functions is not possible (b/c of unknown side effects). Pure python is a different matter, e.g. devirtualization of calls and better locality due to the tracing JIT. But just last week we’ve finished a paper showing run-time optimization of the communication layer in a large Fortran+C MPI code. So, although things are easier for a JIT in Python, there isn’t necessarily anything intrinsic to any one language.

Cheers,
Wim

pandolf · September 16, 2014, 7:03am

hey wim

thanks a lot! any chance i can get my hands on your pypy-c modification?

f

wlav · September 16, 2014, 4:24pm

Hi,

what I used for the timings was this setup: root.cern.ch/drupal/content/pypyroot

In your script, “import numpy” needs to be dropped, and the array should be:

import array
f1 = array.array('d', [0.])

PyPy’s support for numpy is much further than in the build above, but I haven’t been able to work on PyPy recently (the build is from April). There is some hope that I can find a bit of time next month again.
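For reference, here is a rough sketch of how an array-module buffer could be used to add a constant branch. The helper `add_constant_branch` is hypothetical and is not the code from addbranch.py (which is not shown in this thread); it only assumes that `tree` behaves like a PyROOT TTree, i.e. offers `Branch(name, buffer, leaflist)` and `GetEntries()`, and that the returned branch offers `Fill()`.

```python
import array


def add_constant_branch(tree, name, value):
    """Add a double-precision branch `name` that holds `value` for every entry.

    Hypothetical helper: `tree` is assumed to behave like a PyROOT TTree.
    """
    # One-element buffer of doubles; ROOT reads the value from this memory.
    buf = array.array('d', [value])
    branch = tree.Branch(name, buf, name + "/D")
    for _ in range(tree.GetEntries()):
        branch.Fill()  # fills only the new branch, once per existing entry
    # Return the buffer too, so the caller can keep it alive while the
    # branch is still in use.
    return buf, branch


# Possible usage with PyROOT (file and tree names are placeholders):
#   import ROOT
#   f = ROOT.TFile.Open("myfile.root", "UPDATE")
#   t = f.Get("mytree")
#   buf, br = add_constant_branch(t, "f1", 0.)
#   t.Write("", ROOT.TObject.kOverwrite)
#   f.Close()
```

The per-entry `Fill()` call is the tight loop being timed in this thread; everything else is a single call into C++.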

In general, however, given that PyPy has compatibility issues with many external modules (although that is improving), there’s still little interest in it (take your script: we’re talking about gaining 10s out of 4 minutes, just because all time is already spent in C++).

Cheers,
Wim