PyPyROOT: high performance PyROOT (first beta)

Dear all,

PyPy (http://pypy.org) offers a highly compatible version of the Python 2.7 interpreter and comes with a just-in-time compiler (JIT) that can greatly speed up certain types of computing tasks, in particular mathematics done in loops, as is common in HEP analysis code.
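As an illustration of the kind of loop meant here (a hypothetical example, not taken from any analysis in this thread), the function below computes invariant masses from four-vectors using plain Python floats. Because it makes no extension calls, the same file runs unchanged under CPython and PyPy, so timing it under both interpreters is a simple way to see what the JIT does for this style of code.

```python
import math
import time


def invariant_masses(events):
    """Return the invariant mass of each (E, px, py, pz) four-vector.

    A plain loop over Python floats with no extension calls: the kind of
    code a tracing JIT can compile, and that CPython interprets bytecode
    by bytecode.
    """
    masses = []
    for e, px, py, pz in events:
        m2 = e * e - (px * px + py * py + pz * pz)
        # Clamp small negative values (rounding or unphysical input) to zero.
        masses.append(math.sqrt(m2) if m2 > 0.0 else 0.0)
    return masses


if __name__ == "__main__":
    # Synthetic workload: one four-vector repeated many times.
    events = [(5.0, 1.0, 2.0, 2.0)] * 200000
    start = time.time()
    result = invariant_masses(events)
    print("%d masses in %.3f s" % (len(result), time.time() - start))
```

Running the file with `python` and with `pypyroot` and comparing the printed times is enough for a first impression; the JIT needs some iterations to warm up, so very short runs understate any difference.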

Where PyPy breaks compatibility is in the use of extension libraries, especially those that rely on the internals of the CPython interpreter. PyROOT is one of them, so until now it did not work on PyPy.

Now, a first beta release of PyPy supporting PyROOT is available here:

/afs/.cern.ch/sw/lcg/external/pypy

Run the setup script (which sets up the proper CPython, ROOT, and GCC):

/afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc6/setup-pypyroot.sh

This makes the executable ‘pypyroot’ available, which can be used just like the normal CPython interpreter. For example:

[code]$ pypyroot
>>>> import ROOT
>>>> c1 = ROOT.TCanvas()
etc.[/code]

I hope to collect feedback over a period of about a month, then cut release 1.0 by CHEP. Please help by trying it out!

For performance improvements, the “compilation” part of the JIT is only half the story. Far more important is that PyPy recognizes specific use cases and specializes for them. For example, ROOT I/O and TTree reading run at C++ speeds in pypyroot (the same can be 50x slower in PyROOT). As you can imagine, that is a long (but worthwhile) work in progress.

Other developments by the PyPy team that are important for the future of high-performance Python code include automatic thread safety using software transactional memory and the use of vector instructions in the JIT for numpy code. These, too, are works in progress, but are coming along nicely.

Compatibility with PyROOT is not 100%, and in some ways never will be. For example, PyPy uses a garbage collector rather than reference counting, and “from ROOT import *” cannot work together with the JIT. However, both can simply be worked around to get code that works fine on both interpreters (e.g. consistently use “import ROOT”), and the list of remaining to-do features is rather small and shrinking. (1)
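The reference-counting difference matters mostly for code that relies on an object’s destructor running as soon as the last reference goes away (for example, a file being closed when its Python proxy is deleted). Under CPython that happens immediately; under PyPy it happens at some later garbage collection. A portable pattern is to make cleanup explicit. The sketch below uses a hypothetical OutputFile class, not a ROOT API; with ROOT objects the analogue is calling e.g. TFile.Close() yourself instead of relying on deletion.

```python
class OutputFile(object):
    """Hypothetical resource wrapper (stand-in for something like a ROOT file
    that must be written and closed). Cleanup is explicit, so it happens at
    the same point under CPython and PyPy, independent of the GC."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        if not self.closed:
            # A real wrapper would flush and write its contents here.
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # runs on normal exit and when an exception propagates
        return False  # do not swallow exceptions


def write_output(name):
    # The file is closed when the with-block ends, not when 'out' is collected.
    with OutputFile(name) as out:
        pass  # ... fill and write objects here ...
    return out
```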

Furthermore, compared to the functionality PyROOT had when it was first included in ROOT, back in 2004, pypyroot today is clearly light years ahead. As such, it is already very useful for any standalone ROOT work. (2)

A list of supported C++ features can be found here (it documents the Reflex backend, but the feature list is the same for CINT):

http://doc.pypy.org/en/latest/cppyy.html#features

In addition, for the CINT backend, pypyroot has several ROOT-specific pythonizations, such as for TObject, TTree, TString, etc.

When ROOT6 comes out, the CINT backend of pypyroot is ready to be swapped out for a Cling backend. The latter allows for a tighter integration with the JIT (as has been shown with the Reflex backend, which has been part of PyPy by default since release 2.0).

Please try it out and give me feedback. Thanks!

Best regards,
Wim Lavrijsen

(1) See: http://root.cern.ch/drupal/content/pypyroot for a short-list
(2) If you use a lot of (C-)extension modules, not bound using rootcint, some of your favorite ones may not (yet) be available for PyPy.

Is there a way for me to try PyPyROOT on my own machine outside of CERN?

I can get PyPy and ROOT from MacPorts, but does ROOT need to be rebuilt to work with PyPy? Can I just try to import ROOT from PyPy naïvely?

Jean-François

Jean-François,

looking at MacPorts, I see their version of PyPy is 2.0.2, so it will have cppyy. The default backend, however, is the loadable one, so that pypy does not need to be linked with any C++ libraries. This backend needs to be installed separately, and I only have it for Reflex. The problem with CINT is that certain features (e.g. TString converters and the RecursiveRemove callback) are baked into pypy, at least until I figure out a decent API for adding such functionality at the user level.

You can, of course, install pypy from source. Details are here:
doc.pypy.org/en/latest/cppyy.html#installation

I recommend doing ‘hg up reflex-support’, since I have not yet moved the latest changes (virtually all for CINT) to default. To enable the CINT backend, you need to modify two files: first select builtin_capi, then, from there, select cint_capi. Details are here:
doc.pypy.org/en/latest/cppyy_backend.html

All this is going to be cleaned up with ROOT6 and the LLVM backend. :)

On lxplus, I also needed to install libffi (the shared library is easiest to deal with, so if you build libffi from source, use --enable-shared).
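A quick way to check whether a libffi shared library is visible is Python’s standard ctypes.util.find_library. This is an extra check, not part of the documented PyPy build procedure; run it with the interpreter you use for translation.

```python
import ctypes.util


def find_shared_libffi():
    """Return the library name ctypes reports for libffi, or None if it
    cannot locate one (e.g. no shared build is installed)."""
    return ctypes.util.find_library("ffi")


if __name__ == "__main__":
    found = find_shared_libffi()
    print(found if found else "libffi not found; consider building it with --enable-shared")
```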

Beyond that, yes, theoretically ROOT will need to be rebuilt as well, but only libPyROOT. However, that should only be needed to get TPython to work; all the rest is fine. But I’m thinking of including that code in the CINT backend directly, since the headers in PyROOT only contain forward declarations that should pre-empt the code in libPyROOT, and then all should work. I haven’t done that yet, though.

Thanks for trying it out!

Cheers,
Wim

I followed the procedure at your first link, including enabling the CINT backend. The translation took 2 hours with MacPorts’ CPython (MacPorts’ PyPy didn’t work; it gave an error about the pycache files):

jfcaron@jfcaron-MacBook:~/Projects/PyPyRoot/pypy$ pypy rpython/translator/goal/translate.py --opt=jit pypy/goal/targetpypystandalone.py --withmod-cppyy
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "rpython/translator/goal/translate.py", line 89, in <module>
    log = py.log.Producer("translation")
  File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_apipkg.py", line 114, in __makeattr
    result = importobj(modpath, attrname)
  File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_apipkg.py", line 37, in importobj
    module = __import__(modpath, None, None, ['__doc__'])
  File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_log/log.py", line 184, in <module>
    setattr(Syslog, _prio, getattr(py.std.syslog, _prio))
  File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_std.py", line 13, in __getattr__
    m = __import__(name)
  File "/opt/local/lib/pypy/lib_pypy/syslog.py", line 68, in <module>
    lib = ffi.verify("""
  File "/opt/local/lib/pypy/lib_pypy/cffi/api.py", line 311, in verify
    lib = self.verifier.load_library()
  File "/opt/local/lib/pypy/lib_pypy/cffi/verifier.py", line 68, in load_library
    self.compile_module()
  File "/opt/local/lib/pypy/lib_pypy/cffi/verifier.py", line 55, in compile_module
    self._write_source()
  File "/opt/local/lib/pypy/lib_pypy/cffi/verifier.py", line 117, in _write_source
    file = open(self.sourcefilename, 'w')
IOError: [Errno 2] No such file or directory: '/opt/local/lib/pypy/lib_pypy/__pycache__/_cffi__g7019d5d3xad93c709.c'

After translation, I tried the “Basic bindings example” from your link, but it does not seem to recognize MyClass:

>>>> import cppyy
>>>> cppyy.load_reflection_info("libMyClassDict.so")
<CPPLibrary object at 0x0000000106933370>
>>>> myinst = cppyy.gbl.MyClass(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: <class '__main__.::'> object has no attribute 'MyClass' (details: '<class '__main__.::'>' has no attribute 'MyClass')
>>>> "MyClass" in dir(cppyy.gbl)
False

A few extra notes:
I had to install gccxml-devel from MacPorts to get genreflex to work.
I had to remove the -rdynamic switch from the g++ call, because apparently that’s on by default on OSX.
I did install libffi from MacPorts, but it’s not clear at what step it is linked in.

So I am still a few steps behind trying PyPyROOT, as cppyy is not working yet. I’d be happy to try any other recommendations to get it working so I can test my PyROOT code. Unfortunately I’m not very good at getting stuff to compile/link, but I can report error messages!

Jean-François

Jean-François,

for the first, I’m not sure, but it might be an access-rights issue with the pycache directory. It is surprising, though, that cffi caches are created inside the installed pypy. Maybe Armin or Maciej will have an answer later on the dev list (most folks hang out on IRC rather than on dev, though).
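The IOError above is “No such file or directory” for a file inside lib_pypy/__pycache__, which suggests the directory itself may be missing or not creatable by the user running the build. The hypothetical helper below only narrows that down (missing vs. not writable); it does not change what cffi does.

```python
import os


def check_cache_dir(path):
    """Report whether a cache directory exists and is writable by this user."""
    if not os.path.isdir(path):
        return "missing"
    if not os.access(path, os.W_OK):
        return "not writable"
    return "ok"


if __name__ == "__main__":
    # Path taken from the traceback above; adjust for your installation.
    print(check_cache_dir("/opt/local/lib/pypy/lib_pypy/__pycache__"))
```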

As for the example: yes, the instructions there are for Reflex. With the CINT backend, Reflex is not supported. Theoretically I could mix them natively (rather than through Cintex), but then you’d lose I/O for Reflex classes, which isn’t much of an option either. The CINT backend is for ROOT only (PyCintex.py can be modified to make it work, though; I just haven’t done so yet). LLVM/ROOT6 will unify all.

Easiest with CINT is just to use ACLiC. This should work:

[code]>>>> import cppyy
>>>> cppyy.gbl.gROOT.LoadMacro("MyClass.h+")
>>>> myinst = cppyy.gbl.MyClass(42)[/code]
And then all you need is to pick up ROOT.py from /afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc6/python/ROOT.py and that’d be it (although I have not tested this on a Mac yet).

(With the Reflex or the loadable backend, there won’t be any ‘gROOT’ builtin.)

Thanks for trying! :)

Cheers,
Wim

Thanks, I didn’t realize that the cppyy example was for Reflex only and not for CINT. I tried your example instead, which compiles MyClass.h with ACLiC, and it works. I copied ROOT.py from your afs address, and the basic test of creating and drawing a TH1F also works.

Perhaps tomorrow I will try to run some of my analysis code with pypy-c.

Jean-François

It somewhat works, but has weird failures! Read on for the whole story.

All times here were measured with the bash built-in “time” on my fast MacBook. In reality, the main processing was done on a cluster whose individual processors were much slower. The code doesn’t do any parallel processing, but I ran hundreds of independent data chunks simultaneously. None of the Python code uses NumPy, because getting it set up properly on the cluster was painful and I only really needed contiguous arrays for ROOT interoperability, so I mostly used the builtin Python arrays and my own array arithmetic extension.
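The array arithmetic extension itself is not shown in this thread. As a hypothetical pure-Python stand-in, an elementwise operation on builtin array('d') objects looks like the sketch below; the result is still a contiguous array of doubles, which is what makes such arrays convenient to hand to ROOT.

```python
from array import array


def axpy(a, x, y):
    """Return a*x + y elementwise for two equal-length array('d') objects.

    Hypothetical pure-Python stand-in for a compiled array-arithmetic
    extension; the output is a new contiguous array('d').
    """
    if len(x) != len(y):
        raise ValueError("length mismatch: %d != %d" % (len(x), len(y)))
    out = array("d", y)  # copy of y with the same type code
    for i in range(len(x)):
        out[i] += a * x[i]
    return out
```

Under PyPy, a C extension module like the one mentioned may not be available (see note (2) in the first post), whereas a plain loop like this runs there unchanged and is exactly the kind of code the JIT targets.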

For the project that I just finished, my analysis went in several stages. The first is a C++-only compiled ROOT program, so that one does not depend on the Python interpreter (and takes ~tens of minutes to run per data chunk).

The second is a Python-only (using PyROOT) program. Here are the results:
CPython: ~2m45s
PyPy: ~1m0s

The third stage is a shorter all-Python script. With CPython it runs in ~14s, but with PyPyROOT it fails when I try to make a TF1. Here is the error message:

Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "calculate_R.py", line 180, in <module>
    pois = ROOT.TF1("pois",Poisson(),clulims[0],clulims[1],2)
TypeError: none of the 8 overloaded methods succeeded. Full details:
  TF1::TF1() => TypeError: wrong number of arguments
  TF1::TF1(const TF1&) => TypeError: wrong number of arguments
  TF1::TF1(const char*, const char*, Double_t, Double_t) => TypeError: wrong number of arguments
  TF1::TF1(const char*, Double_t, Double_t, Int_t) => TypeError: wrong number of arguments
  TF1::TF1(const char*, ROOT::Math::ParamFunctor, Double_t, Double_t, Int_t) => TypeError: cannot pass instance as ParamFunctor
  TF1::TF1(const char*, void*, Double_t, Double_t, Int_t) => TypeError: 'CPPInstance' object expected, got 'instance' instead
  TF1::TF1(const char*, void*, Double_t, Double_t, Int_t, const char*) => TypeError: wrong number of arguments
  TF1::TF1(const char*, void*, void*, Double_t, Double_t, Int_t, const char*, const char*) => TypeError: wrong number of arguments
The code is here http://bazaar.launchpad.net/~jfcaron/+junk/TRIUMFBeamTest/view/head:/cluster_analysis/calculate_R.py if you wish to look at how I am using the TF1. The Poisson() call creates a Python functor object that internally calls ROOT::TMath stuff.
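For readers who want to reproduce the failing call: the PyROOT convention for a Python TF1 callback is an object called as f(x, par), where x and par are indexable arrays of doubles. The actual Poisson class in calculate_R.py is not reproduced here; the sketch below is a hypothetical stand-in that uses the standard math module instead of ROOT::TMath, and it assumes par[0] is a normalization and par[1] the Poisson mean.

```python
import math


class Poisson(object):
    """Hypothetical stand-in for the Poisson functor in calculate_R.py.

    Follows the PyROOT TF1 callback convention f(x, par): x[0] is the
    variable, par[0] an (assumed) normalization, par[1] the (assumed) mean.
    """

    def __call__(self, x, par):
        k, norm, mean = x[0], par[0], par[1]
        if k < 0.0 or mean <= 0.0:
            return 0.0
        # log of mean^k * exp(-mean) / Gamma(k + 1), computed in log space
        log_p = k * math.log(mean) - mean - math.lgamma(k + 1.0)
        return norm * math.exp(log_p)
```

With PyROOT under CPython, such an object is passed exactly as in the failing line in the traceback above, i.e. ROOT.TF1("pois", Poisson(), xmin, xmax, 2), with 2 being the number of parameters.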

The last script makes some plots (all with ROOT-based stuff like TCanvases and TGraphs). With CPython it runs in ~3s, but PyPy again crashes with this error message:

Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "FOMplots.py", line 1444, in <module>
    FOMplots(sys.argv)
  File "FOMplots.py", line 283, in FOMplots
    rescaleaxis(traces[c][i],sample_width/1e-9)
  File "FOMplots.py", line 98, in rescaleaxis
    g.SetHistogram(0)
TypeError: cannot pass int as TH1F
Again the code responsible for the crash is here: http://bazaar.launchpad.net/~jfcaron/+junk/TRIUMFBeamTest/view/head:/cluster_analysis/FOMplots.py It happens in a function that rescales a TGraph, and that code works fine in CPython.

So overall, I am happy that the first python stage works with PyPy, and magically runs faster. The other two show mysterious crashes in code that otherwise works in CPython. I should note that I had already put in some effort to optimize the first python analysis stage. For example it uses tons of memory to cache results rather than recompute them, so if one of PyPy’s magic tricks is to speed up native python calculations, my caching would reduce the visible benefit from PyPy.

As before, I am willing to try modifications to get PyPy working for all the stages (or to track down the problem if it’s something in PyPy(ROOT)). I am very excited by the prospect of using PyPyROOT for my current project (which is still in its infant-C++ stage).

Jean-François

Jean-François,

cool, thanks for the feedback!

The exception for the TF1 occurs because I have yet to write the TF1/2/3 callback implementations and pythonizations. The second one occurs because the code does not allow the integer ‘0’ to pass through a pointer. I just need to write/add those. The latter is trivial to fix; the former is more work. I’ll get to it.

As for caching … it depends: PyPy does not so much memoize results as elide them. This only works if the compute-heavy function call is completely side-effect free, and either there is only a limited number of different inputs within a compiled trace (otherwise there is the risk of a combinatorial explosion) or the inputs to the function are known to be constant over the scope of the trace. Those are hard requirements to meet.

Of course, if the cached function is relatively simple and can be inlined within a trace, then that may very well be faster than the lookup in the cache.
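The trade-off can be checked with a micro-benchmark. The sketch below uses hypothetical functions, not code from the analysis: it compares a dict-based cache against recomputing a cheap, side-effect-free function. Which one wins depends on the interpreter and on how expensive the real function is, so it is worth timing under both CPython and pypyroot rather than assuming.

```python
import math
import time

_cache = {}


def weight_cached(n):
    """Memoized: a dict lookup, computing the value only once per input."""
    try:
        return _cache[n]
    except KeyError:
        value = _cache[n] = math.exp(-0.5 * n * n)
        return value


def weight_direct(n):
    """Recomputed on every call; simple and side-effect free, so it may be
    inlined into a JIT trace."""
    return math.exp(-0.5 * n * n)


def total(weight, n_calls, n_distinct):
    """Sum weight(i % n_distinct) over n_calls calls."""
    s = 0.0
    for i in range(n_calls):
        s += weight(i % n_distinct)
    return s


if __name__ == "__main__":
    for f in (weight_cached, weight_direct):
        start = time.time()
        total(f, 500000, 100)
        print("%s: %.3f s" % (f.__name__, time.time() - start))
```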

Thanks again,
Wim

Jean-François,

back to this one. :)

So I have a working TF1 callback. Still, I’m seeing a crash when using it for a fit, but only post-translation. What I see in gdb seems simple to fix (a NULL check), but that isn’t for today anymore. I realize that a fit is what you need, not just the callback …

The other TODO left is that errors are currently silently absorbed, but that is no more than annoying.

Performance seems fine (tested on plotting, not fitting), as I’m able to route the callback back into the interpreter first and only then call the user function. This means the user function is open for JIT-ing, and the penalty is ‘only’ in the song-and-dance through CINT. I do not know, however, whether the code will warm up if the loop is in C++ (as is the case when doing a Fit). I would expect not, so that would require some JIT hints.

The other problem, passing an int 0 through a pointer, is fixed.

Code has been pushed to the reflex-support branch, so it can be tried out if you are adventurous, but I want to fix those (post-translation) errors and a few other feedback items before rebuilding on lxplus.

Thanks,
Wim

Hi,

so that was decidedly not just a missing NULL check. :(

Anyway, a full rewrite later … code is in the repository.

Error checking is still only rudimentary (but that’s better than none). There’s a layer left that causes some slowdown and indirection, but the overall code is faster than what I had before (which was, as stated, already 2x faster than PyROOT). And, contrary to PyROOT, the python function is not leaked: its lifetime is bound to the lifetime of the TF1 instance.
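The difference between leaking the callback and tying its lifetime to the TF1 can be illustrated in plain Python. CallbackOwner below is a hypothetical stand-in for a TF1-like object, not pypyroot’s implementation: normally it holds the only strong reference to the callable, so the callable becomes collectable together with its owner; the leak=True variant adds a global reference that is never dropped. The explicit gc.collect() is there because under PyPy nothing is freed at del time.

```python
import gc
import weakref

_leaked = []  # global registry: anything appended here lives until exit


class CallbackOwner(object):
    """Hypothetical stand-in for a TF1-like object wrapping a Python callable."""

    def __init__(self, func, leak=False):
        self._func = func  # strong reference owned by this instance
        if leak:
            _leaked.append(func)  # extra reference that is never removed

    def __call__(self, x, par):
        return self._func(x, par)


class Line(object):
    def __call__(self, x, par):
        return par[0] + par[1] * x[0]


def callback_survives(leak):
    """Create an owner, drop it, and report whether its callback is still alive."""
    owner = CallbackOwner(Line(), leak=leak)
    probe = weakref.ref(owner._func)
    assert owner([2.0], [1.0, 3.0]) == 7.0
    del owner
    gc.collect()  # needed under PyPy, which frees objects only at collection
    return probe() is not None
```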

I’m going to work on the other requests and then get this onto /afs. Further optimization (of the indirection layer) is for later (after CHEP).

Cheers,
Wim

Hi Wim,

there is definitely an error, but I’m not sure why.
Maybe you can take a look…

I set pypy up with the setup script in the directory stated above.
Then I removed the “from ROOT import” statements and now use only “import ROOT”.

The program crashes immediately and returns:

#6 0x00000000015a5dbe in pypy_debug_catch_fatal_exception ()
#7 0x0000000000419395 in pypy_g_ccall_pypy_debug_catch_fatal_exception___ ()
#8 0x0000000000b1f3eb in PyObject_IsInstance ()
#9 0x00007f86db81e60a in warn_explicit (category=0x1f74ba0, message=0x7f86e7c3d0d0, filename=0x7f86e7c3d0f0, lineno=0, module=0x7f86e7c3d110, registry=0x0, sourceline=0x0) at /build/vdiez/Python-2.7.3/Python/_warnings.c:314
#10 0x00007f86db81fbe0 in PyErr_WarnExplicit (category=0x1f74ba0, text=Unhandled dwarf expression opcode 0xf3
#11 0x00007f86ed563b7a in ErrorHandler () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libCore.so
#12 0x00007f86ed563f72 in Warning(char const*, char const*, ...) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libCore.so
#13 0x00007f86ed5f38bd in TClass::Init(char const*, short, std::type_info const*, TVirtualIsAProxy*, void (*)(void*, TMemberInspector&), char const*, char const*, int, int, bool) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libCore.so
#14 0x00007f86ed5f4358 in TClass::TClass(char const*, short, char const*, char const*, int, int, bool) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libCore.so
#15 0x00007f86e0a64e66 in TStreamerInfo::BuildCheck() () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libRIO.so
#16 0x00007f86e09add41 in TFile::ReadStreamerInfo() () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libRIO.so
#17 0x00007f86e09b060b in TFile::Init(bool) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libRIO.so
#18 0x00007f86d8b0513d in TDCacheFile::TDCacheFile(char const*, char const*, char const*, int) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.10/x86_64-slc6-gcc47-opt/root/lib/libDCache.s

If you want, I can send you the python script.

Cheers,
Manuel

Manuel,

this line I do not understand:

/build/vdiez/Python-2.7.3/Python/_warnings.c

as pypy-c has its own builtin warnings module, and thus should not be using anything from libpython. It might come in through linkage of libPyROOT.so, but that shouldn’t be loaded (it could be, though, through its dictionary).
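One way to test that suspicion on Linux (lxplus included) is to list which shared libraries actually got mapped into the running pypy-c process, for instance right after “import ROOT” and again just before the crash point. The helper below is a hypothetical diagnostic, not part of pypyroot; it only reads the text file /proc/self/maps, whose sixth column is the path of the mapped file.

```python
def mapped_libraries(maps_path="/proc/self/maps", patterns=("libpython", "libPyROOT")):
    """Return the sorted mapped file paths whose name contains any pattern.

    Linux only: each line of /proc/<pid>/maps is
    'address perms offset dev inode [pathname]'; anonymous mappings
    have no pathname column.
    """
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            fields = line.split(None, 5)  # keep spaces inside the path intact
            if len(fields) < 6:
                continue  # anonymous mapping, no file behind it
            path = fields[5].strip()
            if any(p in path for p in patterns):
                found.add(path)
    return sorted(found)
```

Calling print(mapped_libraries()) from inside the script would show whether a libpython is present in the process and thus whether something pulled it in.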

I also don’t understand:

TDCacheFile::TDCacheFile(...

as just doing:

import ROOT

should not even load ROOT, let alone create TDCacheFiles.
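Wim’s remark implies that “import ROOT” is cheap and that ROOT itself is only loaded on first use. The general idea of such a lazy module facade can be sketched in plain Python; LazyModule and its loader argument are hypothetical illustrations, not the actual contents of pypyroot’s ROOT.py.

```python
import types


class LazyModule(types.ModuleType):
    """Hypothetical sketch of a lazily initialized module facade.

    Creating it is cheap; the expensive backend load runs only on the
    first public attribute lookup (e.g. ROOT.TFile), not at import time.
    """

    def __init__(self, name, loader):
        super(LazyModule, self).__init__(name)
        self._loader = loader
        self._backend = None

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for names the facade lacks.
        if attr.startswith("_"):
            raise AttributeError(attr)
        if self._backend is None:
            self._backend = self._loader()  # first use: load the backend once
        return getattr(self._backend, attr)
```

With a facade like this, a backtrace through TFile::Init and TDCacheFile means the script went well beyond the import itself, for example by opening a dCache file.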

What exactly did you run?

Cheers,
Wim

Hi,

Are there any release updates? Is there a blog I should watch? This is awesome. I would like to use pypyroot for everything, everywhere.

Alex

Alex,

the code on /afs has been updated since (it includes, for example, fixes for the use of TF1, as well as for several issues that came up while consolidating the unit tests between PyPy and PyROOT), but I have yet to bake a release. Lack of time …

I’m hoping to run the ATLAS xAOD tutorial with it. That’s planned for the end of July.

In the meantime, please use the installation referenced from http://root.cern.ch/drupal/content/pypyroot (last updated Apr. 24) and report any problems that you may have with it.

Thanks,
Wim

Hi Wim,

I see some unexpected behavior while trying to build a TChain with pypyroot. Reading the first file of the chain (200 events) looks fine, but reading the second file seems to behave poorly. I put an example script (12 lines) in my public afs area.

cd /afs/cern.ch/user/t/tuna/public/pypyroot/
source /afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc6/setup-pypyroot.sh
pypyroot chain.py # > output.txt

Do you have any suggestions?