How to start PyPy with the ROOT module

Hi all!

I find PyPy very attractive and would like to use it to speed up TTree reading loops in ROOT, such as:

for event in tree:
    pt = event.pt

I am using Scientific Linux and have ROOT 6.14/06 compiled against Python 3.7.2.

I have installed PyPy from a portable binary, but it implements Python 3.5.3 and doesn’t see the ROOT module.

I have tried setting the PYTHONPATH and PYTHONSTARTUP environment variables in my .bashrc, but that produces a weird error and does not seem to work.

And anyway, as far as I understand, PyPy requires its own versions of modules, not the CPython ones.

I saw some installation instructions for machines at CERN, but I would like to use it on my personal machine. Is that possible? Are there any clear step-by-step instructions, or will they appear?
How is the PyPy project doing in 2019?

Thanks everybody for your time!
Have a nice day,
Bohdan


ROOT Version: 6.14/06
Platform: Scientific Linux 7
Compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Python Version: 3.7.2


Hi @FoxWise

In PyROOT we mainly support CPython. Although there is some code in PyROOT to deal with the PyPy case, I am unsure of how/if this code works with newer versions of Python.

Anyway, the error you report already appears when starting up PyPy, right? Could you look for help on some PyPy forum about that particular issue?

Cheers,

Enric

PyPy supports CPython extension modules through an emulation of the C-API (cpyext). It’s dog slow, though, sometimes slow enough that CPython is faster overall. For PyPy to read trees fast (at C++ speeds), it needs to know how to unpack them; see:

Slides from '13 Users Workshop
Corresponding write-up

The above is all CINT-based; it would need revival and porting to Cling. It was based on cppyy, a rewrite of PyROOT internals attuned to PyPy, which is now a fully supported project on its own, but no longer compatible with PyROOT as-is.

As for your specific error, you seem to have pointed PyPy at CPython’s internal modules. There’s no reason to do that. You should rebuild PyROOT for Python 3.5 and then just use it as-is. (Whether it works, I don’t know; I’ve never tried.)
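
Before rebuilding anything, it can help to confirm which interpreter is actually running and whether it can locate ROOT at all. The following is a minimal, standard-library-only sketch; the helper name describe_interpreter is made up for illustration, and the script only locates the module without importing it.

```python
import importlib.util
import platform
import sys


def describe_interpreter(module_name="ROOT"):
    """Collect facts that matter when a compiled extension fails to import."""
    # find_spec only searches the import path; it does not import the module.
    spec = importlib.util.find_spec(module_name)
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython" or "PyPy"
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("%-15s %s" % (key + ":", value))
```

Running this once with python and once with pypy shows whether the two interpreters report different versions and whether either of them finds ROOT on its search path.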

Thanks for the answers!
I have seen both:
Slides from '13 Users Workshop
Corresponding write-up

But as far as I can understand, they present a slow Python example and a fast C++ example,
which does not tell me how to write it so that it is fast in Python.

And I looked at many threads on this topic, like:
Iteration over a tree in pyroot - performance issue
Pyroot, loop over TTree very slow compared to .C macro

But I still don’t understand how to improve reading TTree data…

I have tried the code from one of @wlav’s replies:

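from array import array

# The typecode should match the branch type: 'd' is a C double.
x = array('d', [0])
tree.SetBranchAddress("x", x)

# GetEntry returns the number of bytes read, and 0 when the entry does not exist.
alist, i = [], 0
while tree.GetEntry(i):
    i += 1
    alist.append(x[0])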

But I am getting the error:

SystemError: int TTree::GetEntry(Long64_t entry = 0, int getall = 0) =>
    problem in C++; program state has been reset

There is a thread about this error, but it is related to TClonesArray. The poster there (maybe) solved it by
adding x.Clear("C") at the start of the loop, but there is no such method for this kind of array.
Is there a way to solve this?

Thanks

Hi,

Let’s come back to where we started:

How many hours have you spent on this by now? Is that still less than what you might gain in acceleration through PyPy?

If your CPython code is really so slow that its performance is a bottleneck for you, maybe you could consider using a compiled language and invest the time in the code conversion?

I’m all for exploring technologies - but watch out, Wim told you it won’t be cheap and simple…

Cheers, Axel.

Hi, Axel!

I failed to launch my script with PyPy. So I didn’t spend much time on it.

My code is not so slow that it warrants some hardcore code conversion. And I am not that good at programming, so it definitely would not be time efficient.

I just wonder whether there is another, simpler way to speed up TTree reading, like the one from Wim’s answer that I tried in my reply above. But that one fails with the error shown there.

Thanks for your time, anyway.

cheers, Bohdan

You could check out RDataFrame in C++ - that’s reasonably simple (as simple as Python - it’s fairly language agnostic) and super fast. See the dataframe tutorials in any recent ROOT release!

As said, the slowness isn’t in Python, but in the interface layer. As such, you can’t fix it in Python, and you can’t fix it by replacing the CPython interpreter with the PyPy one, as explained. As the paper describes, the fast Python code on PyPy was achieved using a custom iterator for TTree that was transparent to the JIT. The code is publicly available, but only deep in the history of the PyPy repo: it was written for CINT, and with CINT gone, it has since been deleted.

Of course, if one person were to fix the interface layer (or if it were replaced by a different technology), it would be fixed for all PyROOT users, so the effort would amortize nicely. E.g. Scott Snyder provided a patch for ATLAS users a long time ago (but it never made it upstream, see here). Right now, the convenient lookups go through a re-implementation of Python’s getattr. That alone is expensive, but then PyROOT’s TTreeGetAttr layer does another huge amount of work (with no memoization), calling many TTree methods along the way.

Generating a custom type per TTree with lazy lookup and reference tracking should give you an easy 10x improvement, and beat out Scott’s code which e.g. still creates/deletes converters. (I did something similar not too long ago for builtin arrays in cppyy, and got a 12x speedup.) But again, that’s a small C++ project and will take a couple of weeks, not something you can tweak in python.
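
The idea of a per-tree type with lazy lookup can be sketched in plain Python as well. This is only an illustration of the technique under invented names (LazyBranch, make_entry_type, a dict of one-element arrays as the buffer source); the real thing would live in PyROOT's C++ layer and would have to deal with actual branch types.

```python
from array import array


class LazyBranch:
    """Descriptor that binds to its buffer on first use, then reuses it."""

    def __init__(self, name):
        self.name = name

    def __get__(self, entry, owner=None):
        if entry is None:
            return self
        buf = entry._bound.get(self.name)
        if buf is None:
            # First access: do the lookup once and keep the reference.
            buf = entry._bound[self.name] = entry._source[self.name]
            entry.bind_count += 1
        return buf[0]


def make_entry_type(tree_name, branch_names):
    """Build a class whose attributes are exactly the given branches."""
    namespace = {name: LazyBranch(name) for name in branch_names}

    def __init__(self, source):
        self._source = source  # mapping: branch name -> one-element buffer
        self._bound = {}       # buffers that have already been looked up
        self.bind_count = 0

    namespace["__init__"] = __init__
    return type(tree_name + "Entry", (object,), namespace)


source = {"pt": array("d", [0.0]), "eta": array("d", [0.0])}
Entry = make_entry_type("Events", ["pt", "eta"])
entry = Entry(source)
total = 0.0
for value in (1.5, 2.5):
    source["pt"][0] = value  # stands in for the tree refilling its buffers
    total += entry.pt        # ordinary class attribute lookup, no __getattr__ hook
```

Because the branches are real attributes of a generated class, access goes through normal attribute resolution, and a branch that is never read (eta here) is never bound.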
