PyROOT and Multithreading

Hi all,

I want to do something a little convoluted here, so it may very well not be possible, but it doesn’t hurt to ask :wink:

I have a C++ ROOT class, which I compile and load using PyROOT. Since the “process” method of the class is very CPU-intensive, I wanted to use multithreading to take advantage of the 8-core machines at my institute. I am aware of the GIL problem in Python, which makes CPU-heavy multithreading difficult with Python classes. However, a C module can release the GIL while it performs its computation, which allows CPU-intensive work to run in several threads at once.

As far as my attempts go, this approach does not work out-of-the-box with PyROOT, which is the layer that would have to release the GIL when executing calls. Is there any way to do so? Has anybody tried something similar? Is it possible and I am just doing things wrong?

Thanks a lot,
Albert

Albert,

no, the GIL is not released by PyROOT. I’ve been thinking of making that possible on a method-by-method basis. Are you using trunk? If so, I could provide a prototype to test.

Cheers,
Wim

Hi Wim,

yes, I am using trunk; maybe we could work something out. I think it would be very nice for PyROOT to provide such functionality, since it is a common use case to do number-crunching in C++ and job steering in Python.

I currently use a multiprocess approach, but it is slower (processes have to be created and destroyed), consumes more memory, and is generally less flexible.

Let me know if I can be of further assistance

Cheers,
Albert

Albert,

we actually like the multiprocessing approach: memory can be shared if the other processes are set up late (e.g. after initialization of all common elements such as conditions data).
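
To illustrate the "set up late" idea, here is a minimal sketch in plain Python (no ROOT involved). It assumes a POSIX system where the `fork` start method is available; `CONDITIONS`, `load_conditions`, `worker` and `run_jobs` are hypothetical names standing in for real initialization and job code. The workers are forked only after the common data has been loaded, so they see it without reloading it.

```python
import multiprocessing as mp

# Hypothetical stand-in for expensive common state such as conditions data.
CONDITIONS = None


def load_conditions():
    """Initialise the common read-only data once, in the parent process."""
    global CONDITIONS
    CONDITIONS = {"calibration": list(range(1000))}


def worker(job_id, results):
    # The child reads CONDITIONS without reloading it: fork() gave it a
    # copy-on-write view of the parent's memory.
    results.put((job_id, sum(CONDITIONS["calibration"]) + job_id))


def run_jobs(n_jobs):
    ctx = mp.get_context("fork")  # assumption: POSIX only
    load_conditions()             # initialise *before* starting the workers
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, results)) for i in range(n_jobs)]
    for p in procs:
        p.start()
    collected = dict(results.get() for _ in procs)  # drain before joining
    for p in procs:
        p.join()
    return collected
```

How much memory actually stays shared depends on how the data is laid out; Python's reference counting writes to objects it touches, so large C++-side buffers tend to share better than many small Python objects.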

I’ve thrown in a prototype. This should work except that the GIL is released a tad early and the code is repetitive which isn’t very nice. I’ll improve on that tomorrow.

Each member function now has a _threaded pseudo-variable that can be set to True to make it release the GIL upon each call. E.g. (bad idea):
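
As an illustration only (this is a sketch, not the snippet from this post), the flag could be combined with ordinary Python threads roughly as follows. The helper `call_concurrently` is a hypothetical name; the PyROOT part is left in comments because it needs a ROOT build containing the prototype, and `MyAnalysis` / `process` are made-up names.

```python
import threading


def call_concurrently(method, n_threads):
    """Invoke method() from n_threads Python threads and return the results."""
    results = [None] * n_threads

    def target(index):
        results[index] = method()

    threads = [threading.Thread(target=target, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical PyROOT usage (assumes a class MyAnalysis with a thread-safe,
# CPU-heavy process() method has been compiled and loaded):
#
#   import ROOT
#   analysis = ROOT.MyAnalysis()
#   analysis.process._threaded = True   # release the GIL on each call
#   call_concurrently(analysis.process, n_threads=8)
```

The threads only run in parallel if the C++ method itself is safe to call concurrently; releasing the GIL does nothing to protect shared C++ state.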

Cheers,
Wim

Nice feature, something I have been after particularly for a webserver environment. However, it crashes for me on lxplus5.

Steps to reproduce (with an otherwise clean environment):

pwaller@lxplus313 ~ $ . /afs/cern.ch/sw/lcg/contrib/gcc/4.3/x86_64-slc5/setup.sh
pwaller@lxplus313 ~ $ . /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/bin/thisroot.sh
pwaller@lxplus313 ~ $ python2.6 -c "import ROOT as R; R.gROOT.GetVersion._threaded = 1; print R.gROOT.GetVersion()"

*** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:

Thread 2 (Thread 0x42243940 (LWP 28263)):
#0 0x00000030b42ccfc2 in select () from /lib64/libc.so.6
#1 0x00002b17c2e2742f in ?? () from /usr/lib64/python2.6/lib-dynload/timemodule.so
#2 0x00000030b36d99a1 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#3 0x00000030b36da5a1 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#4 0x00000030b366d2bc in ?? () from /usr/lib64/libpython2.6.so.1.0
#5 0x00000030b3643ac8 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#6 0x00000030b36d5382 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#7 0x00000030b36d9714 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#8 0x00000030b36d9714 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#9 0x00000030b36da5a1 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#10 0x00000030b366d1bd in ?? () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000030b3643ac8 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000030b36545bd in ?? () from /usr/lib64/libpython2.6.so.1.0
#13 0x00000030b3643ac8 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#14 0x00000030b36d3826 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.6.so.1.0
#15 0x00000030b37045cd in ?? () from /usr/lib64/libpython2.6.so.1.0
#16 0x00000030b4e0673d in start_thread () from /lib64/libpthread.so.0
#17 0x00000030b42d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b17b84bcf90 (LWP 28256)):
#0 0x00000030b4299daf in waitpid () from /lib64/libc.so.6
#1 0x00000030b423c331 in do_system () from /lib64/libc.so.6
#2 0x00000030b423c687 in system () from /lib64/libc.so.6
#3 0x00002b17bc4aaa02 in TUnixSystem::StackTrace() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#4 0x00002b17bc4ab4cc in TUnixSystem::DispatchSignals(ESignals) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#5 <signal handler called>
#6 0x00000030b36e30fa in PyErr_Occurred () from /usr/lib64/libpython2.6.so.1.0
#7 0x00002b17bc001065 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::Execute(void*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libPyROOT.so
#8 0x00002b17bbfffcf0 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator()(PyROOT::ObjectProxy*, _object*, _object*, long) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libPyROOT.so
#9 0x00002b17bc006b75 in PyROOT::(anonymous namespace)::mp_call(PyROOT::MethodProxy*, _object*, _object*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libPyROOT.so
#10 0x00000030b3643ac8 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000030b36d5733 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000030b36da5a1 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#13 0x00000030b36da622 in PyEval_EvalCode () from /usr/lib64/libpython2.6.so.1.0
#14 0x00000030b36f4f92 in ?? () from /usr/lib64/libpython2.6.so.1.0
#15 0x00000030b36f518a in PyRun_StringFlags () from /usr/lib64/libpython2.6.so.1.0
#16 0x00000030b36f6310 in PyRun_SimpleStringFlags () from /usr/lib64/libpython2.6.so.1.0
#17 0x00000030b3701f95 in Py_Main () from /usr/lib64/libpython2.6.so.1.0
#18 0x00000030b421d994 in __libc_start_main () from /lib64/libc.so.6
#19 0x0000000000400629 in _start ()

The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.

#6 0x00000030b36e30fa in PyErr_Occurred () from /usr/lib64/libpython2.6.so.1.0
#7 0x00002b17bc001065 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::Execute(void*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libPyROOT.so
#8 0x00002b17bbfffcf0 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator()(PyROOT::ObjectProxy*, _object*, _object*, long) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libPyROOT.so
#9 0x00002b17bc006b75 in PyROOT::(anonymous namespace)::mp_call(PyROOT::MethodProxy*, _object*, _object*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/x86_64-slc5-gcc43-opt/root/lib/libPyROOT.so
#10 0x00000030b3643ac8 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000030b36d5733 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000030b36da5a1 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#13 0x00000030b36da622 in PyEval_EvalCode () from /usr/lib64/libpython2.6.so.1.0
#14 0x00000030b36f4f92 in ?? () from /usr/lib64/libpython2.6.so.1.0
#15 0x00000030b36f518a in PyRun_StringFlags () from /usr/lib64/libpython2.6.so.1.0
#16 0x00000030b36f6310 in PyRun_SimpleStringFlags () from /usr/lib64/libpython2.6.so.1.0
#17 0x00000030b3701f95 in Py_Main () from /usr/lib64/libpython2.6.so.1.0
#18 0x00000030b421d994 in __libc_start_main () from /lib64/libc.so.6
#19 0x0000000000400629 in _start ()

Segmentation fault

Hi,

ouch; I was under the mistaken impression that PyErr_Occurred() was GIL-safe. I have a current fix, but it looks like that doesn’t quite work under p2.2. Once I’ve got that one nailed, I’ll check it in.

Note that the fix entails re-acquiring the GIL for the duration of PyErr_Occurred(), which isn’t all that pretty: that call comes at the end of the C++ code, so an acquire/release/acquire follows in quick succession. Also, this probably means that error reporting (in case there are any errors) is broken, too. In principle that will only bite if there are coding bugs (i.e. not run-time errors), so that shouldn’t be too much of a limitation.

Later,
Wim

Hi,

the fix is in trunk, the test is in roottest. The p2.2 implementation is functionally limited versus what’s in for p2.3 and later (I’m assuming folks are not using p2.2 anymore anyway).

Cheers,
Wim

Is there a nightly I can use without compiling on AFS?

Hi,

yes … tomorrow. :slight_smile: Depending on timing, would be either:

/afs/cern.ch/sw/lcg/app/nightlies/dev/Mon/ROOT/ROOT_today_python2.6

or (I’m not sure when the nightly starts or when the repository gets checked out):

/afs/cern.ch/sw/lcg/app/nightlies/dev/Tue/ROOT/ROOT_today_python2.6

Cheers,
Wim

Unfortunately it turns out that this is quite easy to break (5.34/03):

(note, I randomly chose the first gROOT function I saw which accepted a string argument)

Output:

#6 PyErr_Occurred () at Python/errors.c:77
#7 0x00007ff2f28f3299 in PyROOT::TCStringConverter::SetArg (this=0x24f3ef0, pyobject=0x7ff2fa9967e0, para=..., func=0x2660430) at pyroot/src/Converters.cxx:487
#8 0x00007ff2f2904c98 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::SetMethodArgs (this=0x1c7b110, args=0x7ff2fa9e2910, user=1) at pyroot/src/MethodHolder.cxx:576
#9 0x00007ff2f2906a4f in PyROOT::TClassMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator() (this=0x1c7b110, args=0x7ff2fa9e2910, kwds=<optimized out>, user=1) at pyroot/src/ClassMethodHolder.cxx:34
#10 0x00007ff2f28e4128 in PyROOT::(anonymous namespace)::mp_call (pymeth=0x1faca10, args=0x7ff2fa9e2910, kwds=0x0) at pyroot/src/MethodProxy.cxx:484
#11 0x00007ff2fa507813 in PyObject_Call (func=0x1faca10, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2529

Is there a reason you don’t wrap the GIL at the C++ function call-site [1] itself with ALLOW_THREADS? I understand that it’s in principle not safe to use any python c-api whilst the GIL is released. AFAICT ctypes doesn’t do anything with the python c-api without holding the GIL.

[1] https://github.com/bbannier/ROOT/blob/master/bindings/pyroot/src/MethodHolder.cxx#L600
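
For comparison, the ctypes behaviour mentioned above can be observed with a small sketch. It assumes a POSIX system whose C library provides `usleep`; `timed_parallel_sleep` is a hypothetical helper name. Functions loaded through `ctypes.CDLL` are called with the GIL released, so several threads can sit inside the C call at the same time.

```python
import ctypes
import ctypes.util
import threading
import time

# Load the C library. CDLL (unlike PyDLL) releases the GIL around each
# foreign function call and re-acquires it afterwards.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.usleep.argtypes = [ctypes.c_uint]
libc.usleep.restype = ctypes.c_int


def timed_parallel_sleep(n_threads, microseconds):
    """Run usleep() in n_threads threads and return the wall-clock time."""
    threads = [
        threading.Thread(target=libc.usleep, args=(microseconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With four threads each sleeping 0.2 s, the total should be close to 0.2 s rather than 0.8 s, because the C calls overlap. Argument conversion and error checks happen on the Python side, before the GIL is dropped and after it is taken back.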

Peter,

that was done out of convenience, as it allows the wrapping at a single point. The call site is within the Executor objects, meaning that the release code would have to be duplicated many times over.

Of course, what really needs to be done is a refactoring of this code: decisions to allow GIL release on a per-method basis, as well as memory management choices on a per method basis, were added much later in an ad-hoc fashion that’s not easily extended.

Anyway, now in v5-34-00-patches. NOT in trunk. I’d rather refactor the code first …

Cheers,
Wim

I really appreciate this useful information on PyROOT and multithreading. This conversation helped me fix some errors.