RDataFrame multithreading in conflict with TPython

ROOT version: 6.18.00.
Issue: with multithreading enabled, the code hangs forever if a TPython method is called before entering the RDataFrame event loop. CPU usage drops to zero after calling Snapshot.

The minimal reproducing code is:

ROOT::EnableImplicitMT();
TPython::Exec("1");
ROOT::RDataFrame(10).Define("bar", "return 1;").Snapshot("test", "test.root");

Hi,
The behaviour you see is due to a deadlock.

When TPython is first used, its initialization makes the main thread acquire the GIL (Python's global interpreter lock). If you then create more threads (for example with RDataFrame) and one of them calls TClass::GetClass (as RDataFrame does), that thread will try to acquire the GIL, via the TPyClassGenerator registered by TPython, and wait forever. The backtrace of one such blocked thread shows the chain:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fffdacec700 (LWP 24238))]
#0  0x00007ffff192cd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff192cd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff00b26bc in PyCOND_TIMEDWAIT (cond=0x7ffff0470c00 <gil_cond>, mut=0x7ffff0470bc0 <gil_mutex>, us=<optimized out>)
    at /mnt/build/jenkins/workspace/lcg_rootext_build/BUILDTYPE/Release/COMPILER/gcc7/LABEL/centos7/build/externals/Python-3.6.5/src/Python/3.6.5/Python/condvar.h:103
#2  take_gil (tstate=tstate@entry=0x7fffd4003460)
    at /mnt/build/jenkins/workspace/lcg_rootext_build/BUILDTYPE/Release/COMPILER/gcc7/LABEL/centos7/build/externals/Python-3.6.5/src/Python/3.6.5/Python/ceval_gil.h:224
#3  0x00007ffff00b2c19 in PyEval_RestoreThread (tstate=tstate@entry=0x7fffd4003460)
    at /mnt/build/jenkins/workspace/lcg_rootext_build/BUILDTYPE/Release/COMPILER/gcc7/LABEL/centos7/build/externals/Python-3.6.5/src/Python/3.6.5/Python/ceval.c:368
#4  0x00007ffff00e749e in PyGILState_Ensure () at /mnt/build/jenkins/workspace/lcg_rootext_build/BUILDTYPE/Release/COMPILER/gcc7/LABEL/centos7/build/externals/Python-3.6.5/src/Python/3.6.5/Python/pystate.c:894
#5  0x00007ffff2f16ead in PyROOT::PyGILRAII::PyGILRAII (this=0x7fffdace5e78) at /home/etejedor/root/fork/root/bindings/pyroot/src/Utility.h:81
#6  0x00007ffff2f155d7 in TPyClassGenerator::GetClass (this=0x1b34b10, name=0x7ffff2bb7528 <typeinfo name for int> "i", load=true, silent=false)
    at /home/etejedor/root/fork/root/bindings/pyroot/src/TPyClassGenerator.cxx:41
#7  0x00007ffff2f1555e in TPyClassGenerator::GetClass (this=0x1b34b10, name=0x7ffff2bb7528 <typeinfo name for int> "i", load=true) at /home/etejedor/root/fork/root/bindings/pyroot/src/TPyClassGenerator.cxx:26
#8  0x00007ffff2f16ccf in TPyClassGenerator::GetClass (this=0x1b34b10, typeinfo=..., load=true) at /home/etejedor/root/fork/root/bindings/pyroot/src/TPyClassGenerator.cxx:285
#9  0x00007ffff79c79bf in TClass::GetClass (typeinfo=..., load=true) at /home/etejedor/root/fork/root/core/meta/src/TClass.cxx:3159
#10 0x00007ffff53a8929 in ROOT::Internal::GetClassHelper<int> (load=true, silent=false) at /home/etejedor/root/fork/build_oldpyroot_py3/include/TClass.h:588
#11 0x00007ffff53a8706 in TClass::GetClass<int> (load=true, silent=false) at /home/etejedor/root/fork/build_oldpyroot_py3/include/TClass.h:598
#12 0x00007ffff53a83a3 in TTree::Branch<int> (this=0x7fffd4002e40, name=0x37f3d60 "bar", obj=0x1b32c50, bufsize=32000, splitlevel=99) at /home/etejedor/root/fork/root/tree/tree/inc/TTree.h:354
#13 0x00007ffff7e194be in ?? ()
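
The same deadlock pattern can be reproduced with the bare CPython C API, without any ROOT involved. This is only a minimal sketch of the mechanism, not the actual PyROOT code: after Py_Initialize the main thread keeps the GIL, so the worker blocks forever inside PyGILState_Ensure while main waits on join.

  // deadlock_sketch.cxx (hypothetical name); hangs by construction
  #include <Python.h>
  #include <thread>

  int main()
  {
     Py_Initialize(); // the calling thread now owns the GIL and never releases it

     std::thread worker([] {
        PyGILState_STATE s = PyGILState_Ensure(); // blocks: the GIL is held by main
        PyGILState_Release(s);                    // never reached
     });

     worker.join(); // deadlock: worker waits for the GIL, main waits for worker
     Py_Finalize();
     return 0;
  }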

So it can be unsafe to use TPython in multi-threaded contexts. What you can do is remove the TPyClassGenerator from the list of class generators before creating the threads:

  ROOT::EnableImplicitMT();
  TPython::Exec("1");
  gROOT->GetListOfClassGenerators()->Clear();  // or you remove the TPyClassGenerator individually
  ROOT::RDataFrame(10).Define("bar","return 1;").Snapshot("test","test.root");

This will fix the issue, but you will not be able to use TPython::Import or TPython::LoadMacro afterwards, since they rely on the generator being there.
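
If you do need those calls, one possible pattern (a sketch, with hypothetical file and module names) is to do all the Python-side setup before removing the generator, keeping in mind that any later lazy C++ lookup of Python classes will no longer go through it:

  ROOT::EnableImplicitMT();
  TPython::LoadMacro("helpers.py"); // hypothetical Python macro
  TPython::Import("mymodule");      // hypothetical module
  gROOT->GetListOfClassGenerators()->Clear(); // drop the generator only afterwards
  ROOT::RDataFrame(10).Define("bar", "return 1;").Snapshot("test", "test.root");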

Hope this helps,
Enric

The problem is solved now. Thanks a lot!