Works as a Python script, does not work in Jupyter

The ipynb file and the input file are at https://cernbox.cern.ch/index.php/s/zLdw9VfoLdz0Zlg.
With implicit MT, the last cell (RDataFrame with Snapshot) stays busy ([*]) forever, consuming 0% CPU.
If I import the notebook as a Python script, it works fine.
Without MT it also works fine.

Is this related to https://sft.its.cern.ch/jira/browse/ROOT-9659?


ROOT Version: 6.14/04
Platform: Ubuntu 18.04 (source /cvmfs/sft.cern.ch/lcg/views/LCG_94python3/x86_64-ubuntu1804-gcc7-opt/setup.sh)
Compiler: g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0


I bet it’s Python waiting for Python to release its lock (to invoke list_to_vector_of). Try to make list_to_vector_of a C++ function e.g. by passing it as a string - that should work! @Danilo @eguiraud do you agree?

I tried replacing the function call with ROOT.std.vector('string')().__iadd__(['test']), and also replacing the whole cell with ROOT.gInterpreter.ProcessLine("""...""") with equivalent C++ RDataFrame code inside. It still does not work. Only two things make it work: turning off MT and reducing the input file size. With a small number of events it probably just does not have enough time to spawn a new thread, which means there is really only one thing that makes the code work: no threads.

I don’t see how either of the two things you tried would keep Python from being invoked recursively. The two diagnostic tests you mention make it very clear that it’s Python’s GIL. As a matter of fact, RDataFrame should probably refuse to call a non-C++ functor in MT mode: this cannot end well…

Can you instead call deta_dphi_frame.Snapshot("ATree", "output.root", "someC++CodeThatDoesWhatYouNeed"), i.e. with the functor as a string?

And why does it work as a Python script even with MT?

Snapshot expects a vector of column names, initializer_list of column names or a regexp to match column names. It does not expect C++ code.
I’ve removed the third argument (which means snapshot all columns) and it still does not work.

Well, I tried :) Let’s see what @Danilo and @eguiraud say when they’re back!

Hi @berserker,

happy new year.
I can reproduce the issue: thanks for sharing the files on Cernbox. I will debug this once back at work.

Cheers,
D

Hi @berserker,

I think we figured out the problem. It is due to a classic deadlock situation. The two locks are the GIL and the ROOT mutex. It happens under certain circumstances that the GIL is acquired by C++ code in ROOT, perhaps after taking the ROOT global lock.
A fix is in the making.
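
To illustrate the kind of deadlock described above, here is a minimal pure-Python sketch. It does not use ROOT at all: the two `threading.Lock` objects are only stand-ins for the GIL and the ROOT mutex, and the names `attempt`, `simulate`, `interpreter_lock` and `framework_lock` are hypothetical. A timeout replaces the indefinite wait so that the script terminates instead of hanging the way the notebook cell does.

```python
import threading


def attempt(name, first, second, rendezvous, timeout, outcome):
    """Take `first`, then try to take `second` without blocking forever."""
    with first:
        if rendezvous is not None:
            rendezvous.wait()  # both threads now hold their first lock
        acquired = second.acquire(timeout=timeout)
        if acquired:
            second.release()
        outcome[name] = acquired
        if rendezvous is not None:
            rendezvous.wait()  # keep `first` held until both attempts are over


def simulate(order_a, order_b, timeout=0.2):
    """Run two threads that each take two locks in the given order."""
    # Force the overlap only when the threads start from different locks;
    # with a shared first lock they simply run one after the other.
    rendezvous = threading.Barrier(2) if order_a[0] is not order_b[0] else None
    outcome = {}
    threads = [
        threading.Thread(target=attempt,
                         args=("A", *order_a, rendezvous, timeout, outcome)),
        threading.Thread(target=attempt,
                         args=("B", *order_b, rendezvous, timeout, outcome)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcome


interpreter_lock = threading.Lock()  # stand-in for the GIL (assumption)
framework_lock = threading.Lock()    # stand-in for the ROOT mutex (assumption)

# Opposite acquisition order: each thread holds the lock the other one needs.
opposite = simulate((interpreter_lock, framework_lock),
                    (framework_lock, interpreter_lock))
# Same acquisition order: the threads serialize and both succeed.
same = simulate((interpreter_lock, framework_lock),
                (interpreter_lock, framework_lock))

print("opposite order:", opposite)
print("same order:", same)
```

In the opposite-order run both attempts time out, because each thread waits for a lock the other one holds. Without the timeout both would wait forever at 0% CPU, which matches the symptom reported in the first post. With a single acquisition order both threads succeed.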

Meanwhile I can suggest a workaround. Add

ROOT.gROOT.GetListOfClassGenerators().Clear()

after importing ROOT.
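
As a sketch of where the line goes in a notebook or script: the helper name `setup_root_with_workaround` and the `enable_mt` parameter are hypothetical, and placing the call before `ROOT.ROOT.EnableImplicitMT()` is an assumption based on "after importing ROOT", not something verified against the fix.

```python
def setup_root_with_workaround(enable_mt=True):
    """Import ROOT, apply the suggested workaround, then optionally enable MT."""
    import ROOT  # imported lazily so that defining this helper does not need ROOT

    # Workaround from this thread: clear the class generators right after import.
    ROOT.gROOT.GetListOfClassGenerators().Clear()

    if enable_mt:
        # Implicit multithreading is the mode that triggers the hang.
        ROOT.ROOT.EnableImplicitMT()
    return ROOT
```

In the first notebook cell this would be called once, e.g. `ROOT = setup_root_with_workaround()`, before building the RDataFrame.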
Thanks a lot for reporting this bug and making such a simple reproducer available!

Cheers,
D

Worked for me. Thank you!