RDataFrame multithread performance

sciencecw · September 11, 2021, 3:17pm

I have an optimization script in Python that uses RDataFrame to define hundreds of cuts. When I turn on the multithreading option, it doesn’t seem to produce any speed-up, and top shows 100% usage (= 1 core) even on a multicore server. Presumably an optimization script should benefit the most from the speed-up, since it is less limited by IO. Could this be due to system settings? In addition, I notice that an RDataFrame benchmark is available for C++. Is there a Python version?


_ROOT Version:_ 6.18/04
_Platform:_ CentOS7 3.10.0-1160.11.1.el7.x86_64
Compiler: gcc 8.2.0
Python: 2.7.15+

jalopezg · September 13, 2021, 8:07am

Hi @sciencecw,

Our expert in RDataFrame is @eguiraud; unfortunately, he is currently on vacation. I will try to reply as accurately as possible to your question.
As can be seen here, support for multiple threads has to be enabled via ROOT::EnableImplicitMT(); however, I don’t know whether this plays well with Python. Maybe @vpadulan can also provide some information on this. Also see this related forum topic.

Additionally, we support distributed computation via DistRDF (see documentation here).

Cheers,
J.

pieterdavid · September 13, 2021, 9:29am

Hi @sciencecw,
Just two more suggestions: as the documentation points out, ROOT::EnableImplicitMT() must be called before constructing the RDataFrame object to be picked up correctly (it then works fine for me from Python); and if you want to use many (>32 or so) threads, you should use at least ROOT 6.24/02 (6.22/00 also brought significant improvements in JITting speed and memory usage; according to your post you are still using 6.18/04).
Cheers,
Pieter

vpadulan · September 21, 2021, 3:42pm

Dear @sciencecw ,
@pieterdavid is right: you should call ROOT.EnableImplicitMT() at the very top of your program to be sure you’re making use of the full multithreaded capabilities of RDataFrame from the start. This should work well with Python too. The fact that top is showing just 100% utilisation could be a hint that implicit MT is indeed not properly enabled. Feel free to also post a small snippet of your code; maybe we can spot something together. As for the benchmarks you mention, there are some in Python in our rootbench repo.
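
A minimal sketch of that ordering, assuming PyROOT is importable. The function name `count_selected` and the toy selection are made up for illustration; `rdfentry_` is RDataFrame's built-in entry-number column, and the data frame is built from a plain entry count so that no input file is needed.

```python
def count_selected(n_entries=10000000, n_threads=0):
    """Run a toy selection with implicit multi-threading switched on first."""
    import ROOT  # imported lazily so the sketch can be loaded without PyROOT

    # Must happen before any RDataFrame is constructed.
    # 0 lets ROOT choose the size of the thread pool.
    ROOT.EnableImplicitMT(n_threads)

    # Fail loudly instead of silently running on a single core.
    if not ROOT.IsImplicitMTEnabled():
        raise RuntimeError("implicit multi-threading is not active")

    df = ROOT.RDataFrame(n_entries)  # toy dataset with no input file
    selected = df.Define("x", "rdfentry_ % 100").Filter("x < 10")

    # GetValue() triggers the event loop, which is where the threads are used.
    return selected.Count().GetValue()
```

Calling `count_selected()` while watching top should show whether more than one core gets busy during the event loop.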
Cheers,
Vincenzo

Kohler · September 22, 2021, 5:56am

Multi-threading in Python is subject to the Global Interpreter Lock (GIL), which prevents two threads in the same process from executing Python code at the same time. If you had a lot of disk IO happening, multi-threading would have helped, because disk IO is a separate process that can handle locks. Or, if your Python code used a separate application that can handle locks, multi-threading would have helped. Multi-processing, on the other hand, runs separate processes and so can use all the cores of your CPU, as opposed to multi-threading.

sciencecw · September 22, 2021, 8:57pm

Does the lock prevent parallelism on threads as opposed to cores, or both? I’d think that parallelism works best when the job is not IO intensive, so optimization code with hundreds of filters should gain the most.

sciencecw · September 22, 2021, 9:01pm

That is actually something I checked. Unfortunately my script is too entwined with the package to simplify, but the source code is here.

vpadulan · September 22, 2021, 9:13pm

Hi @sciencecw ,
Thanks for your example. I don’t see any call to ROOT::EnableImplicitMT anywhere, so unless it’s activated somewhere else beforehand, you can be sure you are not using any multithreading capability. Where exactly do you activate EnableImplicitMT in your application?

With that said, while it is true that the Python GIL prevents true multithreading in Python-only applications, ROOT is mainly written in C++, and whenever you use its multithreaded capabilities you are calling C++ functions that do not suffer from the Python GIL at all. The GIL will not be a problem if you use ROOT::EnableImplicitMT.
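
As a loose analogy only (standard library, Python 3, not ROOT's own mechanism), the sketch below contrasts pure-Python work, which holds the GIL, with native code that releases it. It relies on the documented behaviour that `hashlib` releases the GIL while hashing large buffers; the names `native_work`, `python_work` and `timed` are made up for illustration.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

PAYLOAD = b"x" * (8 * 1024 * 1024)  # 8 MiB buffer, large enough for the GIL to be released


def native_work(_):
    # The hashing loop runs in C and releases the GIL for large inputs.
    return hashlib.sha256(PAYLOAD).hexdigest()


def python_work(_):
    # A pure-Python loop: the GIL is held while this bytecode executes.
    total = 0
    for i in range(2000000):
        total += i * i
    return total


def timed(func, n_tasks, n_threads):
    """Run func n_tasks times on n_threads threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(func, range(n_tasks)))
    return results, time.perf_counter() - start
```

On a standard CPython build with several free cores, comparing `timed(python_work, 4, 1)` with `timed(python_work, 4, 4)` should show little difference, while the same comparison for `native_work` would be expected to show a shorter wall time with four threads.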
Cheers,
Vincenzo

sciencecw · September 22, 2021, 9:22pm

One of the first things the script does is execute MakeBase.py, which calls SampleManager in SampleManager.py, which calls EnableImplicitMT() here before any RDataFrame is called. All RDataFrame functions are invoked in the SampleFrame class in SampleManager.py, which is the superclass of SampleManager.

I have also tried invoking EnableImplicitMT() before MakeBase.py but it makes no difference.

That might be too complicated to debug, which is why I wonder whether there is a benchmark I can run to see if the problem is in the environment or in my script.

vpadulan · September 22, 2021, 9:32pm

See for example these two tutorials:
https://root.cern/doc/master/df102__NanoAODDimuonAnalysis_8py.html
https://root.cern/doc/master/df104__HiggsToTwoPhotons_8py.html

The first one is more lightweight and the second one a bit beefier. In both cases, EnableImplicitMT is called at the beginning of the tutorial, and you should see high CPU usage, in proportion to how many cores there are on your machine. If you also download the files locally, it would probably be even higher.
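
If watching top is inconvenient, CPU usage can also be estimated from inside Python. This is a sketch using only the standard library (Python 3); the helper name `cpu_utilisation` is made up for illustration. It divides the CPU time consumed by all threads of the process by the elapsed wall time, so a value clearly above 1 means more than one core was busy.

```python
import time


def cpu_utilisation(workload):
    """Call workload() and return (its result, CPU time / wall time)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads of this process

    result = workload()

    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return result, ratio
```

Here `workload` can be any zero-argument callable that triggers an event loop, for example `lambda: df.Count().GetValue()` for an existing data frame `df`. Note that time spent JIT-compiling string expressions is included in the measurement, so very short runs may look less parallel than they are.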

eguiraud · September 27, 2021, 3:30pm

Also note that RDataFrame parallelizes over TTree clusters, that is, batches of TTree entries that are zipped together in the file. If your input dataset is small and simple, it could be that you only have one cluster in the dataset, and then only one thread has work to do.
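
A sketch for counting the clusters in a tree, assuming the usual `TTree::GetClusterIterator` idiom is available through PyROOT. The function name `cluster_boundaries` is made up for illustration, and it takes an already opened tree so that it does not depend on any particular file.

```python
def cluster_boundaries(tree):
    """Return the first entry number of every cluster in a TTree."""
    n_entries = tree.GetEntries()
    cluster_iter = tree.GetClusterIterator(0)  # start iterating at entry 0

    starts = []
    start = cluster_iter.Next()  # first entry of the first cluster
    while start < n_entries:
        starts.append(start)
        start = cluster_iter.Next()  # returns n_entries once the clusters run out
    return starts
```

With a hypothetical file and tree name, usage would look like `f = ROOT.TFile.Open("data.root")` followed by `len(cluster_boundaries(f.Get("Events")))`. A result of 1 would mean there is nothing to split between threads.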

If the tutorials that @vpadulan linked show CPU usage higher than 100%, we’ll probably need to be able to reproduce the problem on our side to figure out what’s going on. If you see only 100% usage also with those tutorials, it probably means that ROOT, for whatever reason, can’t see or use more than one core on that machine (never seen this, but we can check a few things if that’s the case).
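
One thing that can be checked without ROOT is how many cores the Python process itself is allowed to use. This is only a guess at a possible cause, not something confirmed in this thread: a process pinned to one core, for example by `taskset` or a batch system, would be limited no matter what ROOT does. The sketch uses the standard library only (Python 3), and the function names are made up for illustration.

```python
import os


def usable_cpu_count():
    """Number of logical CPUs this process is allowed to run on."""
    try:
        # Linux only: reflects CPU pinning such as taskset or cpusets.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


def describe_cpus():
    """Compare the CPUs present on the machine with those usable by this process."""
    return {
        "logical_cpus": os.cpu_count(),
        "usable_by_this_process": usable_cpu_count(),
    }
```

If `describe_cpus()` reports a single usable CPU on a multicore server, the limitation comes from the environment rather than from the analysis script.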

Cheers,
Enrico
