I’m using RDataFrames in PyROOT for my analysis to read in ntuples and fill a number of histograms. In most cases I use a pre-compiled library to apply my filters and new-column definitions, however in some case it was more convenient to apply these actions as string expressions, resulting in just-in-time compilations. Because of the large numbers of samples I have, this JITing resulted in the memory-hog issue described elsewhere, e.g. JIT Compilation Memory Hogs in RDataFrame.
To fix this memory issue, I run the RDataFrame event loop in a new process so that all the memory consumed by the JIT compilation is released once the event loop finishes. I define the histograms in the parent process and trigger the event in the child process. This is a condensed version of what I am doing:
import ROOT import multiprocessing ROOT.ROOT.EnableImplicitMT() def fill_hists(df, hist_args): df_local = df.Filter("...", "...").Define("...", "...") rhists =  for args in hist_args: rhists.append(df_local.Histo1D(*args)) with multiprocessing.Manager() as manager: results = manager.dict() proc = multiprocessing.Process( target=fill_hists_worker, args=(results, rhists) ) proc.start() proc.join() def fill_hists_worker(results, rhists): for rhist in rhists: hist = rdf_hist.GetValue() results[hist.GetName()] = hist
However, when I trigger the event loop in a new process in this way, the execution time to fill the histograms increases by about a factor of how many CPU cores I have, leading me to believe it is running the event loop in a single thread dispite having called
ROOT.ROOT.EnableImplicitMT(). The alternative, as mentioned elsewhere, is to using a
multiprocessing.Pool() instead of a single process, but I would have thought this would run a separate event loop in each process in the pool, which is not ideal since only a single event loop is necessary.
Is it possible to use ROOT’s implicit MT in a Python
multiprocessing.Process()? And if not, what would the solution be to run a single event loop in parallel while still avoiding the JIT memory-hogging issue?
NOTE: I am using ROOT v6.20/04, since this is the latest version available on my local cluster.
ROOT Version: 6.20/04
Platform: x86_64 (Red Hat 4.8.5-44)
Compiler: gcc 9.3.0