RDF memory consumption

Dear experts,

we are setting up our framework using RDataFrame to study di-Higgs production in the decay channel with two b jets and two tau leptons. The code runs on “flat” ntuples and creates a large number of histograms in many different regions. It all runs fine; however, we have some concerns about the memory usage. We recently added a few more regions, which made the computation graphs more complicated: memory consumption grew to 10-20 GB of virtual memory, and JIT compilation now takes ~20 minutes. This is currently without any systematics, which we still need to include. Our worry is that including them will increase the memory to the point where the framework is no longer usable. I will quickly summarize our current workflow below. Afterwards, could you give us any advice on how to reduce the memory consumption?

We use RDataFrame in ROOT version 6.30.02 from Python, but with compiled C++ functions that we use in the Define calls. The first thing we do is create an RDatasetSpec for each kind of process, e.g. data/signal/background1/background2/…, which gives us a total of ~30 different processes. For each of these processes we build an RDataFrame computation graph in which we apply filters to create new regions, calculate new variables, and book histograms in all of these regions. Once we have a graph for every process, we run them all on multiple cores with the RunGraphs function. We understand that RunGraphs is smart enough to recognize that different regions for the same process are not “new” code and can handle all of this in the background. This is also where the code is JIT-compiled and where the ~20 GB of virtual memory is allocated.
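To make the bookkeeping concrete, here is a minimal sketch of the pattern described above, using hypothetical process names, region cuts, and histogram names (plain Python; the commented lines show where the actual ROOT.RDataFrame calls would go in the real code):

```python
# Hypothetical processes, regions, and histograms, illustrating the booking
# pattern: one computation graph per process, several filtered regions per
# graph, and histograms booked in every region.
processes = ["data", "signal", "background1", "background2"]
regions = {
    "SR": "n_bjets >= 2 && n_taus == 2",   # signal region (hypothetical cuts)
    "CR": "n_bjets == 1 && n_taus == 2",   # control region
}
histograms = ["deltaphi_Hbb_htt", "m_bb"]

booked = []  # in the real code this would hold RResultPtr handles
for proc in processes:
    # df = ROOT.RDataFrame(spec_for(proc))   # one RDatasetSpec per process
    for region, cut in regions.items():
        # region_df = df.Filter(cut, region)
        for hist in histograms:
            # booked.append(region_df.Histo1D(hist))
            booked.append((proc, region, hist))

# ROOT.RDF.RunGraphs(booked) would then run all graphs concurrently.
print(len(booked))  # 4 processes x 2 regions x 2 histograms = 16
```

Every Filter/Define/Histo1D string in such a loop is JIT-compiled at RunGraphs time, which is why the graph complexity drives both the compilation time and the memory footprint.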

We understand that this is quite a complicated analysis, but it is very much a realistic scenario. We are still adding things, which means that the computation graphs will become even more complex, leading to more memory consumption. If you are interested, we can provide more information on what we actually do. For now, do you have any tips to reduce the memory?

Thank you very much,
Jordy Degens

ps: we are aware of this PR that should reduce memory consumption:

Dear Jordy,

Thank you for reaching out with this complete report: being exposed to such realistic, non-trivial scenarios is very useful for us.

A first response: the performance improvement you cited leads to a massive reduction in memory usage (by factors) as well as noticeable speedups, depending on the case. Have you had the chance to try it out and see whether it has an effect for your use case?

An admittedly naive question: even if yours is a sophisticated, real-life analysis, would you have a reproducer for us to run, profile and improve?

I am adding @vpadulan and @mczurylo, our RDF experts, to the loop.


Dear @Jordy_Degens ,

Thanks for reaching out to the forum! The PR you reference is now merged in ROOT master. Would it be possible for you to test ROOT with the latest changes, for example by building it yourself or using the LCG nightlies via CVMFS? It would be great to understand if this PR concretely contributes to your issue or if it does not have any effect.



Hi Vincenzo, Danilo,

Thank you for your reply. We are testing the latest setup using the nightly ROOT LCG build. We will do some more tests, but here are some preliminary findings.

The current version of our code on an AlmaLinux 9 machine, using the latest HEAD of ROOT via:

source /cvmfs/sft.cern.ch/lcg/views/dev3/latest/x86_64-el9-gcc11-opt/setup.sh 


Virtual memory: 17 GB
Resident memory: 14 GB

Running R.RDF.RunGraphs(dfs) on 12 cores takes 4.5 hours.

For comparison, on a CentOS 7 machine (I’ll repeat this with an AlmaLinux 9 release) using the view LCG_105 x86_64-centos7-gcc11-opt, the results are:

Virtual memory: 21 GB
Resident memory: 18 GB

Running R.RDF.RunGraphs(dfs) on 12 cores takes 1.5 hours.

This means a drop in memory usage of ~20%. Compilation seems much quicker, but the event loop itself is slower. We will do some more tests and report back once they are done. Since we are running on two different machines, that could also be the reason for the difference.

I have two more questions related to running RDataFrame from Python. At the moment we do a lot of actions like this:

rdf.Define("deltaphi_Hbb_htt", "ROOT::VecOps::DeltaPhi<float>(bbtt_H_bb_phi, bbtt_H_vis_tautau_phi)")

In older versions it was not possible to do the following:

rdf.Define("deltaphi_Hbb_htt", ROOT.VecOps.DeltaPhi, ["bbtt_H_bb_phi", "bbtt_H_vis_tautau_phi"])

But I think this has also been added recently. Would this reduce memory consumption? In C++ I think it would.
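For readers unfamiliar with the helper used in the Define above: ROOT::VecOps::DeltaPhi computes the azimuthal-angle difference wrapped back into a range of width 2π. A plain-Python sketch of that wrap-around logic (boundary conventions may differ slightly from the ROOT implementation):

```python
import math

def delta_phi(phi1, phi2):
    """Sketch of an azimuthal-angle difference: map phi1 - phi2 back
    into [-pi, pi). Illustrative only; conventions at the interval
    boundary may differ from ROOT::VecOps::DeltaPhi."""
    d = phi1 - phi2
    while d >= math.pi:
        d -= 2 * math.pi
    while d < -math.pi:
        d += 2 * math.pi
    return d

print(delta_phi(3.0, -3.0))  # 6.0 wraps to 6.0 - 2*pi ~ -0.283
```

Passing the string form of such an expression to Define (as in the first snippet) means it is JIT-compiled by cling, whereas a precompiled C++ function only needs its call site jitted.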

Lastly, we are now also starting to include systematics. We see there are Vary and VariationsFor, but this does not exactly correspond to how we usually handle systematics. It’s not as simple as multiplying a variable by a factor of 0.9/1.1: we commonly store a variation in a separate branch, e.g. the nominal branch would be lep_pt and the systematic variations would be lep_pt_SYS_UP and lep_pt_SYS_DOWN. Do you know of a way to tackle this? We could write a wrapper around our code that creates more Defines and Filters using the systematic branches, but maybe there is a smarter way of doing this.
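One possible approach, sketched under the naming convention above (the helper name is hypothetical): Vary accepts an expression that returns one value per variation, so the expression can simply read the pre-computed systematic branches rather than scaling the nominal one. A small helper can generate those expression strings:

```python
def vary_expression(nominal, syst):
    """Build the expression string for RDataFrame's Vary: an RVec with one
    entry per variation tag, read from pre-computed systematic branches.
    Assumed naming convention: <branch>_<SYST>_DOWN / <branch>_<SYST>_UP.
    (Hypothetical helper; entry order must match the variation tags.)"""
    return "ROOT::RVecF{%s_%s_DOWN, %s_%s_UP}" % (nominal, syst, nominal, syst)

expr = vary_expression("lep_pt", "SYS")
print(expr)  # ROOT::RVecF{lep_pt_SYS_DOWN, lep_pt_SYS_UP}

# In the real analysis this string would be used roughly as:
# df = df.Vary("lep_pt", expr, ["down", "up"], "SYS")
# hists = ROOT.RDF.Experimental.VariationsFor(df.Histo1D("lep_pt"))
```

This keeps the downstream Filters and histogram bookings written once against the nominal column, with the variations propagated automatically.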

Thanks a lot once more for the help you are offering. We are thinking about how to provide a real example that recreates our setup; however, this is of course not completely straightforward due to experiment policies. We will see what we can do here.

Kind regards,

Dear @Jordy_Degens,

Thanks for your prompt reply and for testing the latest status. I was expecting the memory decrease (hopefully a bit more than 20%, but that’s at least an improvement :slight_smile: ). I was definitely not expecting a runtime increase; in all my other tests I saw a very noticeable runtime decrease after that PR. Thanks for running on the same machine with both versions of ROOT so we can understand whether this increase is real; do keep me updated on that.

Regarding the question about the Pythonic API, you are right that there has been ongoing work to offer the possibility of calling Python functions directly within the RDataFrame API, but it is not mature enough for public consumption and thus will not make it into the official ROOT releases for now. Even when it does, I expect the memory usage to increase slightly due to the extra jitting via the Python numba package. Clearly, we will also run memory benchmarks before making this feature public.

Lastly, seeing that you are trying to scale your application up to include systematics, it might be worth having a separate discussion offline to kickstart this part of the project in the right direction. Would that be ok for you? In that case I can send you a private email and we can set up this chat.


Hi Vincenzo,

I had a quick run on the AlmaLinux 9 machine with ROOT version 6.30 and the results are:
Runtime: ~90 minutes
Memory: 19 GB

So the memory improvement is much smaller on the AlmaLinux 9 machine: still an improvement of 2 GB, ~10%. I do worry about the increase in runtime.

> Lastly, seeing that you are trying to scale your application up to include systematics, it might be worth having a separate discussion offline to kickstart this part of the project in the right direction. Would that be ok for you? In that case I can send you a private email and we can set up this chat.

Please reach out, happy to work together on this.