RDataFrame and jitted code memory consumption

RENATO_QUAGLIANI · February 19, 2021, 1:49pm

Dear experts

I have a question about RDataFrame and how to handle/clear, in C++ code, the memory consumed by the jitting. I have not yet run a memory profile, but I wonder: when code creates many dataframes and books Filters and Defines on each by passing string expressions, afaik those expressions are compiled on the fly, creating extra memory pressure for the application. Is that correct? If yes, what is not clear to me is when those expressions are cleared out. Do they remain in memory until the executable finishes, or is everything deleted once the dataframe goes out of scope? (Or, when the dataframe is created from a TTree, does cleanup happen when the file is closed?)
Renato

eguiraud · February 19, 2021, 1:53pm

Hi,
just-in-time compiled code remains in memory until the end of the application, cling does not have a mechanism to easily unload that code at the moment. In C++ you can often just use the slightly more verbose, fully typed versions of the RDF methods and avoid just-in-time compilation altogether.

You should measure whether that’s a/the problem in your case though (premature optimization and all that).

Cheers,
Enrico

EDIT: another way to greatly reduce memory used is to create all computation graphs before you start the first event loop. Since v6.22, RDataFrame accumulates code to just-in-time compile even across different computation graphs, and it’s cheaper to generate code once for all RDataFrames rather than once per dataframe or even worse multiple times per dataframe.

RENATO_QUAGLIANI · February 19, 2021, 2:07pm

@eguiraud, I run fits that read and build datasets, filtering and defining some columns on the fly, on 20-30 different ntuples (DataFrames). So my use case involves compiling long expressions on many RDataFrames. The framework I use can also run over “pre-filtered” ntuples, but this is not as flexible as one might think. I am not an expert, but wouldn’t it be better if, once an RDataFrame goes out of scope, all the JITTED code associated with it got completely cleared up? Or to have something like ROOT::ClearJITTED()? Something along these lines is RooMinimizer::cleanup(), but I guess what gets cleaned there is something that exists only inside the class itself.

RENATO_QUAGLIANI · February 19, 2021, 2:14pm

Alternatively, when i compile my code which inside uses RDataFrame and i do GENERATE_DICTIONARY of some classes and methods, is there any “extra” compiling flag which can mitigate partially the issue from JITTED code memory usage?

eguiraud · February 19, 2021, 2:30pm

Yes, absolutely. As I mentioned it is a (current) limitation of cling that this is not possible. Unloading generated code is not as simple as freeing memory. We hope the situation will improve in the future.

If you create 20-30 different RDataFrames, you should make sure to create all computation graphs before the first event loop is started. That guarantees that only one “code generation pass” is performed for all 30 RDFs, which reduces the memory usage significantly w.r.t. performing 30 different passes.

If it helps, with ROOT master (and soon v6.24) you can have RDataFrame log the start of an event loop, the time spent in just-in-time compilation (and if you want even what code is being just-in-time compiled) by adding this line at the beginning of the application:

auto verbosity = ROOT::Experimental::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::Experimental::ELogLevel::kInfo);

If memory usage becomes a problem, the simplest workaround for now is to create each RDataFrame and run its event loop in a separate sub-process.
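A minimal sketch of that workaround, assuming a POSIX system. RunInSubprocess and RunAllSequentially are hypothetical helper names (not ROOT API), and the job callback stands in for "build one RDataFrame and run its event loop". Whatever the child allocates, jitted code included, is released when the child exits.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <functional>
#include <vector>

// Hypothetical helper: run `job` in a forked child and return its exit code,
// or -1 if fork/waitpid failed or the child did not exit normally.
int RunInSubprocess(const std::function<int()> &job)
{
   const pid_t pid = fork();
   if (pid < 0)
      return -1; // fork failed
   if (pid == 0) {
      // Child: a real application would create the RDataFrame and trigger
      // its event loop inside `job`, so the jitted code lives only here.
      const int rc = job();
      _exit(rc); // leave without running the parent's atexit handlers
   }
   // Parent: wait for the child so that jobs run one after the other.
   int status = 0;
   if (waitpid(pid, &status, 0) != pid)
      return -1;
   return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Hypothetical helper: run all jobs sequentially, one child process each,
// and return how many of them reported a non-zero exit code.
int RunAllSequentially(const std::vector<std::function<int()>> &jobs)
{
   int failures = 0;
   for (const auto &job : jobs)
      if (RunInSubprocess(job) != 0)
         ++failures;
   return failures;
}
```

Only a small integer comes back from each child here, so results would have to be written to a file by the job itself. The parent should also fork before it starts any threads, since forking a multi-threaded process is unsafe.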

Cheers,
Enrico

RENATO_QUAGLIANI · February 20, 2021, 9:38am

Hi @eguiraud, do you think it would be an option to link my C++ code to the cpp-subprocess library (GitHub: tsaarni/cpp-subprocess, a popen()-like C++ library with iostream support for stdio forwarding) and run my dataframe Defines/Filters/Snapshots inside a dispatched executable instead of doing it in the same executable (granted I can catch errors etc.)? I want to avoid refactoring too much code and I see this as a possible solution.

Wile_E_Coyote · February 20, 2021, 10:28am

There once existed dedicated web pages describing how to split jobs using ROOT’s built-in PROOF, PROOF-Lite, and multi-threading / multi-processing features. Unfortunately, these descriptions are gone now.

What remains are the:
${ROOTSYS}/tutorials/proof
${ROOTSYS}/tutorials/multicore

RENATO_QUAGLIANI · February 20, 2021, 12:21pm

Ah good to know. Is there any example to look at for this?

RENATO_QUAGLIANI · February 20, 2021, 12:39pm

I mean, my use case is very simple, I think. I have one executable which internally does many RDataFrame operations on many different tuples, thus many RDataFrames. Basically, each time that piece of code runs, I want to fork a process for the job itself. That is, I run my executable and at some point there will be two executables running (the main one and the one I fork). Is TProof able to do that and let me handle the operation simply? Would this cure the memory usage of jitted code that cannot be deleted?

eguiraud · February 24, 2021, 2:27pm

Hi,
sorry for the high latency. I think there are a few things to check before going down the sub-process route:

how large is the memory usage for your problematic application?
running valgrind --tool=massif on it, does the memory hogging come from cling?
can you rearrange the application flow in a way that you book all operations for all RDataFrames first, and then run all event loops? That would guarantee the smallest memory footprint but also the best performance possible. That’s the recommended way to run multiple RDFs, whenever possible, and via RunGraphs you can also run the separate event loops concurrently for another performance boost

If the problem is indeed cling memory hogging, and it is not possible to book all RDF computations upfront but instead you have to build and run one RDF computation graph at a time, you can use TProcessExecutor to spawn one subprocess per RDF, with two caveats:

to reduce memory usage you should run RDFs one after the other, while TProcessExecutor::Process will spawn N processes, one per argument. So instead of passing all RDFs to a single TProcessExecutor::Process call, you need to call it multiple times, once per RDF
you have to make sure to call EnableImplicitMT inside the subprocess (not in the parent before forking), because forking a multi-thread application can result in deadlocks or other problems

@Wile_E_Coyote currently maintained alternatives to PROOF-lite are TTreeProcessorMP or TProcessExecutor (for multi-process solutions) or TTreeProcessorMT, TThreadExecutor and RDataFrame (for multi-threading). We are also working on an RDataFrame-based PROOF alternative, it has just been merged in master as an experimental feature.

Cheers,
Enrico

EDIT: it took me a while to reply because I wanted to check with our cling experts what a cling-side solution to the memory-hogging problem would look like (i.e. what exactly it would take to implement something like the ROOT::ClearJITTED() call that you propose). It is possible but very tricky, so it’s on our to-do list, but it will take a while.

EDIT 2: about cpp-subprocess: if each subprocess does not need to return anything to the parent process, that’s probably a viable solution. To return results to the parent process, TProcessExecutor might be a better alternative because it leverages ROOT I/O to pass C++ objects between processes, which is otherwise not trivial

RENATO_QUAGLIANI · March 1, 2021, 3:11pm

As soon as possible I will run a memory usage profiler on my application and report back. In the meantime, thanks a lot for the very useful suggestions and links to things I was not aware of yet (the RunGraphs function).
At the moment I cannot really rearrange the code so that I book all operations for all RDataFrames first.
Just wondering if:

Define<Double_t>("newCol","MathExpression") , instead of Define( "...");

would already improve things, or whether it’s useless and the real fix would be to repack “MathExpression” into a lambda function?

eguiraud · March 1, 2021, 3:21pm

Define<Double_t>("newCol","MathExpression")

I don’t think that will compile, indeed lambda expressions are what you would use to avoid just-in-time compilation.

I’d be happy to take a look at a standalone reproducer.

Cheers,
Enrico
