I have a question about RDataFrame: how can a C++ application handle, or clear, the memory consumed by just-in-time compilation (jitting)? I have not run a memory profiler yet, but I wonder about an application that creates many dataframes and applies Filters and Defines to each one using string expressions. As far as I know, those expressions are compiled on the fly, creating extra memory pressure for the application. Is that correct? If so, what is not clear to me is when that compiled code is cleared. Does it stay in memory until the executable finishes, or is everything deleted once the dataframe goes out of scope? (Or, when the dataframe is created from a TTree, does the cleanup happen when the file is closed?)
Renato
Hi,
just-in-time compiled code remains in memory until the end of the application; cling currently has no mechanism to easily unload that code. In C++ you can often just use the slightly more verbose, fully typed versions of the RDF methods and avoid just-in-time compilation altogether.
You should measure whether that’s a/the problem in your case though (premature optimization and all that).
Cheers,
Enrico
EDIT: another way to greatly reduce memory usage is to create all computation graphs before you start the first event loop. Since v6.22, RDataFrame accumulates the code to just-in-time compile even across different computation graphs, and it's cheaper to generate code once for all RDataFrames than once per dataframe or, even worse, multiple times per dataframe.
@eguiraud, I run fits, reading ntuples and building datasets while filtering and defining some columns on the fly, on 20-30 different ntuples (RDataFrames). So my use case involves compiling long expressions on many RDataFrames. The framework I use can also run over "pre-filtered" ntuples, but this is not as flexible as one might think. I am not an expert, but wouldn't it be better if all the jitted code associated with an RDataFrame were completely cleared once the dataframe goes out of scope? Or to have something like ROOT::ClearJITTED()? Something along these lines is RooMinimizer::cleanup(), but I guess what that cleans up exists only inside the class itself.
Alternatively, when I compile my code, which uses RDataFrame internally and generates dictionaries (GENERATE_DICTIONARY) for some classes and methods, is there an "extra" compiler flag that can partially mitigate the memory usage of jitted code?
Yes, absolutely. As I mentioned it is a (current) limitation of cling that this is not possible. Unloading generated code is not as simple as freeing memory. We hope the situation will improve in the future.
If you create 20-30 different RDataFrames, you should make sure to create all computation graphs before the first event loop is started. That guarantees that only one "code generation pass" is performed for all 30 RDFs, which reduces memory usage significantly compared to performing 30 separate passes.
If it helps, with ROOT master (and soon v6.24) you can have RDataFrame log the start of each event loop, the time spent in just-in-time compilation (and, if you want, even the code being just-in-time compiled) by adding the following at the beginning of the application (the logging classes are declared in ROOT/RLogger.hxx):
#include <ROOT/RLogger.hxx>
auto verbosity = ROOT::Experimental::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::Experimental::ELogLevel::kInfo);
If memory usage becomes a problem, the simplest workaround for now is to create each RDataFrame and run its event loop in a separate sub-process.
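A minimal POSIX sketch of that workaround, assuming a Unix-like system; runEachInSubprocess is a hypothetical helper and each task stands in for "build one RDataFrame and run its event loop". Children are started one at a time, and whatever memory a child uses (including code jitted by cling inside it) is returned to the operating system when it exits. As discussed further down in the thread, multi-threading should only be enabled inside the child, never before forking.

```cpp
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Runs each task in its own child process, strictly one after the other.
// Returns the number of tasks whose child exited with status 0.
int runEachInSubprocess(const std::vector<std::function<int()>> &tasks)
{
   int nSucceeded = 0;
   for (const auto &task : tasks) {
      pid_t pid = fork();
      if (pid < 0)
         continue; // fork failed: this task counts as not succeeded
      if (pid == 0) {
         // Child: run the task and exit with its return code.
         // _exit skips atexit handlers and stdio flushing in the child,
         // so flush explicitly if the task writes to stdout.
         _exit(task());
      }
      // Parent: wait for this child before starting the next one,
      // so at most one child is alive at any time.
      int status = 0;
      if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
         ++nSucceeded;
   }
   return nSucceeded;
}
```

This only reports success or failure; getting actual results back to the parent needs some form of inter-process communication.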
I think my use case is very simple. I have one executable that internally performs many RDataFrame operations on many different ntuples, and thus many RDataFrames. Each time that piece of code runs, I basically want the process to fork off the job: I run my executable, and at some point there will be two processes running (the main one and the one I fork). Is TProof able to do that and let me handle the operation simply? Would this solve the problem of the memory used by jitted code never being freed?
Hi,
sorry for the high latency. I think there are a few things to check before going down the sub-process route:
how large is the memory usage for your problematic application?
if you run valgrind --tool=massif on it, does the memory hogging come from cling?
can you rearrange the application flow so that you book all operations for all RDataFrames first, and then run all event loops? That would guarantee the smallest memory footprint and also the best possible performance. That's the recommended way to run multiple RDFs whenever possible, and via RunGraphs you can also run the separate event loops concurrently for another performance boost.
If the problem is indeed cling memory hogging, and it is not possible to book all RDF computations upfront but instead you have to build and run one RDF computation graph at a time, you can use TProcessExecutor to spawn one subprocess per RDF, with two caveats:
to reduce memory usage you should run RDFs one after the other, while TProcessExecutor::Process will spawn N processes, one per argument. So instead of passing all RDFs to a single TProcessExecutor::Process call, you need to call it multiple times, once per RDF
you have to make sure to call EnableImplicitMT inside the subprocess, because forking a multi-threaded application can result in deadlocks or other problems
@Wile_E_Coyote currently maintained alternatives to PROOF-lite are TTreeProcessorMP or TProcessExecutor (for multi-process solutions) or TTreeProcessorMT, TThreadExecutor and RDataFrame (for multi-threading). We are also working on an RDataFrame-based PROOF alternative, it has just been merged in master as an experimental feature.
Cheers,
Enrico
EDIT: it took me a while to reply because I wanted to check with our cling experts what a cling-side solution to the memory-hogging problem would look like (i.e. what exactly it would take to implement something like the ROOT::ClearJitted() call that you propose). It is possible but very tricky, so it's on our to-do list, but it will take a while.
EDIT 2: about cpp-subprocess: if each subprocess does not need to return anything to the parent process, that's probably a viable solution. To return results to the parent process, TProcessExecutor might be a better alternative, because it leverages ROOT I/O to pass C++ objects between processes, which is otherwise not trivial.
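To illustrate why returning results is the hard part, here is a minimal POSIX sketch (runInChildAndReturn is a hypothetical helper, and a Unix-like system is assumed) that sends a single double back from a forked child through a pipe. Writing raw bytes like this only works for trivially copyable values; a histogram or any other C++ object would need a serialization step, which is the part TProcessExecutor handles with ROOT I/O.

```cpp
#include <functional>
#include <optional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `task` in a forked child and returns its result to the parent
// through a pipe. Returns std::nullopt if anything goes wrong.
std::optional<double> runInChildAndReturn(const std::function<double()> &task)
{
   int fds[2];
   if (pipe(fds) != 0)
      return std::nullopt;
   pid_t pid = fork();
   if (pid < 0) {
      close(fds[0]);
      close(fds[1]);
      return std::nullopt;
   }
   if (pid == 0) {
      // Child: compute, write the raw bytes of the result, exit.
      close(fds[0]);
      double value = task();
      ssize_t written = write(fds[1], &value, sizeof value);
      close(fds[1]);
      _exit(written == static_cast<ssize_t>(sizeof value) ? 0 : 1);
   }
   // Parent: read the result (blocks until the child writes), then reap the child.
   close(fds[1]);
   double value = 0.;
   ssize_t got = read(fds[0], &value, sizeof value);
   close(fds[0]);
   int status = 0;
   waitpid(pid, &status, 0);
   if (got != static_cast<ssize_t>(sizeof value) || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
      return std::nullopt;
   return value;
}
```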
As soon as possible I will run a memory profiler on my application and report back. In the meantime, thanks a lot for the very useful suggestions and for the links to things I was not aware of yet (the RunGraphs function).
At the moment I cannot really rearrange the code so that I book all RDataFrame operations first.
Just wondering whether writing:
Define<Double_t>("newCol", "MathExpression") instead of Define("...")
would already improve things, or whether it's useless and the real fix would be to repack "MathExpression" into a lambda function.