RDataFrame Decache

Dear Root Expert,

I have been working on a cutflow program centered around an RDataFrame object. The Cache function is an attractive feature, and it has been promising for fast cut-and-count studies. However, on big datasets the caching approach is limited by memory. I am implementing a method that loops over datasets, cutting and counting on the cached data; at the next iteration, how does one de-cache the previously cached RDataFrame to preserve memory?

Thanks,
Siewyan
_ROOT Version: 6.22/00
_Platform: Ubuntu 18.04
_Compiler: 7.5.0

Hi @SiewYan,
cached data is deleted when the result of Cache and all RDF nodes depending on it go out of scope.
Also note that you can selectively Cache just some branches.

If you loop over many such datasets, some of the memory hogging could be due to the just-in-time compilation of parts of the RDF computation graph: memory taken by just-in-time compiled code is only released at process exit. If that is the case, one possible solution is to write your tool as a command-line program, compile it with optimizations, and invoke it from a script, as you would any Linux command-line tool.

It would be very useful if you could provide a small, self-contained reproducer that shows the memory issue. It doesn't matter how small the dataset or the computation graph is; running the loop enough times, I should be able to see the memory problem.

Cheers,
Enrico

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.