Memory leak when processing RDataFrames in python loop

Dear ROOT experts,

I’ve run into some weird behaviour while trying to read a series of TTrees from separate files and process them with RDataFrames inside a Python loop. Here is some example code that reproduces the issue:

import psutil
import numpy as np
import ROOT

mems = []
for i in range(100):
    # record the resident set size (in MB) at the start of each iteration
    mems.append(psutil.Process().memory_info().rss / 1024**2)

    f = ROOT.TFile(infile_name)  # infile_name points to one of my input files
    tch = f.Get('nominal')

    rdf = ROOT.RDataFrame(tch)
    rdf = rdf.Filter('eventNumber>367927315')

    ens = rdf.AsNumpy(['eventNumber'])
    print(np.sum(ens['eventNumber']))

    f.Close()

As you can see, I’m trying to read trees inside the loop, apply a filter using an RDataFrame, and then do some processing in numpy. However, even though the file is closed and the numpy arrays go out of scope and are collected by the Python garbage collector, the amount of memory used keeps growing:

[plot: resident memory climbs steadily, by roughly 4 MB per loop iteration]
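
(The plots in this post are just the mems list plotted against the iteration number, roughly like this:)

import matplotlib.pyplot as plt

plt.plot(mems)                       # one RSS sample per loop iteration
plt.xlabel('iteration')
plt.ylabel('resident memory [MB]')
plt.show()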

Curiously, if I get rid of the Filter, the memory usage is much more limited:

[plot: memory usage stays much flatter without the Filter]

What is actually going on here? Is there a way to avoid this?

Cheers,
Alex



ROOT Version: 6.18/04
Platform: Not Provided
Compiler: Not Provided


Not sure why. Maybe try this:

    rdf0 = ROOT.RDataFrame(tch)
    rdf = rdf0.Filter(...)

Hi ferhue,

Unfortunately, this does not change anything. Even when I explicitly delete the RDataFrame, the memory used keeps growing.
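
For reference, this is roughly what I tried at the end of each loop iteration:

import gc

# ...at the end of each loop iteration:
del rdf, ens   # drop the Python references explicitly
gc.collect()   # force a garbage-collection pass; the RSS still keeps growing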

Cheers,
Alex

Hi @asopio ,

without a complete minimal reproducer that I can run, my guess is that the memory hogging comes from the code for the Filter expression, which (in ROOT v6.18) is re-compiled at every loop iteration (see e.g. How to delete RDataFrame and clean up memory - #2 by eguiraud). Code that is just-in-time-compiled stays in memory until the end of the application. The amount of code generated per iteration is much smaller in more recent ROOT versions. Can you please try with ROOT v6.26.06? There should still be a small increase, but nothing as dramatic as ~4 MB per iteration.

Cheers,
Enrico

Hi Enrico,
Thanks for the quick response!

I ran the code again in v6.26 and the rate of memory growth is a lot lower than it was before:

v6.18: [plot: memory grows by several MB per iteration]

v6.26: [plot: much slower memory growth]

However, I’m still confused as to why there is a steady increase in memory usage at all. Is it really only just-in-time-compiled code? Is there a good reason why it is kept in memory even after the RDataFrame object gets deleted?

Here is a full minimal reproducer for the plots above:
dataframetest.py (918 Bytes)

Cheers,
Alex

Hi @asopio ,

Great! We did something right between releases :slight_smile:

Yes, it’s easy to verify with valgrind --tool=massif. These are the largest memory hoggers it sees (thank you for the self-contained reproducer):

12.15% (30,056,448B) 0x9E43CA4: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long) [clone .constprop.0] (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
06.47% (15,994,384B) 0x7DBB588: clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
06.41% (15,848,928B) 0x7DBBAE5: clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
05.33% (13,189,120B) 0x9D093D8: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long) [clone .constprop.1] (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
05.21% (12,875,749B) 0xA4005F8: llvm::Module::getOrInsertComdat(llvm::StringRef) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
05.07% (12,549,184B) 0x7160CC2: llvm::safe_malloc(unsigned long) (MemAlloc.h:26)
03.83% (9,461,760B) 0x9D16DD8: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long) [clone .constprop.0] (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
03.39% (8,380,856B) 0xA4FA33A: llvm::SmallVectorBase::grow_pod(void*, unsigned long, unsigned long) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
03.25% (8,032,344B) 0xA4EF7E1: llvm::WritableMemoryBuffer::getNewUninitMemBuffer(unsigned long, llvm::Twine const&) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)

etc. etc. (the percentages indicate the fraction of the memory allocated by the process that was allocated by that particular call; the biggest offender is llvm::BumpPtrAllocatorImpl at ~12%).

Just-in-time-compiled code is added to an in-memory “shared library”, and Cling, the interpreter, provides no way to unload code once it’s there. On the RDataFrame side we try to maximize jitted code re-use.

Further mitigations

Besides using the latest ROOT release, which is a bit better behaved, instead of rdf = rdf.Filter("eventNumber>367927315") you can use something like:

# declared once, before the loop: the helper is jitted a single time
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode ApplyFilter(ROOT::RDF::RNode df) {
    return df.Filter([](int e) { return e>367927315; }, {"eventNumber"});
}
""")
...
# inside the loop: call the already-compiled helper instead of a string Filter
rdf = ROOT.ApplyFilter(ROOT.RDF.AsRNode(rdf))
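
Putting it together with your reproducer, the loop would look roughly like this:

import numpy as np
import ROOT

# jitted once, up front
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode ApplyFilter(ROOT::RDF::RNode df) {
    return df.Filter([](int e) { return e>367927315; }, {"eventNumber"});
}
""")

for i in range(100):
    f = ROOT.TFile(infile_name)  # infile_name as in your reproducer
    tch = f.Get('nominal')

    rdf = ROOT.RDataFrame(tch)
    rdf = ROOT.ApplyFilter(ROOT.RDF.AsRNode(rdf))  # re-uses the compiled filter

    ens = rdf.AsNumpy(['eventNumber'])
    print(np.sum(ens['eventNumber']))

    f.Close()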

With that trick we just-in-time-compile the RDF transformation you need only once, then re-use it many times (note that we need to use a C++ lambda inside the Filter there, otherwise we are back to the original situation). On my machine this is the situation with your original reproducer:

[plot: memory grows steadily per iteration]

and this is with the change mentioned above:

[plot: memory usage is almost flat]
There is still a little memory creep because AsNumpy does a bit of jitting as well.

The other thing you can do is run each iteration of your loop in a sub-process. When the sub-process ends, it deallocates all related memory.
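
For example, with Python’s multiprocessing module (a sketch; input_files stands for your list of input file names):

import multiprocessing

def process_one_file(infile_name):
    # everything ROOT-related happens inside the child process
    import ROOT
    import numpy as np

    f = ROOT.TFile(infile_name)
    tch = f.Get('nominal')

    rdf = ROOT.RDataFrame(tch)
    rdf = rdf.Filter('eventNumber>367927315')

    ens = rdf.AsNumpy(['eventNumber'])
    f.Close()
    return np.sum(ens['eventNumber'])

if __name__ == '__main__':
    # maxtasksperchild=1 replaces the worker process after every file, so all
    # memory (including jitted code) is returned to the OS at each iteration
    with multiprocessing.Pool(processes=1, maxtasksperchild=1) as pool:
        for result in pool.map(process_one_file, input_files):
            print(result)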

Cheers,
Enrico

Hi Enrico,

Thank you very much for this in-depth explanation!

Cheers,
Alex
