Memory leak when processing RDataFrames in python loop

eguiraud · August 15, 2022, 12:57pm

Great! We did something right between releases.

Yes, it’s easy to verify with valgrind --tool=massif. These are the largest memory hoggers it sees (thank you for the self-contained reproducer):

12.15% (30,056,448B) 0x9E43CA4: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long) [clone .constprop.0] (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
06.47% (15,994,384B) 0x7DBB588: clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
06.41% (15,848,928B) 0x7DBBAE5: clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
05.33% (13,189,120B) 0x9D093D8: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long) [clone .constprop.1] (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
05.21% (12,875,749B) 0xA4005F8: llvm::Module::getOrInsertComdat(llvm::StringRef) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
05.07% (12,549,184B) 0x7160CC2: llvm::safe_malloc(unsigned long) (MemAlloc.h:26)
03.83% (9,461,760B) 0x9D16DD8: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long) [clone .constprop.0] (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
03.39% (8,380,856B) 0xA4FA33A: llvm::SmallVectorBase::grow_pod(void*, unsigned long, unsigned long) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)
03.25% (8,032,344B) 0xA4EF7E1: llvm::WritableMemoryBuffer::getNewUninitMemBuffer(unsigned long, llvm::Twine const&) (in /home/blue/ROOT/master/cmake-build-foo/lib/libCling.so)

etc. (The percentages indicate the fraction of the process's allocated memory that came from that particular call; the biggest offender is llvm::BumpPtrAllocatorImpl at 12%.)
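If running the whole job under massif is too slow, a cruder check is to record the peak resident set size after each loop iteration: steady growth across iterations that do the same work is the signature of this kind of creep. The sketch below is only an illustration and uses just the Python standard library (the resource module, so Unix only). watch_memory, peak_rss_mib and the per-iteration callable step are hypothetical names. Unlike massif, it cannot tell you which call allocated the memory.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def watch_memory(step, n_iterations):
    """Call step(i) for i in range(n_iterations) and record the peak RSS
    after each call. ru_maxrss is a high-water mark, so the recorded values
    can only stay flat or grow; a steady climb means memory is retained."""
    history = []
    for i in range(n_iterations):
        step(i)
        history.append(peak_rss_mib())
        print(f"iteration {i}: peak RSS {history[-1]:.1f} MiB")
    return history


if __name__ == "__main__":
    # Hypothetical stand-in for one loop iteration: in the real reproducer this
    # would build the RDataFrame, apply the Filter and call AsNumpy.
    kept = []
    watch_memory(lambda i: kept.append(b"x" * (20 * 1024 * 1024)), 3)
```

A flat series over many iterations suggests nothing is being retained. A series that rises by roughly the same amount every time points to per-iteration growth such as newly jitted code.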

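Just-in-time-compiled code is added to an in-memory “shared library”, and Cling, the interpreter, provides no way to unload code once it’s there. As a result, every piece of newly jitted code permanently increases the memory footprint of the process. On the RDataFrame side, we try to re-use jitted code as much as possible.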

Further mitigations

Besides using the latest ROOT release, which is a bit better behaved, you can replace rdf = rdf.Filter("eventNumber>367927315") with something like:

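import ROOT

# Compile the helper once, before the loop
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode ApplyFilter(ROOT::RDF::RNode df) {
    // C++ lambda instead of a string expression, so the Filter itself needs no jitting
    return df.Filter([](int e) { return e > 367927315; }, {"eventNumber"});
}
""")
...
# Inside the loop: convert the node to an RNode and apply the pre-compiled filter
rdf = ROOT.ApplyFilter(ROOT.RDF.AsRNode(rdf))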

With that trick, the RDF transformation you need is just-in-time compiled only once and then re-used many times. Note that the Filter must use a C++ lambda: with a string expression, the filter would be jitted again on every call and we would be back to the original situation. On my machine, this is the situation with your original reproducer:

and this is with the change mentioned above:

There is still some memory creep because AsNumpy also does a little jitting.

Alternatively, you can run each iteration of your loop in a sub-process. When the sub-process exits, all of its memory, jitted code included, is released.
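A minimal sketch of the sub-process approach, using only the standard multiprocessing module. It assumes a Unix-like system because it uses the "fork" start method, which is not available on Windows. It also assumes a hypothetical per-iteration function, here called process_one, that does import ROOT itself, builds the RDataFrame and returns the AsNumpy result. Keeping import ROOT out of the parent means the parent never accumulates jitted code. It also avoids forking a process that already has ROOT's threads running. run_isolated is a hypothetical helper name, not a ROOT API.

```python
import multiprocessing as mp


def run_isolated(func, *args):
    """Run func(*args) in a freshly forked child process and return its result.

    Everything the child allocates (with ROOT, including any jitted code) is
    released by the operating system when the child exits. The return value
    must be picklable, because it is sent back to the parent through a pipe.
    """
    ctx = mp.get_context("fork")  # Unix only; with fork, func itself is never pickled
    reader, writer = ctx.Pipe(duplex=False)

    def target():
        try:
            writer.send((True, func(*args)))
        except BaseException as exc:  # report failures instead of leaving the parent waiting
            writer.send((False, repr(exc)))
        finally:
            writer.close()

    proc = ctx.Process(target=target)
    proc.start()
    writer.close()  # the parent only reads; closing our copy lets recv() see EOF if the child dies
    try:
        ok, payload = reader.recv()  # receive before join() so a large result cannot block the child
    except EOFError:
        ok, payload = False, "child exited without sending a result"
    finally:
        reader.close()
    proc.join()
    if not ok:
        raise RuntimeError(f"child process {proc.pid} failed: {payload}")
    return payload


def process_one(path):
    # Hypothetical per-iteration work. A real version might look like:
    #   import ROOT
    #   df = ROOT.RDataFrame("tree", path)   # "tree" is a placeholder name
    #   return df.Filter("eventNumber>367927315").AsNumpy(["eventNumber"])
    return len(path)


if __name__ == "__main__":
    for path in ["a.root", "bb.root"]:  # hypothetical input files
        print(path, run_isolated(process_one, path))
```

Fork is used here so that func does not have to be importable from a module. The portable "spawn" start method would work too, but it requires func to be defined at module level in an importable file, and it starts each child more slowly.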

Cheers,
Enrico