Memory leak in pyROOT when using user-defined C++ macros from Python and TTree Friends

martialc · May 22, 2024, 1:18pm

ROOT Version: 6.26/04 (from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc8-opt)
Python Version: 3.11.2
Platform: CentOS Linux release 7.9.2009 (Core)

Hi! I’m fairly new to ROOT and PyROOT, but over these past few months I’ve been familiarizing myself with the framework and I’ve arrived at a script that does everything I need: it reads a TTree from a TFile, creates a new TTree in a new TFile, computes new TBranches from the TBranches in the original TFile, and writes them to the new TTree.

However, I’ve come to realise that this script uses enormous amounts of memory (the jobs in Condor end up using ~100GB of memory when computing 20-30 new branches in files with 300k entries). The same happens when running locally.

Code structure

In detail, what I’ve done with the code is:

I’ve written a series of .cpp macros that look like this:

// project/macros/macro.cpp
#include <vector>
vector<float> macro(
    vector<float> value1,
    vector<float> value2
) {
    vector<float> result;
    // compute and push back values for result
    return result;
}

I’ve wrapped the C++ macros with Python functions so I can call them as part of a package:

# project/src/module/macros/macro.py
import ROOT

ROOT.gROOT.LoadMacro('project/macros/macro.cpp')

def macro(tree, *args, **kwargs):
    return ROOT.macro(
        getattr(tree, 'value1'), #branch with vector of floats
        getattr(tree, 'value2')  #branch with vector of floats
    )

I’ve built a Python class that handles dynamic creation of branches from these macros:

# project/src/classes/reTupler.py
class reTupler:
    def __init__(self, tree_name, new_file, src_file):
        self.src_file = ROOT.TFile.Open(src_file)
        self.src_tree = self.src_file.Get(tree_name)

        self.new_file = ROOT.TFile.Open(new_file,'recreate')
        self.new_tree = ROOT.TTree(tree_name, tree_name)

        # To access branches in 'src_tree' from 'new_tree':
        self.new_tree.AddFriend(self.src_tree)

        # To keep track of new branches and store values:
        self.new_branches = {}

    def add_branch(self, name, f, value_type='float'):
        self.new_branches[name] = {}
        self.new_branches[name]['f'] = f
        self.new_branches[name]['name']  = name
        self.new_branches[name]['value_type'] = value_type
        self.new_branches[name]['value']   = value = ROOT.std.vector(value_type)()
        self.new_branches[name]['tbranch'] = self.new_tree.Branch(name, value)

    def run(self):
        nentries = self.src_tree.GetEntries()
        for i in range(nentries):
            # Get entry and make sure src_tree and new_tree are synced
            self.src_tree.GetEntry(i)
            self.new_tree.GetEntry(i)

            # Now loop on all the branches that have been added:
            for branch_name, branch_dict in self.new_branches.items():
                branch_dict['value'].clear()
                [branch_dict['value'].push_back(result) for result in branch_dict['f'](self.new_tree)]

            # Fill entry with all computed branches
            self.new_tree.Fill()

        self.new_tree.Write()
        self.new_file.Close()
        self.src_file.Close()

In my code I distinguish between vector and scalar branches, but here for simplicity I’ve written only the vector case.

Minimal Working Example

I’ll make a minimal working example so anyone can reproduce this issue:

import ROOT

ROOT.gInterpreter.Declare('''
// project/macros/macro.cpp
#include <vector>
vector<float> macro(
    vector<float> value1,
    vector<float> value2
) {
    vector<float> result;
    // Copy values from value2 according to the sign of value1
    // Doing nothing also increases the memory usage...
    for (int i=0; i < value1.size(); i++) {
        if (value1[i] > 0) {result.push_back(value2[i]);};        
    };
    return result;
}
''')

def macro(tree, *args, **kwargs):
    return ROOT.macro(
        getattr(tree, 'value1'), #branch with vector of floats
        getattr(tree, 'value2')  #branch with vector of floats
    )

class reTupler:
    def __init__(self, tree_name, new_file, src_file):
        self.src_file = ROOT.TFile.Open(src_file)
        self.src_tree = self.src_file.Get(tree_name)

        self.new_file = ROOT.TFile.Open(new_file,'recreate')
        self.new_tree = ROOT.TTree(tree_name, tree_name)

        # To access branches in 'src_tree' from 'new_tree':
        self.new_tree.AddFriend(self.src_tree)

        # To keep track of new branches and store values:
        self.new_branches = {}

    def add_branch(self, name, f, value_type='float'):
        self.new_branches[name] = {}
        self.new_branches[name]['f'] = f
        self.new_branches[name]['name']  = name
        self.new_branches[name]['value_type'] = value_type
        self.new_branches[name]['value']   = value = ROOT.std.vector(value_type)()
        self.new_branches[name]['tbranch'] = self.new_tree.Branch(name, value)

    def run(self):
        nentries = self.src_tree.GetEntries()
        for i in range(nentries):
            # Get entry and make sure src_tree and new_tree are synced
            self.src_tree.GetEntry(i)
            self.new_tree.GetEntry(i)

            # Now loop on all the branches that have been added:
            for branch_name, branch_dict in self.new_branches.items():
                branch_dict['value'].clear()
                [branch_dict['value'].push_back(result) for result in branch_dict['f'](self.new_tree)]

            # Fill entry with all computed branches
            self.new_tree.Fill()

        self.new_tree.Write()
        self.new_file.Close()
        self.src_file.Close()
        
tree_name = 'DDTree'
src_path = 'path/to/src.root'
new_path = 'path/to/new.root'

retupler = reTupler(tree_name, new_path, src_path)
retupler.add_branch('new_branch', macro, 'float')

retupler.run()

This is a heavily reduced version of the code I am currently using, but it is similar enough that it shows the same memory leak.

For this minimal working example I’ve used a .root file with 100k entries and only two branches (‘value1’ and ‘value2’) that I’ve previously filled with 20-element vectors of random numbers from -999 to 999 using numpy.random.uniform (numpy.random.uniform(-999,999,20)). Just in case, I’ve used the following code:

import ROOT
import numpy as np


file_path = 'path/to/src.root'

file = ROOT.TFile.Open(file_path,'recreate')
tree = ROOT.TTree('DDTree','DDTree')

# Branch: value1
value1_value  = ROOT.std.vector('float')()
value1_branch = tree.Branch('value1',value1_value)

# Branch: value2
value2_value = ROOT.std.vector('float')()
value2_branch = tree.Branch('value2',value2_value)


value_length = 20
nentries = 100000
for i in range(nentries):
    tree.GetEntry(i)
    
    value1_value.clear()
    [value1_value.push_back(result) for result in np.random.uniform(-999,999,value_length)]
    
    value2_value.clear()
    [value2_value.push_back(result) for result in np.random.uniform(-99,999,value_length)]
    
    tree.Fill()

file.Write()
file.Close()

This code doesn’t present a memory leak (thankfully) (:

Fixes I’ve tried (and don’t seem to work)

I have tried different fixes that I’ve gathered from past topics and from recommendations from my colleagues but none seem to solve my issue. The ones I’ve tried so far are:

Using self.new_tree.FlushBaskets(); self.src_tree.DropBaskets() at regular intervals throughout the for loop.
Calling the garbage collector with gc.collect() at regular intervals throughout the for loop.
Using self.new_tree.DropBuffers(max_memory); self.src_tree.DropBuffers(max_memory) with different values for max_memory.
Setting explicitly the AutoSave with self.new_tree.SetAutoSave(step) where step is a high enough number (~10k) so that the runtime isn’t increased significantly.
Setting the addresses of the branches from self.src_tree explicitly, by creating a dictionary for the source branches similar to the one for the new branches (self.new_branches) and calling self.out_branch.SetBranchAddress(name, value).
In addition to the previous fix, setting a fixed vector length for the branches, large enough for the number of elements, using SetBranchAddress’s third argument.
Using pointers for the macros in C++ instead of the values (can be done with the current code, just needs to re-write macro.cpp to use pointers).
Using Python 2 (Python Version: 2.7.5).

None of these solutions has worked for me so far. However, my implementation of them may not be perfect, so some of them may actually work; this is just a list of the fixes I’ve tried so far.
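To judge whether any of the attempts above changes the trend, it can help to log the process memory at regular intervals inside the entry loop. The following is only a minimal sketch using the Python standard library (Unix only); `peak_rss`, `run_with_memory_log` and the `process_entry` callback are hypothetical names introduced here for illustration, and the lambda is a stand-in for the real per-entry work.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_with_memory_log(n_entries, process_entry, every=10000):
    """Call process_entry(i) for each entry and sample memory every `every` entries."""
    samples = []
    for i in range(n_entries):
        process_entry(i)
        if i % every == 0:
            rss = peak_rss()
            samples.append((i, rss))
            print(i, "peak RSS", rss)
    return samples


# Stand-in workload (an assumption, not the real reTupler loop):
# it keeps every 20-element list alive, so memory can only grow.
kept = []
samples = run_with_memory_log(50000, lambda i: kept.append([0.0] * 20))
```

In the real script, `process_entry` would wrap the body of the `for i in range(nentries)` loop in `run`. A value that keeps climbing between samples long after the first baskets have been filled suggests a leak rather than normal buffering.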

Conclusion

This is my first time using ROOT, pyROOT and C++ and I may be doing something that is inadvertently causing this memory leak. I’ve been struggling with this for a month and I’d really like to progress with this issue, so any help anyone can provide will be greatly appreciated (:

jonas · May 22, 2024, 4:28pm

Dear @martialc,

Welcome to the ROOT forum and thank you for this excellent first post! Getting a full reproducer of the problem already in the initial post is golden.

I have identified the memory leak in PyROOT and it will be fixed in ROOT 6.32, which comes out next week:

github.com/root-project/root

[PyROOT] Fix memory leak in TTree `getattr` pythonization

root-project:master ← guitargeek:get_branch

opened 04:23PM - 22 May 24 UTC

guitargeek

+46 -33

As reported on the forum:
https://root-forum.cern.ch/t/memory-leak-in-pyroot-wh…en-using-user-defined-c-macros-from-python-and-ttree-friends/59432

I fixed the memory leak in the Pythonization in the usual way I fix problems with the PyROOT CPython extension: re-implementing the offending parts in C++ and hoping that the problem is gone. Which it is!

The problem can be reproduced with a variation of the forum reproducer:

```python
import ROOT
import numpy as np

ROOT.gInterpreter.Declare(
    """
template<class T>
void * MyGetAddress(T * b) {
   return *(void**)b->GetAddress();
}
"""
)


def macro(tree, *args, **kwargs):

    import cppyy.ll

    # manually
    v1 = cppyy.ll.cast[tree.GetBranch("value1").GetClassName() + "*"](ROOT.MyGetAddress(tree.GetBranch("value1")))
    # using the Pythonization
    v2 = getattr(tree, "value2")
    return None


pinfo = ROOT.ProcInfo_t()


def print_memory(i):
    ROOT.gSystem.GetProcInfo(pinfo)
    print(i, "memory usage", pinfo.fMemResident, pinfo.fMemVirtual)


class reTupler:
    def __init__(self, tree_name, new_file, src_file):
        self.src_file = ROOT.TFile.Open(src_file)
        self.src_tree = self.src_file.Get(tree_name)

        self.new_file = ROOT.TFile.Open(new_file, "recreate")
        self.new_tree = ROOT.TTree(tree_name, tree_name)

        # To access branches in 'src_tree' from 'new_tree':
        self.new_tree.AddFriend(self.src_tree)

        # To keep track of new branches and store values:
        self.new_branches = {}

    def add_branch(self, name, f, value_type="float"):
        self.new_branches[name] = {}
        self.new_branches[name]["f"] = f
        self.new_branches[name]["name"] = name
        self.new_branches[name]["value_type"] = value_type
        self.new_branches[name]["value"] = value = ROOT.std.vector(value_type)()
        self.new_branches[name]["tbranch"] = self.new_tree.Branch(name, value)

    def run(self):
        nentries = self.src_tree.GetEntries()
        for i in range(nentries):
            # Get entry and make sure src_tree and new_tree are synced
            self.src_tree.GetEntry(i)
            self.new_tree.GetEntry(i)

            # Now loop on all the branches that have been added:
            for branch_name, branch_dict in self.new_branches.items():
                branch_dict["value"].clear()
                # self.new_tree
                branch_dict["f"](self.new_tree)
                # [branch_dict['value'].push_back(result) for result in branch_dict['f'](self.new_tree)]

            # Fill entry with all computed branches
            self.new_tree.Fill()

            if i % 10000 == 0:
                print_memory(i)

        self.new_tree.Write()
        self.new_file.Close()
        self.src_file.Close()


file_path = "_src.root"

file = ROOT.TFile.Open(file_path, "recreate")
tree = ROOT.TTree("DDTree", "DDTree")

# Branch: value1
value1_value = ROOT.std.vector("float")()
value1_branch = tree.Branch("value1", value1_value)

# Branch: value2
value2_value = ROOT.std.vector("float")()
value2_branch = tree.Branch("value2", value2_value)

value_length = 20
nentries = 100000
for i in range(nentries):
    tree.GetEntry(i)

    value1_value.clear()
    [value1_value.push_back(result) for result in np.random.uniform(-999, 999, value_length)]

    value2_value.clear()
    [value2_value.push_back(result) for result in np.random.uniform(-99, 999, value_length)]

    tree.Fill()

file.Write()
file.Close()

tree_name = "DDTree"
src_path = "_src.root"
new_path = "_new.root"

retupler = reTupler("DDTree", new_path, src_path)
retupler.add_branch("new_branch", macro, "float")

retupler.run()
```

Output without this PR:

```txt
0 memory usage 361424 1544100
10000 memory usage 380580 1548912
20000 memory usage 386148 1554504
30000 memory usage 391332 1559540
40000 memory usage 396324 1565324
50000 memory usage 402084 1572740
60000 memory usage 407652 1577572
70000 memory usage 413028 1582796
80000 memory usage 418596 1588976
90000 memory usage 423780 1594012
________________________________________________________
Executed in    3.40 secs    fish           external
   usr time    3.33 secs  399.00 micros    3.33 secs
   sys time    1.96 secs  106.00 micros    1.96 secs
```

Output with this PR:

```txt
0 memory usage 361396 1544116
10000 memory usage 375848 1544304
20000 memory usage 375848 1544304
30000 memory usage 375848 1544304
40000 memory usage 375848 1544304
50000 memory usage 375848 1544304
60000 memory usage 375848 1544304
70000 memory usage 375848 1544304
80000 memory usage 375848 1544304
90000 memory usage 375848 1544304
________________________________________________________
Executed in    2.08 secs    fish           external
   usr time    2.06 secs  471.00 micros    2.06 secs
   sys time    1.99 secs  126.00 micros    1.99 secs
```

The time measurements exclude the toy data generation. The new implementation is also almost twice as fast as the old one, so a win-win!

**Note:** I'm pretty sure there was also a JIRA issue about this problem, I can't find it anymore...

In the meantime, the workaround is to avoid using the TTree getattr pythonization, but get the branch data in the Python world manually:

ROOT.gInterpreter.Declare(
    """
template<class T>
void * MyGetAddress(T * b) {
   return *(void**)b->GetAddress();
}
"""
)


def macro(tree, *args, **kwargs):

    import cppyy.ll

    # manually (GetAddress() doesn't work in PyROOT, so I created a C++ wrapper)
    v1 = cppyy.ll.cast[tree.GetBranch("value1").GetClassName() + "*"](ROOT.MyGetAddress(tree.GetBranch("value1")))
    v2 = cppyy.ll.cast[tree.GetBranch("value2").GetClassName() + "*"](ROOT.MyGetAddress(tree.GetBranch("value2")))
    # using the Pythonization
    # v1 = getattr(tree, "value1")
    # v2 = getattr(tree, "value2")
    return None
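
Note that the workaround as posted returns None, so it only demonstrates how to read the branches. To use it in the original reTupler loop, the values presumably need to be passed on to the C++ function. A minimal sketch of that adaptation follows. It assumes that `MyGetAddress` (from the snippet above) and the user's C++ `macro` have already been declared in the ROOT session; `branch_object` is a hypothetical helper name introduced here. ROOT is imported inside the functions only so that the sketch can be loaded on a machine without ROOT.

```python
def branch_object(tree, branch_name):
    """Return the object a branch is bound to, bypassing the TTree getattr pythonization.

    Assumes ROOT.MyGetAddress was declared beforehand, as in the workaround above.
    """
    import ROOT
    import cppyy.ll

    branch = tree.GetBranch(branch_name)
    # Cast the raw address to a typed pointer, e.g. "vector<float>*"
    return cppyy.ll.cast[branch.GetClassName() + "*"](ROOT.MyGetAddress(branch))


def macro(tree, *args, **kwargs):
    """Python wrapper around the user's C++ macro (assumed to be declared already)."""
    import ROOT

    return ROOT.macro(
        branch_object(tree, "value1"),  # branch with vector of floats
        branch_object(tree, "value2"),  # branch with vector of floats
    )
```

With this version, `retupler.add_branch('new_branch', macro, 'float')` can stay unchanged, since `macro` keeps the same signature as in the original post.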

I hope this helps, and keep up the great work! But don’t iterate over 10000 entries in Python if avoidable.

Cheers,
Jonas

martialc · May 23, 2024, 7:25am

Dearest @jonas,

Yep, that fixed it! Thank you so much for such a prompt response, this issue has been haunting me for months now.

I don’t know enough C++ to understand the dark magic you are using here to solve the issue, but I’d like to better understand the fix, if it’s not a bother (:
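
For readers with the same question, the cast in the workaround can be pictured without ROOT. The sketch below is only an analogy built with the standard `ctypes` module, under the assumption (suggested by the `*(void**)` in `MyGetAddress`) that `GetAddress()` returns the address of a pointer slot which in turn points to the vector. The first cast reads the pointer out of the slot, and the second reinterprets the raw address as a typed object, which is the role `cppyy.ll.cast` plays in the workaround. All names here are illustrative.

```python
import ctypes

# Stand-in for the std::vector<float> that the branch is bound to.
data = (ctypes.c_float * 3)(1.0, 2.0, 3.0)

# The branch keeps a pointer to the object; what we get back is the
# address of that pointer slot, with no type information attached.
slot = ctypes.c_void_p(ctypes.addressof(data))
address_of_slot = ctypes.addressof(slot)

# Step 1, like *(void**)address: read the pointer stored in the slot.
slot_view = ctypes.cast(ctypes.c_void_p(address_of_slot), ctypes.POINTER(ctypes.c_void_p))
object_address = slot_view.contents.value

# Step 2, like cppyy.ll.cast["vector<float>*"](...): reinterpret the raw
# address as a typed object. Nothing is copied, so writes go to `data`.
typed = ctypes.cast(ctypes.c_void_p(object_address), ctypes.POINTER(ctypes.c_float * 3)).contents
values = list(typed)
typed[0] = 5.0
```

After the last line, `data[0]` holds 5.0 as well, because `typed` and `data` share the same memory. Neither step allocates a new copy of the data.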

Also, this is the JIRA issue (https://its.cern.ch/jira/browse/ROOT-9025) I had seen, which relates to a memory leak with __getattr__ in past versions of Python (<3.6). I thought this might be the one you were referring to in the pull request. I’ve since seen it’s not, but just in case it’s useful, here it is (:

Thanks again for such a quick fix!! <3

system · June 6, 2024, 7:25am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.