Memory leak in pyROOT when using user-defined C++ macros from Python and TTree Friends


ROOT Version: 6.26/04 (from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc8-opt)
Python Version: 3.11.2
Platform: CentOS Linux release 7.9.2009 (Core)


Hi! I’m fairly new to ROOT and pyROOT, but over these past few months I’ve been familiarizing myself with the framework and I’ve arrived at a script that does everything I need: it reads a TTree from a TFile, creates a new TTree in a new TFile, computes some new TBranches from the TBranches in the original TFile and writes them to the new TTree.

However, I’ve realised that this script uses enormous amounts of memory (the Condor jobs end up using ~100GB when computing 20-30 new branches on files with 300k entries). The same happens when running locally.

Code structure

In detail, what I’ve done with the code is:

  1. I’ve written a series of .cpp that look like this:
// project/macros/macro.cpp
#include <vector>
vector<float> macro(
    vector<float> value1,
    vector<float> value2
) {
    vector<float> result;
    // compute and push back values for result
    return result;
}
  2. I’ve wrapped the C++ macros with Python functions so I can call them as part of a package:
# project/src/module/macros/macro.py
import ROOT

ROOT.gROOT.LoadMacro('project/macros/macro.cpp')

def macro(tree, *args, **kwargs):
    return ROOT.macro(
        getattr(tree, 'value1'), #branch with vector of floats
        getattr(tree, 'value2')  #branch with vector of floats
    )
  3. I’ve built a Python class that handles dynamic creation of branches from these macros:
# project/src/classes/reTupler.py
class reTupler:
    def __init__(self, tree_name, new_file, src_file):
        self.src_file = ROOT.TFile.Open(src_file)
        self.src_tree = self.src_file.Get(tree_name)

        self.new_file = ROOT.TFile.Open(new_file,'recreate')
        self.new_tree = ROOT.TTree(tree_name, tree_name)

        # To access branches in 'src_tree' from 'new_tree':
        self.new_tree.AddFriend(self.src_tree)

        # To keep track of new branches and store values:
        self.new_branches = {}

    def add_branch(self, name, f, value_type='float'):
        self.new_branches[name] = {}
        self.new_branches[name]['f'] = f
        self.new_branches[name]['name']  = name
        self.new_branches[name]['value_type'] = value_type
        self.new_branches[name]['value']   = value = ROOT.std.vector(value_type)()
        self.new_branches[name]['tbranch'] = self.new_tree.Branch(name, value)

    def run(self):
        nentries = self.src_tree.GetEntries()
        for i in range(nentries):
            # Get entry and make sure src_tree and new_tree are synced
            self.src_tree.GetEntry(i)
            self.new_tree.GetEntry(i)

            # Now loop on all the branches that have been added:
            for branch_name, branch_dict in self.new_branches.items():
                branch_dict['value'].clear()
                [branch_dict['value'].push_back(result) for result in branch_dict['f'](self.new_tree)]

            # Fill entry with all computed branches
            self.new_tree.Fill()

        self.new_tree.Write()
        self.new_file.Close()
        self.src_file.Close()

In my code I distinguish between vector and scalar branches, but here for simplicity I’ve written only the vector case.

Minimal Working Example

I’ll make a minimal working example so anyone can reproduce this issue:

import ROOT

ROOT.gInterpreter.Declare('''
// project/macros/macro.cpp
#include <vector>
vector<float> macro(
    vector<float> value1,
    vector<float> value2
) {
    vector<float> result;
    // Copy values from value2 according to the sign of value1
    // Doing nothing also increases the memory usage...
    for (int i=0; i < value1.size(); i++) {
        if (value1[i] > 0) {result.push_back(value2[i]);};        
    };
    return result;
}
''')

def macro(tree, *args, **kwargs):
    return ROOT.macro(
        getattr(tree, 'value1'), #branch with vector of floats
        getattr(tree, 'value2')  #branch with vector of floats
    )

class reTupler:
    def __init__(self, tree_name, new_file, src_file):
        self.src_file = ROOT.TFile.Open(src_file)
        self.src_tree = self.src_file.Get(tree_name)

        self.new_file = ROOT.TFile.Open(new_file,'recreate')
        self.new_tree = ROOT.TTree(tree_name, tree_name)

        # To access branches in 'src_tree' from 'new_tree':
        self.new_tree.AddFriend(self.src_tree)

        # To keep track of new branches and store values:
        self.new_branches = {}

    def add_branch(self, name, f, value_type='float'):
        self.new_branches[name] = {}
        self.new_branches[name]['f'] = f
        self.new_branches[name]['name']  = name
        self.new_branches[name]['value_type'] = value_type
        self.new_branches[name]['value']   = value = ROOT.std.vector(value_type)()
        self.new_branches[name]['tbranch'] = self.new_tree.Branch(name, value)

    def run(self):
        nentries = self.src_tree.GetEntries()
        for i in range(nentries):
            # Get entry and make sure src_tree and new_tree are synced
            self.src_tree.GetEntry(i)
            self.new_tree.GetEntry(i)

            # Now loop on all the branches that have been added:
            for branch_name, branch_dict in self.new_branches.items():
                branch_dict['value'].clear()
                [branch_dict['value'].push_back(result) for result in branch_dict['f'](self.new_tree)]

            # Fill entry with all computed branches
            self.new_tree.Fill()

        self.new_tree.Write()
        self.new_file.Close()
        self.src_file.Close()
        
tree_name = 'DDTree'
src_path = 'path/to/src.root'
new_path = 'path/to/new.root'

retupler = reTupler(tree_name, new_path, src_path)
retupler.add_branch('new_branch', macro, 'float')

retupler.run()


This is a very reduced version of the code that is similar enough to the one I am currently using and it also presents the memory leak.
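
To put numbers on the growth, one option is to sample the process's peak resident memory every few thousand entries. This is only a minimal sketch using the standard library: `peak_rss_mb` and `log_memory_growth` are hypothetical helper names (not part of ROOT), and `process_entry` stands in for the body of the `for` loop in `run()`.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process in MiB (hypothetical helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory_growth(n_entries, process_entry, report_every=10_000):
    """Call process_entry(i) for every entry and sample peak RSS periodically."""
    samples = []
    for i in range(n_entries):
        process_entry(i)
        if i % report_every == 0:
            samples.append((i, peak_rss_mb()))
    return samples
```

If the sampled value keeps climbing roughly linearly with the entry index, something is being allocated per entry and never released. Because this reads the memory of the whole process, it also sees allocations made on the C++ side, which Python-only tools would miss.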

For this minimal working example I’ve used a .root file with 100k entries and only two branches (‘value1’ and ‘value2’), which I previously filled with 20-element vectors of random numbers from -999 to 999 using numpy.random.uniform (numpy.random.uniform(-999,999,20)). Just in case, this is the code I used:

import ROOT
import numpy as np


file_path = 'path/to/src.root'

file = ROOT.TFile.Open(file_path,'recreate')
tree = ROOT.TTree('DDTree','DDTree')

# Branch: value1
value1_value  = ROOT.std.vector('float')()
value1_branch = tree.Branch('value1',value1_value)

# Branch: value2
value2_value = ROOT.std.vector('float')()
value2_branch = tree.Branch('value2',value2_value)


value_length = 20
nentries = 100000
for i in range(nentries):
    tree.GetEntry(i)
    
    value1_value.clear()
    [value1_value.push_back(result) for result in np.random.uniform(-999,999,value_length)]
    
    value2_value.clear()
    [value2_value.push_back(result) for result in np.random.uniform(-999,999,value_length)]
    
    tree.Fill()

file.Write()
file.Close()

This code doesn’t present a memory leak (thankfully) (:

Fixes I’ve tried (and don’t seem to work)

I have tried different fixes that I’ve gathered from past topics and from recommendations from my colleagues but none seem to solve my issue. The ones I’ve tried so far are:

  • Using self.new_tree.FlushBaskets(); self.src_tree.DropBaskets() at regular intervals in the for loop.
  • Calling the garbage collector with gc.collect() at regular intervals in the for loop.
  • Using self.new_tree.DropBuffers(max_memory); self.src_tree.DropBuffers(max_memory) with different values for max_memory.
  • Setting explicitly the AutoSave with self.new_tree.SetAutoSave(step) where step is a high enough number (~10k) so that the runtime isn’t increased significantly.
  • Setting the address for the branches of self.src_tree by creating a dictionary for the source branches, similar to the one for the new branches (self.new_branches), and explicitly calling self.out_branch.SetBranchAddress(name, value).
  • In addition to the previous fix, setting a fixed vector length for the branches, high enough for the number of elements, using SetBranchAddress’s third argument.
  • Using pointers for the macros in C++ instead of the values (can be done with the current code, just needs to re-write macro.cpp to use pointers).
  • Using Python 2 (Python Version: 2.7.5).

None of these have worked for me so far. However, my implementation may not be perfect and some of them may actually work; this is just a list of the fixes I’ve tried.

Conclusion

This is my first time using ROOT, pyROOT and C++ and I may be doing something that is inadvertently causing this memory leak. I’ve been struggling with this for a month and I’d really like to progress with this issue, so any help anyone can provide will be greatly appreciated (:


Dear @martialc,

welcome to the ROOT forum and thank you for this excellent first post! Getting a full reproducer of the problem already in the initial post is golden :smiley:

I have identified the memory leak in PyROOT and it will be fixed in ROOT 6.32, which comes out next week:

In the meantime, the workaround is to avoid using the TTree getattr pythonization, but get the branch data in the Python world manually:

ROOT.gInterpreter.Declare(
    """
template<class T>
void * MyGetAddress(T * b) {
   return *(void**)b->GetAddress();
}
"""
)


def macro(tree, *args, **kwargs):

    import cppyy.ll

    # manually (GetAddress() doesn't work in PyROOT, so I created a C++ wrapper)
    v1 = cppyy.ll.cast[tree.GetBranch("value1").GetClassName() + "*"](ROOT.MyGetAddress(tree.GetBranch("value1")))
    v2 = cppyy.ll.cast[tree.GetBranch("value2").GetClassName() + "*"](ROOT.MyGetAddress(tree.GetBranch("value2")))
    # using the Pythonization
    # v1 = getattr(tree, "value1")
    # v2 = getattr(tree, "value2")
    return None
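
For readers wondering what the wrapper does: as far as I can tell, `GetAddress()` returns the address of a slot that holds a pointer to the branch's object, so `*(void**)` reads that stored pointer, and `cppyy.ll.cast` then reinterprets the untyped pointer as the branch's real class. The same two steps can be sketched in pure Python with `ctypes`. This is only an analogy under that assumption, not ROOT code, and all variable names are made up for illustration.

```python
import ctypes

# Stand-in for the branch's data object: three floats living in C memory.
payload = (ctypes.c_float * 3)(1.5, -2.0, 3.25)

# Stand-in for what the branch stores internally: a pointer to that object...
object_ptr = ctypes.cast(payload, ctypes.c_void_p)
# ...and for what GetAddress() hands back: the address of that pointer.
address_of_ptr = ctypes.addressof(object_ptr)

# Analogue of `*(void**)b->GetAddress()`: read the pointer stored at that address.
recovered = ctypes.cast(address_of_ptr, ctypes.POINTER(ctypes.c_void_p)).contents

# Analogue of `cppyy.ll.cast[...]`: reinterpret the untyped pointer as the real type.
typed = ctypes.cast(recovered, ctypes.POINTER(ctypes.c_float * 3)).contents
values = list(typed)
```

At the end `values` holds the same three floats as `payload`, and no copy of the underlying buffer was made along the way: `typed` is just another view of the same memory.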

I hope this helps, and keep up the great work! But don’t iterate over 10000 entries in Python if avoidable :laughing:
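
One way to avoid the Python-level loop altogether is RDataFrame, which runs the event loop in C++. The following is a rough sketch rather than a drop-in replacement: `retuple_with_rdataframe` and `macro_rvec` are hypothetical names, and it assumes the macro can be rewritten to take `ROOT::RVec<float>`, the type RDataFrame uses when reading `std::vector<float>` branches. I have not checked it against the files from this thread.

```python
def retuple_with_rdataframe(tree_name, src_path, new_path):
    """Sketch: compute 'new_branch' in ROOT's C++ event loop, not in Python."""
    import ROOT  # imported lazily so that defining this function does not need ROOT

    # Declare the helper only once; declaring it twice would be a redefinition.
    if not hasattr(ROOT, "macro_rvec"):
        ROOT.gInterpreter.Declare('''
        #include "ROOT/RVec.hxx"
        ROOT::RVec<float> macro_rvec(const ROOT::RVec<float> &value1,
                                     const ROOT::RVec<float> &value2) {
            // Keep the elements of value2 where value1 is positive
            return value2[value1 > 0];
        }
        ''')

    df = ROOT.RDataFrame(tree_name, src_path)
    df = df.Define("new_branch", "macro_rvec(value1, value2)")
    # Write only the new column; the third argument is a column-name regex.
    df.Snapshot(tree_name, new_path, "new_branch")
```

With the names from the example above, this would be called as `retuple_with_rdataframe('DDTree', 'path/to/src.root', 'path/to/new.root')`. Unlike the friend-tree approach, the output file then contains only the new column.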

Cheers,
Jonas


Dearest @jonas,

Yep, that fixed it! Thank you so much for such a prompt response, this issue has been haunting me for months now :sob:

I don’t know enough C++ to understand the dark magic you are using here to solve the issue, but I’d like to better understand the fix, if it’s not a bother (:

Also, this was the JIRA issue (https://its.cern.ch/jira/browse/ROOT-9025) I had found, related to a memory leak with __getattr__ in past versions of Python (<3.6). I thought it might be the one you were referring to in the pull request (I’ve since seen it’s not, but here it is in case it’s useful (: ).

Thanks again for such a quick fix!! <3
