RVec's memory is not freed by the garbage collector in PyROOT?

Open python3 interactively and type

import ROOT
# check memory usage via htop: ~520 MB
for i in range(10000):
    rvec = ROOT.RVec('double')((1,2,3))
    # do nothing with it and hope memory is freed

# check memory usage via htop: ~687 MB

I have a ROOT file with many RVecs stored.
Iterating over them creates many copies under the hood, increasing memory usage.

data = df.AsNumpy(["my_rvecs"])
for d in data["my_rvecs"]:
    # d is a copy of the RVec stored in the TTree/numpy array
    d = np.asarray(d)  # temporary copy of the RVec, which is not freed from memory!

Is there any current workaround for this? E.g. using np.nditer?
Or can I manually free the memory?

I haven’t managed to make it work so far, so I would appreciate any help.

cheers,
Bohdan

Hello @FoxWise!

I cannot reproduce the problem with the latest ROOT:

import ROOT
import os, psutil
import gc
import numpy as np

process = psutil.Process()

def print_mem():
   gc.collect()  # collect unreachable Python objects before measuring
   print(process.memory_info().rss)  # resident set size in bytes

ROOT.gInterpreter.Declare("""
auto create_rvec(unsigned int n) {
    //return std::array<unsigned int, 3>({n, n, n});
    return ROOT::RVec<unsigned int>({n, n, n});
}
""")
df = ROOT.ROOT.RDataFrame(1000).Define("my_rvecs", "create_rvec(rdfentry_)")

# To trigger the event loop before measuring memory
full_array = df.AsNumpy(["my_rvecs"])

print_mem()

for d in full_array["my_rvecs"]:
    d = np.asarray(d)

print_mem()

Output:

449548288
449548288

There is no increase in memory when iterating over the RVecs.

Are you sure you’re measuring the memory usage correctly, and that garbage collection effects are not distorting the numbers? Which ROOT version are you using?
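To cross-check numbers like these without psutil, here is a minimal sketch using only the standard library. The helper names `rss_bytes` and `measure_growth` are made up for illustration, and the list-building lambda is only a stand-in so the snippet runs without ROOT. It reads the resident set size from `/proc/self/statm` on Linux and falls back to the peak RSS from `resource.getrusage` elsewhere, so treat it as an approximation rather than a definitive measurement tool.

```python
import gc
import os
import sys


def rss_bytes():
    """Return the resident set size of this process in bytes.

    On Linux this is the current RSS read from /proc/self/statm.
    Elsewhere it falls back to the *peak* RSS reported by getrusage.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except OSError:
        import resource  # Unix only

        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is in bytes on macOS and in kilobytes on Linux
        return peak if sys.platform == "darwin" else peak * 1024


def measure_growth(make_object, n_warmup=10, n_iter=10000):
    """Return the RSS change in bytes after calling make_object n_iter times.

    The warm-up calls keep one-time initialization (caches, lazy loading)
    out of the measured interval.
    """
    for _ in range(n_warmup):
        make_object()
    gc.collect()
    before = rss_bytes()
    for _ in range(n_iter):
        make_object()  # the result is dropped immediately
    gc.collect()
    return rss_bytes() - before


# Stand-in workload: a plain Python list instead of an RVec
growth = measure_growth(lambda: [1.0, 2.0, 3.0], n_iter=1000)
print(f"RSS change: {growth} bytes")
```

With PyROOT available, the same helper could be called with `lambda: ROOT.RVec('double')((1, 2, 3))` to repeat the test from the first post.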

However, the first example, which creates RVecs in a loop, does show a memory increase for me as well:

import ROOT
import os, psutil
import gc
import numpy as np
import matplotlib.pyplot as plt
import tqdm

process = psutil.Process()

# To trigger initialization outside the loop. When first instantiating a given
# class PyROOT caches many things that we don't want to measure.
ROOT.RVec('double')()

def get_mem():
   gc.collect()  # collect unreachable Python objects before measuring
   return process.memory_info().rss  # resident set size in bytes

n_iter = 40000

mem_mb = np.empty(n_iter, dtype=float)

for i in tqdm.tqdm(range(n_iter)):
    mem_mb[i] = get_mem() * 1e-6  # bytes -> MB
    ROOT.RVec['double']()

plt.figure()
plt.plot(mem_mb)
plt.xlabel("iteration")
plt.ylabel("rss [MB]")
plt.savefig("plot.png")

You can indeed see a jump in memory, but it doesn’t increase linearly with the number of created RVecs:
[plot: rss [MB] vs. iteration]
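To make the "jump versus linear growth" distinction less dependent on eyeballing the plot, one option is to check how much of the total increase comes from a single step. The sketch below is only an illustration: `largest_jump` is a hypothetical helper, and both traces are synthetic numbers, not measurements from ROOT.

```python
def largest_jump(samples):
    """Return (index, size, share) of the largest single-step increase.

    index is the position of the sample right after the jump, and share is
    the jump size divided by the total increase over the whole trace.
    A share near 1 suggests a one-off step; a small share suggests
    gradual growth.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    size = max(diffs)
    index = diffs.index(size) + 1
    total = samples[-1] - samples[0]
    share = size / total if total > 0 else 0.0
    return index, size, share


# Synthetic traces in MB (made-up values for illustration only)
step_trace = [500.0] * 100 + [650.0] * 100
linear_trace = [500.0 + 0.5 * i for i in range(200)]

print(largest_jump(step_trace))
print(largest_jump(linear_trace))
```

Applied to the per-iteration RSS samples recorded by the script above, a share close to 1 would match the step-like behaviour described here, while a leak that scales with the number of RVecs would give a small share.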

That’s indeed unexpected, and I don’t know a workaround. Please open a ROOT GitHub issue about this if you need to get it fixed. I won’t do so myself because it’s always better if a user reports an issue: user reports get higher priority :)

When producing RVecs in a loop in C++, there is no increase in memory at all, so the problem could indeed be related to PyROOT. But it must also be related to RVec, because with a std::vector you don’t see this non-linear increase in memory consumption in PyROOT.

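#include <ROOT/RVec.hxx>
#include <TSystem.h>
#include <iostream>

void repro() {

    // Resident memory before the loop (fMemResident is reported in KB)
    ProcInfo_t pinfo;
    gSystem->GetProcInfo(&pinfo);
    double initialMem = pinfo.fMemResident;

    // Create and immediately destroy temporary RVecs
    for (std::size_t i = 0; i < 10000; ++i) {
        ROOT::RVec<double>{};
    }

    // Resident memory after the loop
    gSystem->GetProcInfo(&pinfo);
    double finalMem = pinfo.fMemResident;

    std::cout << ( finalMem - initialMem ) << std::endl;
}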
Output:

0

Cheers,
Jonas

Hi @jonas,

Thanks for the feedback and nice reproducers and plots!

I have run your reproducer, and I do see the problem with ROOT 6.31/01.

I am happy to open the GitHub issue, so let’s continue the discussion there.

cheers,
Bohdan

EDIT:
This lives now here:
