RVec memory is not freed by the garbage collector in PyROOT?

FoxWise · March 14, 2024, 7:35pm

Open python3 interactively and type

import ROOT
# check memory usage via htop: ~520 MB
for i in range(10000):
    rvec = ROOT.RVec('double')((1,2,3))
    # do nothing with it and hope memory is freed

# check memory usage via htop: ~687 MB

I have a ROOT file with many RVecs stored in it.
Iterating over them creates many copies under the hood, which increases memory usage.

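import numpy as np
import ROOT

df = ROOT.RDataFrame("tree", "file.root")  # hypothetical tree/file names
data = df.AsNumpy(["my_rvecs"])
for d in data["my_rvecs"]:
    # d is one of the RVecs held in the numpy array returned by AsNumpy
    d = np.asarray(d)  # temporary array built from the RVec, which is never freed!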

Is there any current workaround for this? E.g. using np.nditer?
Or can I manually free the memory?

I haven't managed to make it work so far, so I would appreciate any help.

cheers,
Bohdan

jonas · March 15, 2024, 12:24pm

Hello @FoxWise!

I cannot reproduce the problem with the latest ROOT:

import gc

import numpy as np
import psutil
import ROOT

process = psutil.Process()

def print_mem():
    gc.collect()
    print(process.memory_info().rss)  # in bytes

ROOT.gInterpreter.Declare("""
auto create_rvec(unsigned int n) {
    //return std::array<unsigned int, 3>({n, n, n});
    return ROOT::RVec<unsigned int>({n, n, n});
}
""")
df = ROOT.RDataFrame(1000).Define("my_rvecs", "create_rvec(rdfentry_)")

# To trigger the event loop before measuring memory
full_array = df.AsNumpy(["my_rvecs"])

print_mem()

for d in full_array["my_rvecs"]:
    d = np.asarray(d)

print_mem()

Output:

449548288
449548288

There is no increase in memory when iterating over the RVecs.

Are you sure you're measuring the memory usage correctly, and that there are no garbage collection effects? Which ROOT version are you using?
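One way to rule out measurement artifacts is to force a collection right before every reading, as `print_mem()` does, and to keep track of units: `psutil.Process().memory_info().rss` is reported in bytes (hence the roughly 450 MB above), not kilobytes. A minimal stdlib-only sketch, assuming Linux, where `/proc/self/statm` reports the resident set size in pages (`rss_bytes` is a hypothetical helper name):

```python
import gc
import os


def rss_bytes():
    """Current resident set size in bytes (Linux only, reads /proc/self/statm)."""
    gc.collect()  # drop unreachable Python objects before measuring
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # 2nd field: resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


before = rss_bytes()
blob = b"x" * (64 * 1024 * 1024)  # write 64 MiB so the pages become resident
after = rss_bytes()
print(f"{(after - before) / 1e6:.1f} MB")
```

Allocating something of known size, as done here with `blob`, is a quick sanity check that the probe really sees the growth you expect before trusting it on RVecs.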

However, the first example, creating an RVec in a loop, does show a memory increase for me too:

import gc

import matplotlib.pyplot as plt
import numpy as np
import psutil
import ROOT
import tqdm

process = psutil.Process()

# Trigger initialization outside the loop. When a given class is first
# instantiated, PyROOT caches many things that we don't want to measure.
ROOT.RVec('double')()

def get_mem():
    gc.collect()
    return process.memory_info().rss  # in bytes

n_iter = 40000

mem_mb = np.empty(n_iter, dtype=float)  # RSS in MB before each iteration

for i in tqdm.tqdm(range(n_iter)):
    mem_mb[i] = get_mem() * 1e-6
    ROOT.RVec['double']()  # create a temporary RVec and discard it

plt.figure()
plt.plot(mem_mb)
plt.xlabel("iteration")
plt.ylabel("rss [MB]")
plt.savefig("plot.png")

There is indeed a jump in memory, but memory does not grow linearly with the number of created RVecs:
[Plot: rss [MB] vs. iteration]
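To tell a one-off jump from a genuine per-object leak, one can fit a straight line to the RSS samples after the jump. A leak of a fixed size per created RVec would show up as a roughly constant positive slope, while a plateau gives a slope near zero. A minimal sketch, where `tail_slope` is a hypothetical helper and `samples` would be the per-iteration readings collected by the plotting script; fitting only the last half of the samples is an arbitrary choice:

```python
def tail_slope(samples, tail_fraction=0.5):
    """Least-squares slope (units per iteration) over the last part of samples."""
    start = int(len(samples) * (1 - tail_fraction))
    ys = samples[start:]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples in the tail")
    xs = range(start, start + n)  # iteration indices of the tail samples
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic examples (not measured data):
plateau = [500.0] * 100 + [560.0] * 900         # one-off jump, then flat
leak = [500.0 + 0.01 * i for i in range(1000)]  # steady 0.01 MB per iteration
print(tail_slope(plateau), tail_slope(leak))
```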

That's indeed unexpected, and I don't know of a workaround. Please open a ROOT GitHub issue about this if you need it fixed. I won't open it myself, because issues reported by users get higher priority.

When creating RVecs in a loop in C++, memory does not increase at all, so the problem could indeed be related to PyROOT. However, it must also be related to RVec itself: creating std::vector objects in a loop with PyROOT does not show this non-linear increase in memory consumption.
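The same RVec vs. std::vector comparison can be scripted from Python by passing different factories to one probe, for example `lambda: ROOT.RVec['double']()` and `lambda: ROOT.std.vector['double']()`. The helper below is a sketch, not a ROOT API: `rss_growth_per_call` is a hypothetical name, it reads `/proc/self/statm` (Linux only), and it calls the factory once beforehand so that one-time PyROOT initialization is not counted. The demo uses plain-Python factories, one of which deliberately keeps its objects alive, so it runs without ROOT.

```python
import gc
import os


def _rss_bytes():
    gc.collect()
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * os.sysconf("SC_PAGE_SIZE")


def rss_growth_per_call(factory, n_calls=10000):
    """Average RSS growth in bytes per call of factory(), result discarded."""
    factory()  # warm-up: exclude one-time caching/initialization
    before = _rss_bytes()
    for _ in range(n_calls):
        factory()  # the returned object is dropped immediately
    return (_rss_bytes() - before) / n_calls


_kept = []  # deliberately leaked objects for the demo


def leaky_factory():
    _kept.append(bytes(4096) + b"x")  # keeps ~4 KB alive per call
    return None


def clean_factory():
    return bytes(4096) + b"x"  # freed as soon as the caller drops it


clean = rss_growth_per_call(clean_factory)
leaky = rss_growth_per_call(leaky_factory)
print(clean, leaky)
```

A per-call growth that stays near zero for one factory but not for the other points at the type being created rather than at the measurement.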

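#include "ROOT/RVec.hxx"
#include "TSystem.h"

#include <cstddef>
#include <iostream>

void repro() {

    ProcInfo_t pinfo;
    gSystem->GetProcInfo(&pinfo);
    double initialMem = pinfo.fMemResident;  // resident memory in kB

    // Create and immediately destroy 10000 temporary RVecs
    for (std::size_t i = 0; i < 10000; ++i) {
        ROOT::RVec<double>{};
    }

    gSystem->GetProcInfo(&pinfo);
    double finalMem = pinfo.fMemResident;

    std::cout << (finalMem - initialMem) << std::endl;
}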

Cheers,
Jonas

FoxWise · March 15, 2024, 2:00pm

Hi @jonas,

Thanks for the feedback and nice reproducers and plots!

I have run your reproducer, and I do see the problem with ROOT 6.31/01.

I am happy to open the GitHub issue, so let’s continue the discussion there.

cheers,
Bohdan

EDIT:
The issue now lives here:

github.com/root-project/root: RVecs leak memory with np.asarray in pyROOT
Opened by dudarboh at 02:44 PM UTC on 15 Mar 2024 (label: bug)

### Check duplicate issues.

- [X] Checked for duplicates

### Description

Using `np.asarray()` to convert an existing `RVec` object into a numpy array, as suggested in the documentation [here](https://root.cern/doc/master/classROOT_1_1VecOps_1_1RVec.html#RVecdoxyref), takes memory that is never freed. `RVec` objects created inside PyROOT are also not cleaned up by the garbage collector.

### Reproducer

```bash
python --version
Python 3.9.12
# Run as: python3 reproducer.py
```

```python
# reproducer.py
import gc

import numpy as np
import psutil
import ROOT

process = psutil.Process()

# Instantiate once to cache a lot of things
ROOT.RVec('double')()

def get_mem_usage():
    gc.collect()
    return process.memory_info().rss  # in bytes

ROOT.gInterpreter.Declare("""
auto create_rvec(unsigned int n) {
    //return std::array<unsigned int, 3>({n, n, n});
    return ROOT::RVec<unsigned int>({n, n, n});
}
""")
df = ROOT.RDataFrame(1000).Define("my_rvecs", "create_rvec(rdfentry_)")

# To trigger the event loop before measuring memory
my_rvecs = df.AsNumpy(["my_rvecs"])["my_rvecs"]

mem0 = get_mem_usage()
print(mem0)

# my_data is gone at the end of the loop, nice!
for rvec in my_rvecs:
    my_data = rvec

mem1 = get_mem_usage()
print(mem1)

# Does not free memory at the end of the loop!
for rvec in my_rvecs:
    my_data = np.asarray(rvec)

mem2 = get_mem_usage()
print(mem2)

# Creating RVecs in place also does not free memory!
for i in range(1000):
    ROOT.RVec('double')((42., 42., 42.))

mem3 = get_mem_usage()
print(mem3)
```

P.S. Thanks, Jonas, for the nice reproducer!

### ROOT version

ROOT Version: 6.31/01
Built for linuxx8664gcc on Mar 11 2024, 23:52:46
From heads/master@v6-31-01-1407-gc76b8f7

### Installation method

source /cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/latest/x86_64-centos7-gcc11-dbg/setup.sh

### Operating system

CentOS Linux 7 (Core)

### Additional context

Initially submitted on the forum [here](https://root-forum.cern.ch/t/rvecs-memory-is-not-freed-by-a-garbage-collector-in-pyroot/58545/2).
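Since the reproducer prints absolute RSS values, what matters is the difference between consecutive stages: the `np.asarray` loop and the in-place `RVec` loop should each add close to nothing if their temporaries were freed. A small helper for reading the four printed numbers (`stage_deltas_mb` is a hypothetical name; it assumes the values are in bytes, as returned by `memory_info().rss`):

```python
def stage_deltas_mb(readings, labels):
    """Map each stage label to the RSS change (MB) since the previous reading."""
    if len(labels) != len(readings) - 1:
        raise ValueError("need exactly one label per consecutive pair of readings")
    return {
        label: (after - before) / 1e6
        for label, before, after in zip(labels, readings, readings[1:])
    }


# Made-up readings in bytes, for illustration only (not measured):
readings = [400_000_000, 400_000_000, 410_000_000, 420_000_000]
labels = ["plain loop", "np.asarray loop", "in-place RVec loop"]
print(stage_deltas_mb(readings, labels))
# {'plain loop': 0.0, 'np.asarray loop': 10.0, 'in-place RVec loop': 10.0}
```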

system · March 29, 2024, 2:00pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.