Hi,
I’ve encountered a small but significant memory leak with RDataFrame.AsNumpy() in ROOT 6.38.00. In my tree, the largest branch is trace_ch, a 3D vector of type vector<vector<vector<float>>>. The following code:
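import numpy as np
import ROOT

events_to_read = 3000  # events per chunk in my tests
ROOT.gInterpreter.GenerateDictionary("ROOT::VecOps::RVec<vector<vector<float>>>", "vector;ROOT/RVec.hxx")
df = ROOT.RDataFrame(dd.trawvoltage._tree)  # dd: my data-access object holding the TTree
for ij in range(np.ceil(dd.trawvoltage.get_entries() / events_to_read).astype(int)):
    print(ij)
    # read one chunk of entries; the returned dict of arrays is discarded
    df.Range(ij * events_to_read, (ij + 1) * events_to_read).AsNumpy(["trace_ch"])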
leaks memory. In my tests I was reading 3000 events at a time, and a single entry of trace_ch has dimensions 5x4x1024 on average (the first dimension varies, but it is 5 on average). So a single iteration should read about 246 MB of float data in C++ terms. After about 300 iterations, the process uses 1.1 GB more than at the first iteration. The growth is not linear: sometimes memory stays almost constant for several dozen iterations, then grows quickly. Perhaps it is related to some caching?
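For reference, the arithmetic behind these numbers as a plain-Python check (assuming 4-byte floats and ignoring the per-vector overhead of std::vector):

```python
# Expected size of one chunk, assuming 4-byte floats and the average entry shape
events_per_chunk = 3000
n_outer, n_mid, n_samples = 5, 4, 1024  # average shape of one trace_ch entry
bytes_per_float = 4

chunk_bytes = events_per_chunk * n_outer * n_mid * n_samples * bytes_per_float
print(f"one chunk: {chunk_bytes / 1e6:.1f} MB")  # one chunk: 245.8 MB

# Observed growth: about 1.1 GB extra after about 300 iterations
growth_per_iter_mb = 1.1e3 / 300
print(f"average growth: {growth_per_iter_mb:.1f} MB per iteration")  # 3.7 MB
```

So on average only a few MB per iteration are not released, a small fraction of each chunk.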
By contrast, C++ code called from Python, where I pass the vectors in and do the iterating and filling in C++, runs without any memory leak.
I know I should provide a working example, but… I am not sure how I could give you as much data as is needed to notice the leak.
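One option might be a synthetic file with the same layout. Below is a rough, untested sketch of such a reproducer: it writes a trace_ch branch of shape (~5)x4x1024 with RDataFrame's Snapshot, then repeats the chunked AsNumpy() loop from my snippet and prints the resident memory after each call. The helper make_trace, the tree and file names, the Poisson-distributed first dimension, the entry count and the idea of re-reading the same file several times (assuming the growth depends on the number of AsNumpy() calls rather than on which entries are read) are all my own assumptions. The RSS readout uses /proc/self/statm, so it is Linux-only.

```python
import os
import sys

# Hypothetical C++ generator for one synthetic trace_ch entry (~5 x 4 x 1024 floats).
CPP_HELPER = r"""
#include <vector>
#include "TRandom3.h"
std::vector<std::vector<std::vector<float>>> make_trace(ULong64_t entry)
{
    TRandom3 rng(entry + 1);                  // per-entry seed (seed 0 would mean "random")
    const int n_outer = 1 + rng.Poisson(4.);  // first dimension, 5 on average
    std::vector<std::vector<std::vector<float>>> v(
        n_outer, std::vector<std::vector<float>>(4, std::vector<float>(1024)));
    for (auto &ch : v)
        for (auto &trace : ch)
            for (auto &x : trace) x = rng.Gaus();
    return v;
}
"""


def chunk_ranges(n_entries, chunk):
    """(begin, end) pairs covering [0, n_entries); the last chunk may be shorter."""
    return [(b, min(b + chunk, n_entries)) for b in range(0, n_entries, chunk)]


def rss_mb():
    """Current resident set size in MB (Linux only: reads /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1e6


def load_root():
    """Import ROOT lazily and generate dictionaries for the nested vector types."""
    import ROOT
    # The RVec dictionary is the one from my snippet; the plain nested-vector
    # dictionary is an assumption needed to write and read the synthetic branch.
    ROOT.gInterpreter.GenerateDictionary("vector<vector<vector<float>>>", "vector")
    ROOT.gInterpreter.GenerateDictionary("ROOT::VecOps::RVec<vector<vector<float>>>",
                                         "vector;ROOT/RVec.hxx")
    return ROOT


def write_synthetic_file(path, n_entries):
    ROOT = load_root()
    ROOT.gInterpreter.Declare(CPP_HELPER)
    df = ROOT.RDataFrame(n_entries).Define("trace_ch", "make_trace(rdfentry_)")
    df.Snapshot("trawvoltage", path)  # Snapshot is eager: writes the file now


def read_in_chunks(path, chunk, n_passes):
    ROOT = load_root()
    df = ROOT.RDataFrame("trawvoltage", path)
    n_entries = df.Count().GetValue()
    call = 0
    for _ in range(n_passes):  # re-read the same file to reach many AsNumpy() calls
        for begin, end in chunk_ranges(n_entries, chunk):
            df.Range(begin, end).AsNumpy(["trace_ch"])  # result discarded, as in my loop
            print(f"call {call}: RSS = {rss_mb():.0f} MB")
            call += 1


if __name__ == "__main__":
    path = "synthetic_traces.root"  # hypothetical file name
    if sys.argv[1:] == ["write"]:
        write_synthetic_file(path, n_entries=15000)
    elif sys.argv[1:] == ["read"]:
        read_in_chunks(path, chunk=3000, n_passes=60)  # 5 chunks x 60 passes = 300 calls
    else:
        print("usage: python repro.py write|read")
```

The idea would be to run `python repro.py write` once and then `python repro.py read` in a fresh process (repro.py being whatever the script is saved as), so that the memory used while writing does not distort the readings.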
ROOT Version: 6.38.00
Platform: Fedora 43