A memory leak with RDataFrame.AsNumpy() and vector<vector<vector<float>>>

Hi,

I’ve encountered a small but significant memory leak with RDataFrame.AsNumpy() in ROOT 6.38.00. The largest branch in my tree is trace_ch, a 3D vector of type vector<vector<vector<float>>>. The following code:

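import numpy as np
import ROOT

ROOT.gInterpreter.GenerateDictionary("ROOT::VecOps::RVec<vector<vector<float>>>", "vector;ROOT/RVec.hxx")
events_to_read = 3000  # events per chunk, as in my tests
# dd.trawvoltage wraps the input TChain (not shown here); _tree is that TChain
df = ROOT.RDataFrame(dd.trawvoltage._tree)
for ij in range(np.ceil(dd.trawvoltage.get_entries() / events_to_read).astype(int)):
    print(ij)
    # read one chunk of entries; the returned arrays are discarded
    df.Range(ij * events_to_read, (ij + 1) * events_to_read).AsNumpy(["trace_ch"])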
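To see whether the memory growth tracks the chunks, one could log the resident memory after each AsNumpy() call. This is a minimal sketch, assuming Linux (it reads the VmRSS field of /proc/self/status, which Fedora provides). The helper names chunk_bounds, current_rss_mb and run_chunks are hypothetical, and read_chunk stands in for the df.Range(...).AsNumpy(...) call:

```python
def chunk_bounds(n_entries, chunk_size):
    """Yield [begin, end) windows that cover n_entries, chunk_size at a time."""
    n_chunks = (n_entries + chunk_size - 1) // chunk_size  # integer ceil division
    for ij in range(n_chunks):
        begin = ij * chunk_size
        # Clamp the last window so it never goes past n_entries
        end = min(begin + chunk_size, n_entries)
        yield begin, end


def current_rss_mb():
    """Current resident set size in MB from /proc/self/status (Linux only).

    Returns None if the file or the VmRSS field is unavailable.
    """
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:    123456 kB"
                    return int(line.split()[1]) / 1024
    except OSError:
        return None
    return None


def run_chunks(read_chunk, n_entries, chunk_size):
    """Call read_chunk(begin, end) for every window; return RSS after each call."""
    history = []
    for begin, end in chunk_bounds(n_entries, chunk_size):
        read_chunk(begin, end)
        history.append(current_rss_mb())
    return history


# Hypothetical PyROOT usage, with df and dd as in the snippet above:
# history = run_chunks(lambda b, e: df.Range(b, e).AsNumpy(["trace_ch"]),
#                      dd.trawvoltage.get_entries(), 3000)
```

Reading VmRSS rather than resource.getrusage().ru_maxrss gives the current rather than the peak resident size, so plateaus and sudden jumps in memory use stay visible in the history.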

leaks memory. In my tests I was reading 3000 events at a time, and the trace_ch vector of a single entry has dimensions 5x4x1024 on average (the first dimension varies, but it is 5 on average). So a single iteration should read about 246 MB in C++ terms (3000 x 5 x 4 x 1024 floats x 4 bytes). After about 300 iterations, the process uses 1.1 GB more memory than at the first iteration. The growth is not linear: sometimes memory stays almost constant for several dozen iterations, then grows quickly. Perhaps it is related to caching?
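For reference, the ~246 MB estimate follows from the average shape quoted above. This is a short sketch of the arithmetic, assuming 4-byte floats and ignoring the bookkeeping overhead of the nested std::vector objects. It also expresses the observed 1.1 GB growth in units of chunks:

```python
# Expected size of one chunk of trace_ch, using the numbers quoted above
events_per_chunk = 3000        # events read per AsNumpy() call
mean_shape = (5, 4, 1024)      # average per-entry shape; first dimension varies
bytes_per_float = 4            # sizeof(float) on common platforms

floats_per_event = mean_shape[0] * mean_shape[1] * mean_shape[2]
chunk_bytes = events_per_chunk * floats_per_event * bytes_per_float

print(floats_per_event)                # 20480
print(chunk_bytes)                     # 245760000
print(round(chunk_bytes / 1e6))        # 246 (MB)
print(round(1.1e9 / chunk_bytes, 1))   # 4.5 (extra memory after ~300 iterations, in chunks)
```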

By contrast, C++ code called from Python, where I pass the vectors in and do the iterating and filling in C++, works without any memory leaks.

I know I should provide a working example, but I am not sure how I could give you as much data as is needed to notice the leak.


ROOT Version: 6.38.00
Platform: Fedora 43


Dear @LeWhoo ,

Thanks for reaching out! Could you provide one input data file so I could start debugging from there?

Cheers,
Vincenzo

Thank you! I've shared the files with you in a private message.

What I forgot to mention is that I read the files as a TChain (the RDataFrame was initialised with a TChain); perhaps this has something to do with the problem.