Memory consumption with string Define() in RDataFrame (PyROOT)


ROOT Version: 6.34.04 (installed via Homebrew)
Platform: macOS Sequoia 15.3.2
Compiler: Not Provided


Dear experts,

I’m trying to produce some TTrees with RDataFrame in PyROOT using simple string-based Define() calls. When I run memray on the code below, I see the following:


Is this behavior expected, or am I missing something?

The reproducer is below:

import ROOT as rt

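rt.gInterpreter.Declare("""
#include <ROOT/RVec.hxx>
#include <TRandom.h>

// Return a vector of size floats drawn uniformly between -100 and 100.
ROOT::RVecF generate_random_vec(int size) {
    ROOT::RVecF result;
    result.reserve(size);
    for(int i = 0; i < size; ++i) {
        result.push_back(gRandom->Uniform(-100, 100));
    }
    return result;
}
""")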

df = rt.RDataFrame(5000000)
rt.RDF.Experimental.AddProgressBar(df)

df = (
    df.Define("trk_a", "generate_random_vec(200)")
      .Define("trk_b", "generate_random_vec(200)")
      .Define("var_c", "generate_random_vec(10)")
)

df.Snapshot("ntuple1", "output1.root")

df_loaded = rt.RDataFrame("ntuple1", "output1.root")
rt.RDF.Experimental.AddProgressBar(df_loaded)

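# Element-wise |trk_a + trk_b| on the RVec columns, defined three times under different names
df_loaded = df_loaded.Define("trk1", "abs(trk_a+trk_b)")
df_loaded = df_loaded.Define("trk2", "abs(trk_a+trk_b)")
df_loaded = df_loaded.Define("trk3", "abs(trk_a+trk_b)")
# Note: despite its name, sqrt_c holds the element-wise square of var_c
df_loaded = df_loaded.Define("sqrt_c", "var_c*var_c")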

df_loaded.Snapshot("ntuple2", "output2.root")
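
As a rough cross-check of what memray reports, the peak resident memory of the process can also be read from the Python standard library around each step. This is only a minimal sketch, assuming a Unix-like platform (macOS or Linux); `peak_rss_mb` and `report` are hypothetical helper names introduced here, not part of ROOT or memray:

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report(label, step):
    """Run step() and print the peak RSS before and after it (hypothetical helper)."""
    before = peak_rss_mb()
    result = step()
    after = peak_rss_mb()
    print(f"{label}: peak RSS {before:.1f} -> {after:.1f} MiB")
    return result
```

With these helpers, the second Snapshot could be wrapped as report("second snapshot", lambda: df_loaded.Snapshot("ntuple2", "output2.root")). Since this is a peak value, it only shows how far the high-water mark moved during the step, not the allocation history that memray provides.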

Thank you in advance for your time,
Pavel

I think @vpadulan can help with this question

Dear @pavpav,

Thanks for reaching out to the forum, and thanks for the nice reproducer! I will take a look at this. I would say the behaviour you report is not expected. memray is a great tool; let’s see if I can track the problem down with it.

Cheers,
Vincenzo
