RDataFrame Foreach causing memory leak

Danilo · January 22, 2019, 9:19pm

Ciao Davide,

thanks for taking the time to share such a clear example!
Short answer: replace your jitted filter with a compiled one, i.e. change this

df.Filter("(nPU=="+ to_string(pu) +")&&(signalTruth=="+to_string(s) +")")

to this

df.Filter([pu,s](int nPU, double signalTruth){return nPU == pu && signalTruth == s;}, {"nPU", "signalTruth"})

As a side effect, the performance of your program will also increase.

Longer answer: the behaviour you stumbled on is not a memory leak but rather a memory hog. For each of the (PU, S) pairs, a new filter was just-in-time compiled. Every time cling, ROOT’s interpreter, compiles some code, some entities are kept in memory, for example the compiled binary code and the nodes of the AST (clang’s representation of the code).

We are aware of this and working towards a solution.
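
To make the difference concrete, here is a minimal plain C++ sketch (no ROOT needed). It counts how many different expression strings the double loop hands to the interpreter, and checks that a lambda built from captured values always has one and the same type. The helper names `jittedExpression`, `countDistinctExpressions` and `makeSelection` are made up for illustration; they are not part of ROOT.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <type_traits>
#include <vector>

// Builds the same expression string as the jitted Filter call.
std::string jittedExpression(int pu, float s) {
    return "(nPU==" + std::to_string(pu) + ")&&(signalTruth==" + std::to_string(s) + ")";
}

// Counts how many different strings the double loop over (pu, s) produces.
std::size_t countDistinctExpressions(const std::vector<int>& pus, const std::vector<float>& ss) {
    std::set<std::string> seen;
    for (int pu : pus)
        for (float s : ss)
            seen.insert(jittedExpression(pu, s));
    return seen.size();
}

// Compiled alternative: a single closure type; pu and s are only captured data.
auto makeSelection(int pu, float s) {
    return [pu, s](int nPU, double signalTruth) { return nPU == pu && signalTruth == s; };
}

// Two different (pu, s) pairs still give the very same closure type.
static_assert(std::is_same<decltype(makeSelection(0, 10.f)),
                           decltype(makeSelection(100, 70.f))>::value,
              "every (pu, s) pair reuses the same compiled code");
```

With 11 PU values and 11 S values, `countDistinctExpressions` returns 121, i.e. 121 different pieces of code for the interpreter, while the lambda is compiled once by your compiler and only its captured values change.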

I also prototyped three alternatives for you, minimally modifying your code. For me, alternative 3) was by far the fastest (it runs in parallel…)

1) Cache the full dataset in memory once, then run on it N times (as a macro):

using namespace ROOT; 
using namespace ROOT::RDF;
using namespace ROOT::VecOps;


void dostuff(int nPU, double E_pu, double signalTruth,double amplitudeTruth, RVec<double>digis){
     cout << nPU << " " << E_pu << " " << signalTruth <<endl;
}   

int test(){
    auto df = RDataFrame("weights", "test.root");

    vector<int> PUs = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90 ,100};
    vector<float> Ss = {10.,12.,14.,16.,18.,20.,30.,40.,50.,60.,70};

    // Cache in memory the dataset
    auto dfCached = df.Cache();

    for (int pu: PUs){
        for (float s: Ss){
            cout << "PU:"<< pu << " | S:" << s <<endl;

            dfCached.Filter([pu,s](int nPU, double signalTruth){return nPU == pu && signalTruth == s;}, {"nPU", "signalTruth"})
                    .Foreach(dostuff, {"nPU", "E_pu", "signalTruth", "amplitudeTruth", "digis"});
        }            
    }
    return 0;
        
}

2) Reorder the loops (not sure it’s really possible in your case) and run on the data once:

using namespace ROOT; 
using namespace ROOT::RDF;
using namespace ROOT::VecOps;

vector<int> PUs = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90 ,100};
vector<float> Ss = {10.,12.,14.,16.,18.,20.,30.,40.,50.,60.,70};

void dostuff(int nPU, double E_pu, double signalTruth,double amplitudeTruth, RVec<double> &digis){
   cout << nPU << " " << E_pu << " " << signalTruth <<endl;
}

void dostuff2(int nPU, double E_pu, double signalTruth,double amplitudeTruth, RVec<double> &digis){
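    for (int pu: PUs){
        for (float s: Ss){
           // Only call the original function for the (pu, s) pair this entry belongs to
           if (nPU == pu && signalTruth == s)
              dostuff( nPU, E_pu, signalTruth, amplitudeTruth, digis); // The original function :)
        }
    }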
}   

int test(){
    auto df = RDataFrame("weights", "test.root");

    // Run over the dataset only once; no caching is needed here
    df.Foreach(dostuff2, {"nPU", "E_pu", "signalTruth", "amplitudeTruth", "digis"});
        
    return 0;
}
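
The idea of a single pass, with the selection applied per entry inside the loop, can also be sketched without ROOT. This is only an illustration: the `Entry` struct and the `countPerPair` function are hypothetical stand-ins for your tree and for whatever dostuff really does with the selected entries.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one tree entry (only the two columns used in the selection).
struct Entry {
    int nPU;
    double signalTruth;
};

// Visits every entry exactly once and counts how many entries match each (pu, s) pair.
std::map<std::pair<int, float>, int> countPerPair(const std::vector<Entry>& entries,
                                                  const std::vector<int>& pus,
                                                  const std::vector<float>& ss) {
    std::map<std::pair<int, float>, int> counts;
    for (const Entry& e : entries)   // single pass over the data
        for (int pu : pus)
            for (float s : ss)
                if (e.nPU == pu && e.signalTruth == s)
                    ++counts[{pu, s}];
    return counts;
}
```

The data is read once and the loops over the pairs become cheap in-memory comparisons, which is the trade that reordering the loops makes.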

3) Like 1), but runs in parallel. Note the new signature of dostuff: its first parameter is an unsigned integer that identifies the worker (the “slot”) running the function. It is more or less what multi-threaded CMSSW calls ‘Streams’.

using namespace ROOT; 
using namespace ROOT::RDF;
using namespace ROOT::VecOps;

// Note the first parameter!!!
void dostuff(unsigned int slot, int nPU, double E_pu, double signalTruth,double amplitudeTruth, RVec<double>digis){
  //   cout << nPU << " " << E_pu << " " << signalTruth <<endl;
}   

int test(){

    ROOT::EnableImplicitMT();

    auto df = RDataFrame("weights", "test.root");

    vector<int> PUs = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90 ,100};
    vector<float> Ss = {10.,12.,14.,16.,18.,20.,30.,40.,50.,60.,70};

    // Cache in memory the dataset
    auto dfCached = df.Cache();
    
    for (int pu: PUs){
        for (float s: Ss){
            cout << "PU:"<< pu << " | S:" << s <<endl;

            dfCached.Filter([pu,s](int nPU, double signalTruth){return nPU == pu && signalTruth == s;}, {"nPU", "signalTruth"})
                    .ForeachSlot(dostuff, {"nPU", "E_pu", "signalTruth", "amplitudeTruth", "digis"});
        }            
    }
    return 0;
}
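
Because no two functions running at the same time share a slot number, the slot is typically used to index per-worker storage, so that no locking is needed. Here is a minimal plain C++ sketch of that pattern, using std::thread in place of ROOT’s own scheduling. The function `sumWithSlots` and its round-robin splitting of the entries are made up for illustration; this is not how RDataFrame distributes the work.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `values` using `nSlots` worker threads. Each worker writes only to partial[slot],
// so no mutex is needed; the partial results are merged once all workers have joined.
double sumWithSlots(const std::vector<double>& values, unsigned int nSlots) {
    assert(nSlots > 0);
    std::vector<double> partial(nSlots, 0.0);
    std::vector<std::thread> workers;
    for (unsigned int slot = 0; slot < nSlots; ++slot) {
        workers.emplace_back([&values, &partial, slot, nSlots] {
            // Worker `slot` handles entries slot, slot + nSlots, slot + 2*nSlots, ...
            for (std::size_t i = slot; i < values.size(); i += nSlots)
                partial[slot] += values[i];
        });
    }
    for (std::thread& w : workers)
        w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

In your dostuff you would do the same thing: keep one accumulator (or histogram, or output buffer) per slot, index it with the first parameter, and merge everything after the event loop.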

It looks like a lot, but all three should be very easily testable.

Cheers,
D