RDataFrame execution slows down with RooFormulaVar

Dear ROOT experts,

In our RDataFrame-based analysis, I am trying to implement a lambda function that uses RooFit objects, with the final goal of performing a fit for every event (while running multi-threaded).

However, when I define RooFormulaVar objects inside the lambda, the execution slows down significantly. When defining RooRealVar objects I don’t notice any significant change.

I have reproduced the problem with a very simplified formula:

#include <TRandom.h>
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"

int main() {
        ROOT::EnableThreadSafety();
        ROOT::EnableImplicitMT();
        auto small_test = [] (double x, double y) {
               RooRealVar var0("var0", "var0", x);
               RooRealVar var1("var1", "var1", y);
               RooFormulaVar formula("formula", "@0 * @1", RooArgList(var0, var1));
               return 1;
        };
        auto df = ROOT::RDataFrame{1000000}.Define("x", []() { return gRandom->Uniform(-5., 5.); }).Define("y", []() {return gRandom->Gaus(1., 3.);});
        auto df_f = df.Define("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

With 1M entries, it takes 1.17 minutes, while if I comment out the RooFormulaVar line, it takes only 1 s. I see similar trends locally, on lxplus and on another cluster.

The time grows if I define more RooFormulaVar objects or if the expression inside is more complex.

I have seen that there is a tutorial to define RooDataSet from RDataFrame columns, but I haven’t seen anything on using RooFit inside RDataFrame. (Sorry if I have missed it!)

Am I doing something wrong? Is this behavior expected? Can I rewrite the code in a better way to somehow speed up the execution?

In the lambda in my analysis, I have multiple, more complex RooFormulaVar objects. This significantly slows down the whole execution for millions of events (from tens of seconds to tens of minutes, and I haven’t performed the fit yet).

Thank you!

Hi Francesco,

Thanks for the interesting post and welcome to the ROOT Community!

Your reproducer is well done: a slight modification is enough to get the performance you are looking for. This is a possible implementation:

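#include <cstdio>
#include <vector>
#include <TRandom.h>
#include "TROOT.h"
#include "RooArgList.h"
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"

// Bundles the variables and the formula so that they are built once and reused.
// Members are initialised in declaration order, so var0 and var1 exist before formula refers to them.
struct MyRooFormulaVarWrapper {
   RooRealVar    var0;
   RooRealVar    var1;
   RooFormulaVar formula;
   MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), formula("formula", "@0 * @1", RooArgList(var0, var1)){}
};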

void foo()
{

   ROOT::EnableImplicitMT();

   std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
   auto small_test = [&wrappers](unsigned int islot, double x, double y) {
      auto& iwrapper =  wrappers[islot];
      iwrapper.var0.setVal(x);
      iwrapper.var1.setVal(y);
      // do something with iwrapper.formula ....
      return 1;
   };

   auto df   = ROOT::RDataFrame{1000000}.Define("x", []() { return gRandom->Uniform(-5., 5.); }).Define("y", []() {
      return gRandom->Gaus(1., 3.);
   });
   auto df_f = df.DefineSlot("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
   auto n    = df_f.Count();
   printf("Entries: %lld\n", *n);
}

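This avoids initialising the formula once per event. Instead, there is one formula per slot, a concept RDataFrame provides precisely to handle cases like this one: a given slot is used by at most one thread at a time, so each wrapper can be reused across events without locking.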
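To make the difference between the two approaches concrete, here is a minimal ROOT-free sketch of the same pattern. All names in it (CostlyFormula, evalPerEvent, SlotCache) are hypothetical and only for illustration: CostlyFormula stands in for an object that is expensive to construct, such as a RooFormulaVar, and counts how often it is built. The slots are driven sequentially here for simplicity, whereas RDataFrame would call the functor from several threads, each with its own slot number.

```cpp
#include <vector>

// Hypothetical stand-in for an object that is costly to construct,
// e.g. a formula that has to parse its expression.
struct CostlyFormula {
   static long constructions; // counts how many times the object was built
   double a = 0.;
   double b = 0.;
   CostlyFormula() { ++constructions; }
   double eval() const { return a * b; }
};
long CostlyFormula::constructions = 0;

// Per-event approach: a new formula is constructed for every call.
double evalPerEvent(double x, double y)
{
   CostlyFormula f;
   f.a = x;
   f.b = y;
   return f.eval();
}

// Per-slot approach: one formula per slot, built up front and reused.
// The call signature mirrors what DefineSlot expects (slot number first).
struct SlotCache {
   std::vector<CostlyFormula> perSlot;
   explicit SlotCache(unsigned int nSlots) : perSlot(nSlots) {}
   double operator()(unsigned int slot, double x, double y)
   {
      CostlyFormula &f = perSlot[slot]; // only update the inputs, no construction
      f.a = x;
      f.b = y;
      return f.eval();
   }
};
```

Calling evalPerEvent for N events builds N formulas, while a SlotCache with 4 slots builds 4 formulas no matter how many events pass through it. Under the assumption that construction dominates the cost, as the timings in the original post suggest, this is where the speed-up comes from.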

I hope this helps!

Cheers,
D