Dear ROOT experts,
In our RDataFrame-based analysis, I am trying to implement a lambda function using RooFit objects, with the final goal of performing a fit for every event (while running multi-threaded).
However, when I define RooFormulaVar objects inside the lambda, the execution slows down significantly. When defining only RooRealVar objects, I don’t notice any significant change.
I have tried to replicate the problem with a very simplified formula:
```cpp
#include <cstdio>
#include <TRandom.h>
#include "RooArgList.h"
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"

int main() {
    ROOT::EnableThreadSafety();
    ROOT::EnableImplicitMT();

    // Dummy per-event function: only constructs the RooFit objects, no fit yet.
    auto small_test = [](double x, double y) {
        RooRealVar var0("var0", "var0", x);
        RooRealVar var1("var1", "var1", y);
        // Commenting out the next line brings the runtime back down.
        RooFormulaVar formula("formula", "@0 * @1", RooArgList(var0, var1));
        return 1;
    };

    auto df = ROOT::RDataFrame{1000000}
                  .Define("x", []() { return gRandom->Uniform(-5., 5.); })
                  .Define("y", []() { return gRandom->Gaus(1., 3.); });
    auto df_f = df.Define("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
    auto n = df_f.Count();
    printf("Entries: %llu\n", *n);  // Count() yields an unsigned 64-bit integer
    return 0;
}
```
With 1M entries, it takes 1.17 minutes, while if I comment out the RooFormulaVar line, it takes only 1 s. I see similar trends locally, on lxplus and on another cluster.
The time scales up if I define more RooFormulaVar objects and if the expression inside is more complex.
I have seen that there is a tutorial on creating a RooDataSet from RDataFrame columns, but I haven’t seen anything on using RooFit inside RDataFrame. (Sorry if I have missed it!)
Am I doing something wrong? Is this behavior expected? Can I rewrite the code in a better way to somehow speed up the execution?
In the lambda in my analysis, I have multiple, more complex RooFormulaVar objects. This significantly slows down the whole execution for millions of events (from tens of seconds to tens of minutes, and I haven’t performed the fit yet).
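To make the "rewrite" question concrete: one pattern I could imagine is building the expensive object once per processing slot and only updating its inputs for each event, instead of constructing it inside the lambda. Below is a minimal sketch of that idea in plain C++, without ROOT. `ExpensiveFormula`, `PerSlotFormula` and `constructionCount` are hypothetical names that I made up as stand-ins for RooFormulaVar and its setup cost. I have not checked whether reusing RooFit objects like this is safe or actually faster.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter for how often the (assumed) expensive setup is paid.
inline int &constructionCount() {
    static int n = 0;
    return n;
}

// Hypothetical stand-in for an object that is costly to construct
// but cheap to evaluate once it exists.
struct ExpensiveFormula {
    double a = 0.0;
    double b = 0.0;
    ExpensiveFormula() { ++constructionCount(); }
    void setInputs(double x, double y) { a = x; b = y; }
    double eval() const { return a * b; }  // mirrors the "@0 * @1" expression
};

// Holds one pre-built formula per slot, so that no object is shared
// between threads and nothing is constructed inside the event loop.
class PerSlotFormula {
public:
    explicit PerSlotFormula(std::size_t nSlots) : formulas_(nSlots) {}

    // Called once per event: update the inputs, then evaluate.
    double operator()(unsigned int slot, double x, double y) {
        assert(slot < formulas_.size());
        ExpensiveFormula &f = formulas_[slot];
        f.setInputs(x, y);
        return f.eval();
    }

private:
    std::vector<ExpensiveFormula> formulas_;
};
```

The call signature with the slot number as first argument is chosen to match what RDataFrame's DefineSlot passes to its callable, so a functor like this could in principle replace the lambda given to Define. Whether the same trick carries over to RooFormulaVar (for example by keeping the RooRealVar inputs alive and calling setVal on them per event) is exactly what I am unsure about.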
Thank you!