RDataFrame execution slows down with RooFormulaVar

Dear ROOT experts,

In our RDataFrame-based analysis, I am trying to implement a lambda function that uses RooFit objects, with the final goal of performing a fit for every event (while running multi-threaded).

However, when I define RooFormulaVar objects inside the lambda, the execution slows down significantly. When defining RooRealVar objects I don’t notice any significant change.

I have tried to reproduce the problem with a very simplified formula:

#include <TRandom.h>
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"

int main() {
        ROOT::EnableThreadSafety();
        ROOT::EnableImplicitMT();
        auto small_test = [] (double x, double y) {
               RooRealVar var0("var0", "var0", x);
               RooRealVar var1("var1", "var1", y);
               RooFormulaVar formula("formula", "@0 * @1", RooArgList(var0, var1));
               return 1;
        };
        auto df = ROOT::RDataFrame{1000000}.Define("x", []() { return gRandom->Uniform(-5., 5.); }).Define("y", []() {return gRandom->Gaus(1., 3.);});
        auto df_f = df.Define("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

With 1M entries, this takes 1.17 minutes, whereas it takes only about 1 s if I comment out the RooFormulaVar line. I see similar trends locally, on lxplus and on another cluster.

The time scales up if I define more RooFormulaVar and if the expression inside is more complex.

I have seen that there is a tutorial to define RooDataSet from RDataFrame columns, but I haven’t seen anything on using RooFit inside RDataFrame. (Sorry if I have missed it!)

Am I doing something wrong? Is this behavior expected? Can I rewrite the code in a better way to somehow speed up the execution?

In the lambda in my analysis, I have several, more complex RooFormulaVar objects. This significantly slows down the whole execution for millions of events (from tens of seconds to tens of minutes, and I haven’t performed the fit yet).

Thank you!

Hi Francesco,

Thanks for the interesting post and welcome to the ROOT Community!

Your reproducer is well done: it was possible to slightly modify it to give you the performance you are looking for. This is a possible implementation:

struct MyRooFormulaVarWrapper {
   RooRealVar    var0;
   RooRealVar    var1;
   RooFormulaVar formula;
   MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), formula("formula", "@0 * @1", RooArgList(var0, var1)){};
};

void foo()
{

   ROOT::EnableImplicitMT();

   std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
   auto small_test = [&wrappers](unsigned int islot, double x, double y) {
      auto& iwrapper =  wrappers[islot];
      iwrapper.var0.setVal(x);
      iwrapper.var1.setVal(y);
      // do something with iwrapper.formula ....
      return 1;
   };

   auto df   = ROOT::RDataFrame{1000000}.Define("x", []() { return gRandom->Uniform(-5., 5.); }).Define("y", []() {
      return gRandom->Gaus(1., 3.);
   });
   auto df_f = df.DefineSlot("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
   auto n    = df_f.Count();
   printf("Entries: %lld\n", *n);
}

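This avoids initialising the formula once per event and instead creates one formula per slot. A slot is the index of a processing thread, a concept RDataFrame provides exactly to handle cases like this one: DefineSlot passes the slot number as the first argument of the callable, so each thread only ever touches its own wrapper.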
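The per-slot idea can also be illustrated without ROOT. The following is a minimal sketch using only the standard library: SlotState is a hypothetical stand-in for an object that is costly to build (the role RooFormulaVar plays above), and run_per_slot and g_constructions are names invented for this illustration. It only shows that the state is constructed once per slot rather than once per event.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many times the "expensive" object gets constructed.
std::atomic<int> g_constructions{0};

// Hypothetical stand-in for a costly-to-build object such as a RooFormulaVar.
struct SlotState {
   double a = 0.;
   double b = 0.;
   SlotState() { ++g_constructions; }
   double eval() const { return a * b; }
};

// Each worker thread owns exactly one slot and reuses its state for every event.
double run_per_slot(std::size_t nSlots, std::size_t eventsPerSlot)
{
   std::vector<SlotState> states(nSlots);   // built once, before the event loop
   std::vector<double> partial(nSlots, 0.); // one accumulator per slot, never shared
   std::vector<std::thread> workers;
   for (std::size_t slot = 0; slot < nSlots; ++slot) {
      workers.emplace_back([&states, &partial, slot, eventsPerSlot] {
         assert(slot < states.size());
         SlotState &s = states[slot];
         for (std::size_t i = 0; i < eventsPerSlot; ++i) {
            s.a = 2.; // update the inputs in place instead of rebuilding the object
            s.b = 3.;
            partial[slot] += s.eval();
         }
      });
   }
   for (auto &w : workers)
      w.join();
   double total = 0.;
   for (double p : partial)
      total += p;
   return total;
}
```

Calling run_per_slot(4, 1000) processes 4000 events while constructing only four SlotState objects, which is the same trade the wrappers vector makes in the RDataFrame code above.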

I hope this helps!

Cheers,
D

Hi Danilo,

Thank you very much! It works indeed, but when I try to implement it in our framework, I run into another problem that leads to a segmentation violation. Our framework is essentially based on the RNode tutorial (link): from a single function we call all the other functions that define columns on and filter the dataframe.
I have tried to reproduce the problem with the simplified reproducer.

#include <TRandom.h>
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"
using RNode = ROOT::RDF::RNode;

struct MyRooFormulaVarWrapper {
        RooRealVar    var0;
        RooRealVar    var1;
        RooFormulaVar formula;
        MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), formula("formula", "@0 * @1", RooArgList(var0, var1)){};
};


RNode apply_formula(RNode df)
{
        std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
        auto small_test = [&wrappers](unsigned int islot, double x, double y) {
                auto& iwrapper =  wrappers[islot];
                iwrapper.var0.setVal(x);
                iwrapper.var1.setVal(y);
                return 1;
        };
        return df.DefineSlot("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
}


int main()
{
        ROOT::EnableImplicitMT();
        auto df = ROOT::RDataFrame(1000000).Define("x", "10.").Define("y","10.");
        auto df_f = apply_formula(df);
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

If I pass the DataFrame to an external function, I get the error:
*** Break *** segmentation violation munmap_chunk(): invalid pointer (core dumped)
Where am I messing up the memory (and/or the slots)? How can I implement this? I have tried different solutions, but none of them worked.
Thanks!

Hi Francesco,

The problem seems to be completely unrelated to ROOT.
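In your program, the wrappers variable is defined in the apply_formula function and is therefore destroyed when that function returns. The lambda captures it by reference, and since RDataFrame is lazy, the lambda only runs later, when the event loop is triggered in main, at which point the reference is dangling.
One possibility is to define wrappers in main and pass it by reference to apply_formula; anything else that ensures its lifetime extends beyond apply_formula and covers the event loop works too.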
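Another way to guarantee the lifetime, shown here only as a sketch and not as the approach suggested above, is to let the callable co-own its state through a std::shared_ptr captured by value. SlotState and make_slot_callable are hypothetical names for this illustration, and the sketch assumes that whatever invokes the callable stores a copy of it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the per-slot wrapper object.
struct SlotState {
   double a = 0.;
   double b = 0.;
};

// Returns a callable that co-owns its per-slot state, so the state stays
// alive for as long as any copy of the callable exists.
std::function<double(unsigned int, double, double)> make_slot_callable(std::size_t nSlots)
{
   auto states = std::make_shared<std::vector<SlotState>>(nSlots);
   // Capture the shared_ptr by value: the vector outlives this function.
   return [states](unsigned int slot, double x, double y) {
      assert(slot < states->size());
      SlotState &s = (*states)[slot];
      s.a = x;
      s.b = y;
      return s.a * s.b;
   };
}
```

Here the vector is released only when the last copy of the callable is destroyed, so nothing dangles even though the state was created inside a helper function.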

Cheers,
D

Ooops my bad, thank you! I got confused because in my framework (on top of the other things) I was also passing it by value and not by reference.

I still have one problem with fitTo. From what I understood from the documentation, it should be safe with implicit multithreading (ROOT::EnableImplicitMT()) when using the CPU backend, which should be the default (EvalBackend("cpu")). However, with the implementation below I still get Segmentation fault (core dumped). Is my lambda thread-unsafe? If I use only one slot or comment out the fitTo line, it runs without problems. I think it might be related to the owning pointer that should be created after the fit, but I have tried deleting it without success.
I am using ROOT 6.31/01.

#include <TRandom.h>
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooFormulaVar.h"
#include "RooGaussian.h"
#include "RooMinimizer.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"
using RNode = ROOT::RDF::RNode;
using namespace RooFit;

struct MyRooFormulaVarWrapper {
        RooRealVar    var0;
        RooRealVar    var1;
        RooRealVar    m;
        RooRealVar    s;
        RooDataSet    data;
        RooGaussian  model;
        MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), m("m", "m", 10, 8, 12), s("s", "s", 1, 0.1, 2), data("data", "data", var0), model("model", "model", var0, m, s) {};
};


RNode apply_formula(RNode df, std::vector<MyRooFormulaVarWrapper>& wrappers)
{
        auto small_test = [&wrappers](unsigned int islot, double x0, double x1, double x2, double x3, double x4) {
                auto& iwrapper =  wrappers[islot];
                iwrapper.var1.setVal(x1);
                iwrapper.data.reset();
                iwrapper.var0.setVal(x0); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x1); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x2); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x3); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x4); iwrapper.data.add(iwrapper.var0);
                iwrapper.model.fitTo(iwrapper.data, EvalBackend("cpu"), PrintLevel(3));
                return 1;
        };

        return df.DefineSlot("dummy", small_test, {"x0", "x1", "x2", "x3", "x4"}).Filter("dummy==1");
}


int main()
{
        ROOT::EnableImplicitMT();
        std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
        auto gaus = [](){return gRandom->Gaus(10,1);};
        auto df = ROOT::RDataFrame{100}.Define("x0", gaus).Define("x1", gaus).Define("x2", gaus).Define("x3", gaus).Define("x4", gaus);
        auto df_f = apply_formula(df, wrappers);
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

Thanks
F.

Hi Francesco,

Let me bring @jonas into the loop to comment on thread safety.

Cheers,
D

Hi @forlandi!

Evaluating RooFit functions is indeed thread-safe. For the CPU backend this requires a small fix, for which I will open a PR right away.

However, fitting is absolutely not thread-safe, because the RooMinimizer uses a static ROOT fitter object under the hood.

Making the RooMinimizer thread-safe will be a major effort.
If this is a requirement for your analysis, please open a ROOT improvement ticket on GitHub.

It would also help us prioritize these developments if you could explain your use case a bit.

Until then, there is nothing I can suggest to you other than putting a lock around your fitTo so it always runs sequentially.
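The locking idea can be sketched with the standard library alone. In this minimal sketch, unsafe_fit is a hypothetical stand-in for a call that touches shared static state (the role fitTo plays here), and g_fit_mutex, locked_fit and run_fits are names invented for the illustration; it does not call RooFit at all.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for shared, non-thread-safe state
// (the role played by the static fitter object mentioned above).
int g_shared_counter = 0;

// Not thread-safe on its own: an unsynchronized read-modify-write.
void unsafe_fit() { g_shared_counter = g_shared_counter + 1; }

// One mutex shared by all threads serializes every call to unsafe_fit.
std::mutex g_fit_mutex;

void locked_fit()
{
   std::lock_guard<std::mutex> guard(g_fit_mutex); // released at end of scope
   unsafe_fit();
}

// Runs locked_fit from several threads and returns the final counter value.
int run_fits(unsigned nThreads, unsigned fitsPerThread)
{
   assert(nThreads > 0);
   g_shared_counter = 0;
   std::vector<std::thread> workers;
   for (unsigned t = 0; t < nThreads; ++t) {
      workers.emplace_back([fitsPerThread] {
         for (unsigned i = 0; i < fitsPerThread; ++i)
            locked_fit();
      });
   }
   for (auto &w : workers)
      w.join();
   return g_shared_counter;
}
```

In the lambda from the post above, the equivalent would be to hold a std::lock_guard on a mutex shared by all slots around the fitTo line only. The fits then run one at a time, so multithreading speeds up only the work outside the locked region.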

Hi @jonas, thank you very much.
There are a few other minor features we are missing that would help us develop our analysis in RDataFrame. Could I group them all together and get in touch with you to show the context and the use case?