RDataFrame execution slows down with RooFormulaVar

Dear ROOT experts,

In our RDataFrame-based analysis, I am trying to implement a lambda function using RooFit objects with the final goal of performing a fit for every event (while running in multi-thread).

However, when I define RooFormulaVar objects inside the lambda, the execution slows down significantly. When defining RooRealVar objects I don’t notice any significant change.

I have tried to reproduce the problem with a very simplified formula:

#include <TRandom.h>
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"

int main() {
        ROOT::EnableThreadSafety();
        ROOT::EnableImplicitMT();
        auto small_test = [] (double x, double y) {
               RooRealVar var0("var0", "var0", x);
               RooRealVar var1("var1", "var1", y);
               RooFormulaVar formula("formula", "@0 * @1", RooArgList(var0, var1));
               return 1;
        };
        auto df = ROOT::RDataFrame{1000000}.Define("x", []() { return gRandom->Uniform(-5., 5.); }).Define("y", []() {return gRandom->Gaus(1., 3.);});
        auto df_f = df.Define("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

With 1M entries, it takes 1.17 minutes, whereas if I comment out the RooFormulaVar line it takes only 1 s. I see similar trends locally, on lxplus and on another cluster.

The time scales up if I define more RooFormulaVar and if the expression inside is more complex.

I have seen that there is a tutorial to define RooDataSet from RDataFrame columns, but I haven’t seen anything on using RooFit inside RDataFrame. (Sorry if I have missed it!)

Am I doing something wrong? Is this behavior expected? Can I rewrite the code in a better way to somehow speed up the execution?

In the lambda in my analysis, I have several, more complex RooFormulaVar objects. This significantly slows down the whole execution for millions of events (from tens of seconds to tens of minutes, and I haven’t performed the fit yet).

Thank you!

Hi Francesco,

Thanks for the interesting post and welcome to the ROOT Community!

Your reproducer is well done: it was possible to slightly modify it to give you the performance you are looking for. This is a possible implementation:

struct MyRooFormulaVarWrapper {
   RooRealVar    var0;
   RooRealVar    var1;
   RooFormulaVar formula;
   MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), formula("formula", "@0 * @1", RooArgList(var0, var1)){};
};

void foo()
{

   ROOT::EnableImplicitMT();

   std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
   auto small_test = [&wrappers](unsigned int islot, double x, double y) {
      auto& iwrapper =  wrappers[islot];
      iwrapper.var0.setVal(x);
      iwrapper.var1.setVal(y);
      // do something with iwrapper.formula ....
      return 1;
   };

   auto df   = ROOT::RDataFrame{1000000}.Define("x", []() { return gRandom->Uniform(-5., 5.); }).Define("y", []() {
      return gRandom->Gaus(1., 3.);
   });
   auto df_f = df.DefineSlot("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
   auto n    = df_f.Count();
   printf("Entries: %lld\n", *n);
}

This avoids initialising the formula once per event and instead creates one formula per slot, a concept RDataFrame provides exactly to handle cases like this one.
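To make the per-event versus per-slot difference concrete, here is a minimal sketch that does not use ROOT. `ExpensiveFormula`, `g_constructions`, `eval_per_event` and `PerSlotEval` are hypothetical names introduced only for illustration; the struct stands in for an object that is costly to construct, such as a RooFormulaVar, and the counter simply records how often it is built.

```cpp
#include <vector>

// Counts how many times the "expensive" object is constructed.
int g_constructions = 0;

// Hypothetical stand-in for an object that is costly to build,
// e.g. a formula that has to be parsed at construction time.
struct ExpensiveFormula {
   double a = 0.;
   double b = 0.;
   ExpensiveFormula() { ++g_constructions; }
   double eval() const { return a * b; }
};

// Per-event pattern: a new object is built for every call.
double eval_per_event(double x, double y)
{
   ExpensiveFormula f; // constructed on every entry
   f.a = x;
   f.b = y;
   return f.eval();
}

// Per-slot pattern: one object per slot, built once and then reused.
// The call signature mirrors a slot-aware callback (slot, x, y).
class PerSlotEval {
public:
   explicit PerSlotEval(unsigned nslots) : formulas_(nslots) {}

   double operator()(unsigned slot, double x, double y)
   {
      ExpensiveFormula &f = formulas_[slot]; // each slot only touches its own object
      f.a = x;
      f.b = y;
      return f.eval();
   }

private:
   std::vector<ExpensiveFormula> formulas_;
};
```

With four slots and any number of entries, the per-slot variant constructs the object four times in total, while the per-event variant constructs it once per entry.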

I hope this helps!

Cheers,
D

Hi Danilo,

Thank you very much! Indeed, it works, but when I try to implement it in our framework I run into another problem: a segmentation violation. Our framework is essentially based on the RNode tutorial (link): a single function calls all the other functions that define and filter the dataframe.
I have tried to reproduce the problem with the simplified reproducer.

#include <TRandom.h>
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"
using RNode = ROOT::RDF::RNode;

struct MyRooFormulaVarWrapper {
        RooRealVar    var0;
        RooRealVar    var1;
        RooFormulaVar formula;
        MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), formula("formula", "@0 * @1", RooArgList(var0, var1)){};
};


RNode apply_formula(RNode df)
{
        std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
        auto small_test = [&wrappers](unsigned int islot, double x, double y) {
                auto& iwrapper =  wrappers[islot];
                iwrapper.var0.setVal(x);
                iwrapper.var1.setVal(y);
                return 1;
        };
        return df.DefineSlot("dummy", small_test, {"x", "y"}).Filter("dummy == 1", "Filter");
}


int main()
{
        ROOT::EnableImplicitMT();
        auto df = ROOT::RDataFrame(1000000).Define("x", "10.").Define("y","10.");
        auto df_f = apply_formula(df);
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

If I try to pass the DataFrame to an external function, I get the error:
*** Break *** segmentation violation munmap_chunk(): invalid pointer (core dumped)
Where am I mishandling the memory (and/or the slots)? How can I implement this? I have tried different solutions, but none of them worked.
Thanks!

Hi Francesco,

The problem seems to be completely unrelated to ROOT.
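In your program, the wrappers variable is defined in the apply_formula function and, as such, it is destroyed at the end of the scope of that function. The lambda captures it by reference, so by the time the event loop runs (when the result of Count() is accessed in main), that reference is dangling.
One possibility is to define it in main and pass it by reference to apply_formula, or to do anything else that ensures that its lifetime extends beyond apply_formula.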
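One way to guarantee the lifetime without threading the vector through every function signature is to let the callback own the state. The following is a minimal sketch without ROOT; `SlotState`, `SlotCallback` and `make_slot_callback` are hypothetical names, and `SlotState` merely stands in for the RooFit wrapper struct. It assumes the callback is copied and stored by whoever runs the loop later.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the per-slot wrapper struct.
struct SlotState {
   double var0 = 0.;
   double var1 = 0.;
   double formula() const { return var0 * var1; }
};

using SlotCallback = std::function<double(unsigned, double, double)>;

// Broken variant, shown only as a comment: the lambda would keep a
// reference to a local vector that is destroyed when the function returns.
//    std::vector<SlotState> states(nslots);
//    return [&states](unsigned slot, double x, double y) { ... };

// Safe variant: the vector lives on the heap and the lambda captures the
// shared_ptr by value, so the state stays alive as long as any copy of
// the callback exists.
SlotCallback make_slot_callback(unsigned nslots)
{
   auto states = std::make_shared<std::vector<SlotState>>(nslots);
   return [states](unsigned slot, double x, double y) {
      SlotState &s = (*states)[slot];
      s.var0 = x;
      s.var1 = y;
      return s.formula();
   };
}
```

The trade-off is that all copies of the callback share the same per-slot objects, which is the intended behaviour here, since each slot only ever touches its own element.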

Cheers,
D

Ooops my bad, thank you! I got confused because in my framework (on top of the other things) I was also passing it by value and not by reference.

I still have one problem with fitTo. From what I understand from the documentation, it should be safe to use with multithreading (ROOT::EnableImplicitMT()), and the CPU backend should be the default (EvalBackend("cpu")). However, with the implementation below I still get Segmentation fault (core dumped). Is my lambda thread-unsafe? If I use only one slot or comment out the fitTo line, it runs without problems. I think it might be related to the owning pointer that should be created after the fit, but I have tried to delete it without success.
I am using ROOT 6.31/01.

#include <TRandom.h>
#include "RooFormulaVar.h"
#include "RooRealVar.h"
#include "ROOT/RDataFrame.hxx"
#include "RooDataSet.h"
#include "RooGaussian.h"
#include "RooMinimizer.h"
#include "RooFitResult.h"
using RNode = ROOT::RDF::RNode;
using namespace RooFit;

struct MyRooFormulaVarWrapper {
        RooRealVar    var0;
        RooRealVar    var1;
        RooRealVar    m;
        RooRealVar    s;
        RooDataSet    data;
        RooGaussian  model;
        MyRooFormulaVarWrapper():var0("var0", "var0", 0), var1("var1", "var1", 0), m("m", "m", 10, 8, 12), s("s", "s", 1, 0.1, 2), data("data", "data", var0), model("model", "model", var0, m, s) {};
};


RNode apply_formula(RNode df, std::vector<MyRooFormulaVarWrapper>& wrappers)
{
        auto small_test = [&wrappers](unsigned int islot, double x0, double x1, double x2, double x3, double x4) {
                auto& iwrapper =  wrappers[islot];
                iwrapper.var1.setVal(x1);
                iwrapper.data.reset();
                iwrapper.var0.setVal(x0); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x1); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x2); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x3); iwrapper.data.add(iwrapper.var0);
                iwrapper.var0.setVal(x4); iwrapper.data.add(iwrapper.var0);
                iwrapper.model.fitTo(iwrapper.data, EvalBackend("cpu"), PrintLevel(3));
                return 1;
        };

        return df.DefineSlot("dummy", small_test, {"x0", "x1", "x2", "x3", "x4"}).Filter("dummy==1");
}


int main()
{
        ROOT::EnableImplicitMT();
        std::vector<MyRooFormulaVarWrapper> wrappers(ROOT::GetThreadPoolSize());
        auto gaus = [](){return gRandom->Gaus(10,1);};
        auto df = ROOT::RDataFrame{100}.Define("x0", gaus).Define("x1", gaus).Define("x2", gaus).Define("x3", gaus).Define("x4", gaus);
        auto df_f = apply_formula(df, wrappers);
        auto n = df_f.Count();
        printf("Entries: %lld\n", *n);
        return 0;
}

Thanks
F.

Hi Francesco,

Let me add @jonas in the loop to comment about the thread safety.

Cheers,
D

Hi @forlandi!

Evaluating RooFit functions is indeed thread-safe. For the CPU backend it requires a small fix, for which I will open a PR immediately.

However, any fitting is absolutely not thread-safe, because the RooMinimizer uses a static ROOT fitter object under the hood:

Making the RooMinimizer thread safe will be a major effort.
If this is a requirement for your analysis, please open a ROOT improvement ticket on GitHub:

It would also help us to prioritize these developments if you could explain your use case a bit.

Until then, there is nothing I can suggest to you other than putting a lock around your fitTo so it always runs sequentially.
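The locking idea can be sketched with standard C++ only. In this sketch `unsafe_fit`, `unsafe_fit_counter`, `fit_mutex`, `locked_fit` and `run_locked_fits` are hypothetical names; `unsafe_fit` stands in for a routine that relies on shared static state and is therefore not safe to call concurrently.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Shared static state, standing in for a global fitter object.
int unsafe_fit_counter = 0;

// Hypothetical routine that is not thread-safe on its own:
// it performs an unsynchronised read-modify-write on shared state.
void unsafe_fit()
{
   int tmp = unsafe_fit_counter;
   unsafe_fit_counter = tmp + 1;
}

// One mutex shared by all slots.
std::mutex fit_mutex;

// What each slot would call instead of unsafe_fit():
// the lock is held for the duration of the call and released at scope exit.
void locked_fit()
{
   std::lock_guard<std::mutex> lock(fit_mutex);
   unsafe_fit();
}

// Runs calls_per_thread locked fits on each of nthreads threads
// and returns the final value of the shared counter.
int run_locked_fits(unsigned nthreads, int calls_per_thread)
{
   unsafe_fit_counter = 0;
   std::vector<std::thread> threads;
   for (unsigned t = 0; t < nthreads; ++t) {
      threads.emplace_back([calls_per_thread] {
         for (int i = 0; i < calls_per_thread; ++i)
            locked_fit();
      });
   }
   for (auto &th : threads)
      th.join();
   return unsafe_fit_counter;
}
```

In the reproducer from this thread, the `std::lock_guard` would go inside the lambda, directly before the `fitTo` call. The fits then run one at a time, so only the remaining work in the event loop benefits from multithreading.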

Hi @jonas, thank you very much.
We are also missing a few other minor features that would help us develop our analysis in RDataFrame. Could I group them all together and get in touch with you to show the context and the use case?
