Using a minimizer to sequentially optimize RDataFrame filters

Hello,

I am trying to sequentially optimize some cuts on my data, using RDataFrame and ROOT::Math::Minimizer.

I want to apply a cut (filter) that depends on two parameters to one column of the dataframe, and use the minimizer to find the best cut parameters according to a figure of merit that I calculate from the dataframe. I want to apply this process sequentially to multiple columns, with the dataframe keeping the previous filters each time.

My first unsuccessful approach was to write a function

double calculateFOM(double cutpar1, double cutpar2,  ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter, void>  filteredDF, string columnName)
{
    ... // apply a filter and calculate a figure of merit (double FOM)
    return FOM;
}

and then to fix the dataframe and column name in my macro by defining a lambda expression, which has only the cut parameters as input:

auto calculateFOM1 = [&](const double *x){return calculateFOM(x[0], x[1], MyFilteredDF, MyColumnName);};

However, I didn’t manage to pass the lambda expression to the minimizer. I tried, e.g., ROOT::Math::Functor f1(&calculateFOM1,2); following this example, but this method doesn’t accept lambdas.

Is there a way to minimize a lambda expression?


My second approach is to write a class that provides a function calculateFOM, which the minimizer will hopefully accept. The class includes a method SetParameters, which lets me pass the filtered dataframe and column name to the function:

class generateFunction
{
    private:
        string columnName;
        ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter, void>  filteredDF;

    public:
        void SetParameters(ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter, void> df1, string name)
            {
                auto filteredDF1= df1;
                auto columnName = name;
            }

        double  calculateFOM(double cutpar1, double cutpar2){

            filteredDF= filteredDF.Filter(std::to_string(cutpar1) + " < " + columnName + " < " + std::to_string(cutpar2));
         
           ... // do some calculation that returns a figure of merit (double FOM) ***

            return FOM;
        }
};

However, this doesn’t compile:

... error: call to implicitly-deleted default constructor of 'generateFunction'
... default constructor of 'generateFunction' is implicitly deleted because field 'filteredDF' has no default constructor
                ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter, void>  filteredDF;

I guess the problem is the declaration of filteredDF. What is the correct way to do this?
Or would a different approach be better?

Thanks in advance!
Konrad


ROOT Version: 6.24.02
Platform: linuxx8664gcc
Compiler: Not Provided

Hi @konrad ,
try substituting ROOT::RDF::RInterface<...> with the generic ROOT::RDF::RNode; that might help.
If not, please share a minimal reproducer of the problem that we can play with a little bit.

Cheers,
Enrico

Hi Enrico,

unfortunately, this substitution didn’t help. I am preparing a minimal example and will share it soon.

Cheers!

optimizeCuts.C (3.7 KB)

Hello again,

attached you can find a minimal example that illustrates and reproduces my problem.

The script creates two ROOT files with random data, from which I create the dataframes.
I try to pass the dataframes to a function with the help of a class, and then just execute the function to estimate the goodness of my cut (which depends on two cut parameters).

However, this fails to compile with the error message mentioned before.

Cheers,
Konrad

Hi @konrad ,
I had to fix a few things, but it now compiles and runs (the next step is making sure that it does what you want it to do :grinning_face_with_smiling_eyes: ).

  1. Make generateFunction default-constructible. For that we need to initialize the ROOT::RDF::RNode data members to something, because an RNode object cannot be default-constructed:
-  ROOT::RDF::RNode data1;
-  ROOT::RDF::RNode data2;
+  ROOT::RDF::RNode data1 = ROOT::RDataFrame(0);
+  ROOT::RDF::RNode data2 = ROOT::RDataFrame(0);
  2. Actually set the class data members in SetParameters, rather than local variables that shadow them:
   // set parameters
   void SetParameters(ROOT::RDF::RNode df1, ROOT::RDF::RNode df2, string name) {
-    auto data1 = df1;
-    auto data2 = df2;
-    auto columnName = name;
+    data1 = df1;
+    data2 = df2;
+    columnName = name;
   }
  3. Fix a bug in the filter strings: in C++, l < c < u is not the same as l < c && c < u, because it is evaluated as (l < c) < u (see e.g. here):
   // dataframes, to be minimized
   double calculateFOM(double l, double u) {

-    data1 = data1.Filter(std::to_string(l) + " < " + columnName + " < " +
+    data1 = data1.Filter(std::to_string(l) + " < " + columnName + " && " + columnName + " < " +
                          std::to_string(u));
-    data2 = data2.Filter(std::to_string(l) + " < " + columnName + " < " +
+    data2 = data2.Filter(std::to_string(l) + " < " + columnName + " && " + columnName + " < " +
                          std::to_string(u));

     auto temp1 = data2.Count();
  4. As an optimization, we can use C++ lambdas instead of strings for the Filter expressions:

-    data1 = data1.Filter(std::to_string(l) + " < " + columnName + " && " + columnName + " < " +
-                         std::to_string(u));
-    data2 = data2.Filter(std::to_string(l) + " < " + columnName + " && " + columnName + " < " +
-                         std::to_string(u));
+    data1 = data1.Filter([l,u](double c) { return l < c && c < u; }, {columnName});
+    data2 = data2.Filter([l,u](double c) { return l < c && c < u; }, {columnName});

This is the end result: optimizeCuts.C (3.9 KB)

That still doesn’t use a ROOT minimizer. If you encounter any problem with that step just provide an updated version of this reproducer and we can take a look.

Cheers,
Enrico

Thank you very much for your help!

With your changes the function works without problems, and I was able to pass it to a minimizer. To conclude, I updated the minimal example, in case someone else stumbles on the same problem:

optimizeCuts.C (4.4 KB)

I used ROOT::Math::Minimizer with Minuit2, and I obtain fast and good results with the Simplex algorithm (on my real data; in the example code it doesn’t really make sense, but it minimizes something :smiley: ).

Amazing :slight_smile:
