RDataFrame: Defining a new column evaluated as a function of external values in Python

Dear experts,

I am trying to calculate event weights as a function of both some variables defined in the Python scope and values stored in existing branches (columns) of the RDF.

The event loop equivalent would look somewhat like this:

x = loadSF("0-100")
y = loadSF("100-200")
z = loadSF("200-Inf")

for entry in tree:
    if entry.A < 100: weight = x
    elif entry.A < 200: weight = y  # reached only when A >= 100
    else: weight = z

where loadSF is some Python function involving reading external files.
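For reference, the same binning can be written as a small lookup function. This is only a sketch: `scale_factor` is a hypothetical helper, the edges 100 and 200 are taken from the labels passed to loadSF, and the factors stand in for whatever loadSF returns. It treats the bins as "below 100", [100, 200) and "200 and above", so a value sitting exactly on an edge goes to the upper bin.

```python
import bisect


def scale_factor(a, upper_edges, factors):
    """Return the factor of the bin that contains `a`.

    `upper_edges` must be sorted; bin i covers [upper_edges[i-1], upper_edges[i]),
    with the first bin open below and the last bin open above.
    """
    if len(factors) != len(upper_edges) + 1:
        raise ValueError("need exactly one factor per bin")
    # bisect_right counts how many edges are <= a, which is exactly the bin index
    return factors[bisect.bisect_right(upper_edges, a)]


# Hypothetical usage inside the event loop (x, y, z as returned by loadSF):
# weight = scale_factor(entry.A, [100, 200], [x, y, z])
```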

I have an idea of how to proceed from the C++ example of lambda capturing:

RDataFrame d(100); // an RDF that will generate 100 entries (currently empty)
int x = -1;
auto d_with_columns = d.Define("x", [&x] { return ++x; })
                       .Define("xx", [&x] { return x*x; });

but I am unable to reproduce this code in Python.

Is it possible to replicate lambda capturing in Python, or, even better, to achieve the reweighting goal with external values in a straightforward way?

Best regards,

ROOT Version: 6.16
Platform: SL7
Compiler: Not Provided

Hi Spandan,
at present, RDF cannot execute Python lambdas during the event loop.
One of the obstacles to making that possible is Python's Global Interpreter Lock (GIL),
which would effectively make RDF multi-threading useless.

A workaround for your specific scenario would be to declare x, y and z to ROOT's C++ interpreter and then use them as you would use C++ variables:

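import ROOT
x = loadSF("0-100")
y = loadSF("100-200")
z = loadSF("200-Inf")
# Copy the Python values into C++ globals known to ROOT's interpreter
ROOT.gInterpreter.Declare("""
   const double x = double(TPython::Eval("x"));
   const double y = double(TPython::Eval("y"));
   const double z = double(TPython::Eval("z"));
""")

# Now that gInterpreter knows the C++ variables `x`, `y` and `z`, you can use them in your `Define` expressions
df = df.Define("weight", "if (A < 100) return x; else if (A < 200) return y; else return z;")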
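If reading the values back through TPython is not desirable, a variant of the same idea is to format the values directly into the C++ source passed to `ROOT.gInterpreter.Declare`. The sketch below uses two hypothetical helpers, `cpp_declarations` and `binned_expression`, and placeholder names such as `sf_low`; it only builds strings, so it runs without ROOT, and the ROOT calls are left as comments. One caveat, assuming the interpreter behaves like a regular C++ compiler here: declaring the same global twice in one session is an error, so the names should be unique.

```python
import math


def cpp_declarations(values):
    """Return C++ source with one `const double` per (name, value) pair.

    The values are written as literals, so no call back into Python is
    needed when the interpreter compiles the declarations.
    """
    lines = []
    for name, value in values.items():
        value = float(value)
        if not math.isfinite(value):
            raise ValueError(f"{name}: {value!r} has no plain C++ literal form")
        # repr() gives the shortest string that round-trips to the same double
        lines.append(f"const double {name} = {value!r};")
    return "\n".join(lines)


def binned_expression(column, upper_edges, names):
    """Build a Define expression picking names[i] for the i-th bin of `column`.

    A value equal to an edge goes to the upper bin, like `A < 100 ? ... : ...`.
    """
    if len(names) != len(upper_edges) + 1:
        raise ValueError("need exactly one name per bin")
    expr = names[-1]  # overflow bin
    # Build the nested ternary from the highest bin outwards
    for edge, name in zip(reversed(upper_edges), reversed(names[:-1])):
        expr = f"{column} < {edge} ? {name} : ({expr})"
    return expr


# Hypothetical usage with ROOT (x, y, z as returned by loadSF):
# import ROOT
# ROOT.gInterpreter.Declare(cpp_declarations({"sf_low": x, "sf_mid": y, "sf_high": z}))
# df = df.Define("weight", binned_expression("A", [100, 200], ["sf_low", "sf_mid", "sf_high"]))
# The expression string is "A < 100 ? sf_low : (A < 200 ? sf_mid : (sf_high))"
```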

Hope this helps,


Hi Enrico,

This workaround perfectly solves my problem. Thank you very much!

I was also able to reproduce the example code in Python using the same approach:

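import ROOT
d = ROOT.RDataFrame(100)  # an RDF that will generate 100 entries (currently empty)
x = -1
# Copy the Python value of `x` into a new C++ global `x`
ROOT.gInterpreter.Declare('int x = int(TPython::Eval("x"));')
d_with_columns = d.Define("x", "++x").Define("xx", "x*x")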
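As a check on what the two new columns should hold, here is a plain-Python model of the event loop. `simulate_columns` is a hypothetical name, and the model assumes entries are processed one at a time in order, as in a single-threaded run without `ROOT.EnableImplicitMT()`; with implicit multi-threading, increments of the shared global from different threads would race. The model also reflects that the C++ global is a copy: the Python `x` stays at -1 while the C++ `x` is incremented once per entry.

```python
def simulate_columns(n_entries, start):
    """Model Define("x", "++x").Define("xx", "x*x") for a sequential event loop.

    Returns a list of (x, xx) pairs, one per entry.
    """
    counter = start  # plays the role of the C++ global, initialised from Python
    rows = []
    for _ in range(n_entries):
        counter += 1  # "++x": pre-increment, then use the new value
        rows.append((counter, counter * counter))
    return rows


x = -1
rows = simulate_columns(100, x)
print(rows[0], rows[-1])  # (0, 0) (99, 9801); the Python x is still -1
```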

Best regards,
