Passing RDataFrame values into a Python function?

I am very new to using ROOT, so apologies if this question is unclear. I am trying to use the values stored in my RDataFrame as arguments for a Python function. Specifically, I want to define a Python function to solve a system of nonlinear equations, pass the values from each row of the RDataFrame as coefficients in the functions, and define a new column in the DF representing the solution (most likely using SciPy to find the solution). Most examples with user-defined functions seem to involve C++ functions, but is it possible to write the function in Python? Also open to other suggestions about how to approach this problem.

Hello @somebody_nobody,
welcome to the ROOT forum!

From what I understand, you need something like this:

import ROOT

# Create a data frame with 100 rows
rdf = ROOT.RDataFrame(100)
 
# Define a new column `x` that contains random numbers
rdf_x = rdf.Define("x", "gRandom->Rndm()")

@ROOT.Numba.Declare(['RVec<float>', 'float'], 'float') 
def pypowsum(x, y):
    s = 0
    for e in x:
        s += e**y 
    return s

pypowsum(rdf_x.Take[ROOT.Double_t]("x"), 3)

Let me know if it helps.

Cheers,
Monica

Thank you for your reply. I am still looking for a solution that can define a new RDataFrame column with the result. In other words, I don’t want one result from the function, but a result corresponding to each event in the RDataFrame. My current attempt is as follows:

@ROOT.Numba.Declare(['RVec', 'RVec'], 'RVec')
def add(x, y):
    return x + y
df = df.Define("sum", add("A", "B"))

Unfortunately, I keep getting errors about the column names being “undeclared identifiers”. I have tried many modifications of this approach, but I’m not sure how to resolve it. Thanks in advance for any suggestions.

In this case, assuming that df is an RDF with columns A and B, what you would do is simply

df_sum = df.Define("sum", "A+B")
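
If the goal is to call a function declared with ROOT.Numba.Declare from Define, a rough sketch follows. Define takes a string containing a C++ expression, and declared functions are exposed to C++ in the Numba namespace. The plain-Python part shows why a call like add("A", "B") cannot work: Python evaluates it before Define ever runs. The scalar columns A and B, the names expr_wrong and expr_right, and the helper define_sum are assumptions for illustration only.

```python
def add(x, y):
    # Plain Python stand-in for the decorated function, used only to show
    # what Python evaluates before Define is called.
    return x + y


# Python evaluates the call first, so Define would receive this string
# and then look for a column or C++ identifier with that name.
expr_wrong = add("A", "B")

# The expression has to reach Define as a string instead. Functions declared
# with ROOT.Numba.Declare are callable from C++ as Numba::<name>.
expr_right = "Numba::add(A, B)"


def define_sum(df):
    """Hypothetical helper: add a column `sum` to an RDataFrame.

    Assumes `add` was declared beforehand with
    @ROOT.Numba.Declare(["float", "float"], "float")
    and that `df` has scalar columns A and B.
    """
    return df.Define("sum", expr_right)
```

With this, df_sum = define_sum(df) would compute one value of sum per event, just like the "A+B" string above.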

Do you want to apply a SciPy function to an RVec and then write the result back into an RDF?

Yes, exactly. I was only using the sum function as a test to see whether I could extract the values from the RDF. I want to use SciPy (or SymPy, or some other numerical solver) to operate on an RVec, and then export the result to a column of the RDF.

Currently the issue is that I want to use the Python Math package within the function, but this is disallowed, causing the error:
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'Math': Cannot determine Numba type of <class 'Math_meta'>

Then the fastest solution could be to convert your RDF to an Awkward Array like this:

import awkward as awk
import numpy as np

awk_x = awk.from_rdataframe(rdf_x, columns=("x",))  # convert to an Awkward Array with field x
x = awk_x.x  # extract the column
z = np.power(x, 2)  # use NumPy, SciPy, or any other Python module
awk_x["z"] = z  # push the result back
df_xz = awk.to_rdataframe({"x": awk_x.x, "z": awk_x.z})  # convert back to an RDF
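
For the per-event solver itself, a similar round trip might look like the sketch below, here with plain NumPy arrays instead of Awkward Arrays. It assumes two scalar columns A and B and an invented two-equation system (x^3 + y = A, x - y = B) standing in for the real one. The names residuals, solve_rows, and add_solution_columns are hypothetical. ROOT.RDF.FromNumpy is, as far as I know, the name in recent ROOT releases; older ones call it MakeNumpyDataFrame.

```python
import numpy as np
from scipy.optimize import fsolve


def residuals(unknowns, a, b):
    # Invented example system; replace with the real equations.
    #   x**3 + y = a
    #   x - y    = b
    x, y = unknowns
    return [x**3 + y - a, x - y - b]


def solve_rows(a_col, b_col, guess=(1.0, 1.0)):
    """Solve the system once per row, using each row's values as coefficients."""
    a_col = np.asarray(a_col, dtype=float)
    b_col = np.asarray(b_col, dtype=float)
    solutions = np.empty((len(a_col), 2))
    for i, (a, b) in enumerate(zip(a_col, b_col)):
        solutions[i] = fsolve(residuals, guess, args=(a, b))
    return solutions[:, 0], solutions[:, 1]


def add_solution_columns(df):
    """Hypothetical glue code: RDataFrame -> NumPy -> SciPy -> new RDataFrame.

    Assumes `df` has scalar columns A and B.
    """
    import ROOT  # imported lazily so the solver above runs without ROOT

    cols = df.AsNumpy(columns=["A", "B"])
    x, y = solve_rows(cols["A"], cols["B"])
    return ROOT.RDF.FromNumpy({"A": cols["A"], "B": cols["B"], "x": x, "y": y})


# Stand-alone check of the solver on three rows of made-up coefficients.
x_sol, y_sol = solve_rows([10.0, 2.0, 1.0], [0.0, 0.0, -1.0])
```

The Python loop runs once per event, so this is only practical when the selected columns fit in memory and the solver is reasonably cheap.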