Add new column to RDataFrame

Hi!

This probably doesn't help you directly, but let me show you what we already have in the experimental PyROOT in 6.18 (planned to become the “standard” in 6.20):

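```python
import ROOT

class AwesomeModel:
    def predict(self, x):
        return x[0] * x[1]

model = AwesomeModel()

# Expose this Python function to C++ as CppCallable::predictModel(float, float) -> float
@ROOT.DeclareCppCallable(["float"] * 2, "float")
def predictModel(var1, var2):
    return model.predict([var1, var2])

# RDataFrame(10) starts without any columns, so define the two inputs first
df = ROOT.ROOT.RDataFrame(10) \
    .Define("var1", "(float)rdfentry_") \
    .Define("var2", "2.f") \
    .Define("x", "CppCallable::predictModel(var1, var2)")
print(df.AsNumpy(["x"]))
```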

From the technical side: the problem is that this will never be thread-safe, and in the multi-threaded case it also interferes with Python's global interpreter lock (GIL).

From the ML/algorithmic side: as soon as you use a neural network, batch inference will always be much faster than event-by-event inference, unless you massively filter your dataset beforehand. So the friend tree solution would be the most suitable here.
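To illustrate the batch versus event-by-event point, here is a minimal NumPy sketch. It assumes a hypothetical one-layer "network" (`W`, `b`, `predict_batch` and `predict_one` are made up for illustration) and uses random arrays as a stand-in for columns you might obtain with `df.AsNumpy(["var1", "var2"])`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "network": a single dense layer with fixed random weights
W = rng.normal(size=(2, 1)).astype(np.float32)
b = np.float32(0.5)

def predict_batch(x):
    """x has shape (n_events, 2); returns one prediction per event."""
    return np.tanh(x @ W + b)[:, 0]

def predict_one(var1, var2):
    """Event-by-event inference: a batch of size one per call."""
    return predict_batch(np.array([[var1, var2]], dtype=np.float32))[0]

# Stand-in for the columns returned by df.AsNumpy(["var1", "var2"])
cols = {
    "var1": rng.normal(size=10_000).astype(np.float32),
    "var2": rng.normal(size=10_000).astype(np.float32),
}
x = np.column_stack([cols["var1"], cols["var2"]])

batched = predict_batch(x)  # one vectorized call for all events
looped = np.array([predict_one(a, c) for a, c in zip(cols["var1"], cols["var2"])])
```

Both paths compute the same predictions, but the batched one makes a single vectorized call instead of 10,000 small ones; with a real NN library the per-call overhead is usually much larger. The resulting array could then be written to its own tree and attached to the original one as a friend.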

If you’re just executing something “simple” in Python, you can use the solution above at C++ speed, since we use numba (if possible) to JIT-compile the function into native code 🙂
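A minimal sketch of the "use numba if possible" idea, assuming numba's `cfunc` decorator; this is not ROOT's actual implementation, and `make_callable` is a hypothetical helper. If numba is missing or cannot compile the function (for example, because it calls methods on an arbitrary Python object), it falls back to plain Python calls.

```python
def make_callable(func, signature="float32(float32, float32)"):
    """Try to compile func with numba; otherwise fall back to the Python function.

    Returns (callable, is_compiled).
    """
    try:
        import numba
    except ImportError:
        return func, False  # numba not installed: plain Python calls
    try:
        # cfunc compiles eagerly to native code with a C calling convention
        compiled = numba.cfunc(signature)(func)
    except Exception:
        return func, False  # numba could not compile func
    # .ctypes wraps the native function so it can be called from Python;
    # .address would give the raw function pointer a C++ wrapper could call
    return compiled.ctypes, True


def predict_simple(var1, var2):
    return var1 * var2


fn, is_compiled = make_callable(predict_simple)
print(fn(2.0, 3.0), is_compiled)
```

Either way the result is the same; only the speed of each call differs.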

Best
Stefan

Edit: Even though the code above is not suited for multi-threading, we have protected the calls with a lock!
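A minimal pure-Python sketch of what protecting the calls with a lock means, assuming a hypothetical model with mutable state (`n_calls`) and a hypothetical module-level `_model_lock`; it is not the locking code inside ROOT. The lock makes concurrent calls safe, but it also serializes them, so the Python part gains no speed from multiple threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AwesomeModel:
    """Toy model with mutable state, so unguarded concurrent calls are unsafe."""

    def __init__(self):
        self.n_calls = 0

    def predict(self, x):
        self.n_calls += 1  # read-modify-write, not atomic across threads
        return x[0] * x[1]


model = AwesomeModel()
_model_lock = threading.Lock()  # hypothetical lock guarding the model


def predict_model(var1, var2):
    # Only one thread at a time may run the model
    with _model_lock:
        return model.predict([var1, var2])


# Simulate several worker threads requesting predictions concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(predict_model, range(1000), range(1000)))
```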
