Adding a new branch from a Python ML model

Hi,

I would like to define a new branch/column in an RDataFrame, similar to what is described in this thread: Add new column to RDataFrame

The decorator @ROOT.DeclareCppCallable doesn’t seem to exist, although I’m not using experimental PyROOT, so I could be wrong. However, the most recent version of ROOT has added support for Numba callables. Is there a way to do this?

Some skeleton code showing what I would like to do:

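import ROOT

model = Some_sklearn_model()  # placeholder for a trained scikit-learn model

# decorator from the linked thread (not found in ROOT 6.22)
@ROOT.DeclareCppCallable(["float"] * 2, "float")
def predictModel(var1, var2):
    # scikit-learn expects a 2D input (one row per sample) and returns an array
    return model.predict([[var1, var2]])[0]

# skeleton: the input dataset must provide the columns var1 and var2
df = ROOT.RDataFrame(10).Define("x", "CppCallable::predictModel(var1, var2)")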

Thanks!



ROOT Version: 6.22.06


Hi,
the feature changed a bit since that comment was posted; see this NumbaDeclare tutorial.

Hope this helps!
Enrico

Hi @eguiraud,

Thanks for pointing me to this. I tried something like this, but it doesn’t seem to work. In my case, the classifier I am using requires a pandas DataFrame as input rather than a plain array. My implementation is as follows:

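import pandas as pd
import ROOT

my_model = classifiers['KnnFlatness']  # a trained hep_ml UGradientBoostingClassifier

# Numba must compile this body in nopython mode, including the model call
@ROOT.Numba.Declare(['float'] * 4, 'float')
def decision_value(d0_pt,pis_pt,chi2,doca):
    frame = pd.DataFrame({'Dst_ReFit_D0_PT':d0_pt,'Pi_slow_ReFit_PT':pis_pt,'Dst_ReFit_chi2_best':chi2,'D0_Loki_AMAXDOCA':doca})
    return my_model.decision_function(frame)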

but it fails with the following error:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'my_model': Cannot determine Numba type of <class 'hep_ml.gradientboosting.UGradientBoostingClassifier'>

File "<ipython-input-74-78e12f83008f>", line 4:
def decision_value(d0_pt,pis_pt,chi2,doca):
    <source elided>
    frame = pd.DataFrame({'Dst_ReFit_D0_PT':d0_pt,'Pi_slow_ReFit_PT':pis_pt,'Dst_ReFit_chi2_best':chi2,'D0_Loki_AMAXDOCA':doca})
    return my_model.decision_function(frame)

It looks like only types that Numba supports are allowed. Is there a way to call methods of non-Numba Python objects? The example in the thread I linked to wraps things inside a class; is this necessary?

Thanks

You can try with just the @numba.jit decorator: if it can JIT-compile your code in nopython mode, it should work with ROOT.Numba.Declare too. However, I don’t think Numba knows (or can know) how to generate low-level code corresponding to that my_model.decision_function call.

One available workflow is to apply the model in Python and then save the NumPy array with the classification results to a TTree using ROOT.RDF.MakeNumpyDataFrame and Snapshot.

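That gives you a separate TTree that you can use together with the original one as its “friend”, as if they were a single TTree (without a tree index, friends are matched entry by entry, so the scores must be in the same order as the original entries). @swunsch might have further comments.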


A workaround suggested in the thread I linked to uses Define with TPython::Eval(values[rdfentry_]) (with some string formatting). This works in principle, but because I filter some events in another frame, rdfentry_ is sometimes outside the array’s range. Is there a way to “reindex” an RDataFrame, i.e. to make a new column in the filtered dataframe that runs from 0 to its number of entries? I can’t think of a C++ way to do this from existing branches.
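A sketch of the reindexing with plain NumPy, assuming a single-threaded run (so that `rdfentry_` matches the entry number in the original tree) and that the entry numbers of the surviving events were collected beforehand, e.g. with `Define("entry", "rdfentry_")` followed by `AsNumpy(["entry"])` on the filtered frame. The entry numbers and scores below are made-up values. Option 1 keeps `values[rdfentry_]` in range; option 2 is the explicit original-entry-to-filtered-position map.

```python
import numpy as np

n_total = 10  # number of entries in the original (unfiltered) dataset
# rdfentry_ values of the entries that survived the Filter (made-up values)
passing_entries = np.array([1, 4, 5, 8])
# one model output per surviving entry, in the same order (made-up values)
scores = np.array([0.2, 0.9, 0.4, 0.7])

# Option 1: a lookup table as long as the ORIGINAL dataset, so that
# full_scores[rdfentry_] is always in range; filtered-out entries get NaN.
full_scores = np.full(n_total, np.nan)
full_scores[passing_entries] = scores

# Option 2: the "reindexing" map, original entry number -> position (0..N-1)
# in the filtered arrays, or -1 for entries that did not pass the Filter.
position = np.full(n_total, -1, dtype=np.int64)
position[passing_entries] = np.arange(len(passing_entries))

print(full_scores)
print(position)
```

Either array can then be written as a full-length friend tree instead of being looked up through TPython for every event.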

Actually, as another workaround I used a Python dictionary keyed on rdfentry_. This works but isn’t very performant, so any other suggestions would be helpful.

Thanks

In general, calling Python code (via TPython or otherwise) from the C++ event loop is not going to perform well, because the Python interpreter is invoked once per entry.

If I understand the question correctly, Filter+Cache or Filter+Snapshot is what you might be looking for.

What about my friend tree suggestion above?
