I would like to define a new branch/column in an RDataFrame, similar to what is described in this thread: Add new column to RDataFrame
The decorator @ROOT.DeclareCppCallable doesn’t seem to exist, although I’m not using experimental PyROOT, so I could be wrong. However, support for Numba callables has been added in the most recent version of ROOT. Is there a way to do this?
Some skeleton code to show what I would like to do
Thanks for pointing me to this. I tried something like this but it doesn’t seem to work. In my case, the classifier I am using requires a pandas dataframe as input rather than just an array. My implementation is as follows:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'my_model': Cannot determine Numba type of <class 'hep_ml.gradientboosting.UGradientBoostingClassifier'>
File "<ipython-input-74-78e12f83008f>", line 4:
def decision_value(d0_pt,pis_pt,chi2,doca):
<source elided>
frame = pd.DataFrame({'Dst_ReFit_D0_PT':d0_pt,'Pi_slow_ReFit_PT':pis_pt,'Dst_ReFit_chi2_best':chi2,'D0_Loki_AMAXDOCA':doca})
return my_model.decision_function(frame)
It looks like only supported Numba types are allowed. Is there a way to call functions on non-Numba Python objects? The example in the thread I linked to wraps things inside a class. Is this necessary?
You can try with just the @numba.jit decorator: if it can jit your code in nopython mode, it should work with ROOT.Numba.Declare too. However, I don’t think Numba knows (or can know) how to create low-level code that corresponds to that my_model.decision_function call.
One workflow that’s available is applying the model in Python and then saving the numpy array with the classification results in a TTree using ROOT.RDF.MakeNumpyDataFrame and Snapshot.
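The check suggested above can be sketched roughly as follows. The helper name `compiles_in_nopython` and the toy function `pt_sum` are made up for illustration, and the import path `numba.core.errors` assumes a reasonably recent Numba release.

```python
def compiles_in_nopython(func, *sample_args):
    """Return True if Numba can compile and run `func` in nopython mode."""
    # Imported lazily so the helper can be defined without Numba installed.
    import numba
    from numba.core.errors import NumbaError  # assumes a recent Numba layout

    try:
        # njit is shorthand for jit(nopython=True); compilation is triggered
        # by the first call with concrete argument types.
        numba.njit(func)(*sample_args)
    except NumbaError:
        # TypingError (as in the traceback above) is a NumbaError subclass.
        return False
    return True


def pt_sum(d0_pt, pis_pt):
    # Plain arithmetic on floats: something Numba can assign types to.
    return d0_pt + pis_pt
```

Calling `compiles_in_nopython(pt_sum, 1.0, 2.0)` is expected to succeed, whereas a function that calls `my_model.decision_function` on a pandas DataFrame is expected to fail in the same way as the traceback above.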
That gives you a separate TTree that you can use together with the original TTree, as its “friend”, as if they were a single TTree. @swunsch might have further comments.
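A minimal sketch of that workflow, assuming `scores` is a 1-D numpy array with exactly one value per entry of the original tree, in the same order. The function names, the tree name `scores` and the column name `bdt_score` are placeholders, not anything from the original posts. In newer ROOT releases `MakeNumpyDataFrame` is, as far as I know, available under the name `ROOT.RDF.FromNumpy`.

```python
def write_score_friend(scores, friend_file, friend_tree="scores", column="bdt_score"):
    """Write a numpy array of classifier outputs to its own TTree."""
    import ROOT  # imported lazily; requires a ROOT build with PyROOT

    # Wrap the numpy array in an RDataFrame with a single column.
    df = ROOT.RDF.MakeNumpyDataFrame({column: scores})
    # Snapshot runs the event loop and writes the tree to friend_file.
    df.Snapshot(friend_tree, friend_file)


def open_with_friend(orig_file, orig_tree, friend_file, friend_tree="scores"):
    """Open the original tree with the score tree attached as a friend."""
    import ROOT

    f = ROOT.TFile.Open(orig_file)
    tree = f.Get(orig_tree)
    # Friend trees are matched entry by entry, so both trees need the
    # same number of entries in the same order.
    tree.AddFriend(friend_tree, friend_file)
    # Return the file and tree too, so they stay alive alongside the frame.
    return ROOT.RDataFrame(tree), tree, f
```

The returned RDataFrame should then expose the score column next to the original branches, so it can be used in `Filter` or `Define` expressions like any other column.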
A workaround suggested in the thread I linked to uses Define with TPython::Eval(values[rdfentry_]) (with some string formatting). This works in principle, but because I filter some events in another frame, sometimes rdfentry_ is outside of the array length. Is there a way to “reindex” an RDataFrame, i.e. for a new dataframe make a new column going from 0 to the number of entries? I can’t think of a C++ way to do this from existing branches.
Actually, as another workaround, I used a Python dictionary keyed by rdfentry_. This works but isn’t very performant, so any other suggestions would be helpful.
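One possible alternative to the dictionary, sketched under assumptions: instead of reindexing the filtered frame, scatter the scores back into a dense array that has one slot per entry of the unfiltered tree, so that rdfentry_ can index it directly. The helper name `scatter_scores` is hypothetical, and it assumes you can collect the original entry numbers of the events that passed the filter (for example via a column defined from rdfentry_ in a single-threaded run).

```python
import math


def scatter_scores(entries, scores, n_total, fill=math.nan):
    """Place each score at the position of its original entry number.

    entries : original entry numbers of the events that passed the filter
    scores  : classifier output for those events, in the same order
    n_total : number of entries in the unfiltered tree
    """
    if len(entries) != len(scores):
        raise ValueError("entries and scores must have the same length")
    # Entries that were filtered out keep the fill value.
    dense = [fill] * n_total
    for entry, score in zip(entries, scores):
        dense[int(entry)] = score
    return dense


# With numpy the same idea is a single fancy-indexing assignment:
#   dense = np.full(n_total, np.nan); dense[entries] = scores
```

The resulting full-length array lines up entry by entry with the original tree, so it could also be written out as a friend tree rather than looked up through TPython::Eval.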