Adding a numpy array as a new column to an existing RDataFrame

Hi,

this topic is related to this one: Make RDataFrames interoperable with other Python tools

I want to do the following:

  1. Apply some cuts to the RDataFrame
  2. Convert selected columns to numpy format and run inference on them
  3. Attach the inference result (1D numpy array) as a new column to RDataFrame

I tried adapting the approach in the linked thread.

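import ROOT

def add_to_df(analyzer, prediction, column_name):
    # Expose a lookup into the prediction array to RDataFrame's JIT via Numba
    @ROOT.Numba.Declare(["int"], "float")
    def get_prediction(index):
        return prediction[index]

    # Define returns a new node; return it so the caller can use the new column
    return analyzer.Define(column_name, "Numba::get_prediction(rdfentry_)")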

The problem is that rdfentry_, as far as I can tell, corresponds to the original row index, i.e. the index before any cuts are applied to the DataFrame. However, I need indices that run from 0 to N_post_cut - 1 after the cuts are applied. Is there any way around this? Thanks

Cheers,
Matej


ROOT Version: 6.26.11
Platform: linuxx8664gcc
Compiler: g++ (GCC) 11.4.1


Hi Matej,

Interesting question.
You can use an index returned by a “Define” placed after your cuts, so that it is updated only for the selected entries. Could that work? Please correct me if I misunderstood your use case.

Cheers,
Danilo

What you propose could work. What would the Define call that generates the new indices look like?

Hi,

For example:

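import ROOT

df = ROOT.RDataFrame(16)
# Keep even entries, then number the survivors with a counter that only
# advances for entries passing the Filter. The static counter assumes a
# single-threaded event loop (ROOT.EnableImplicitMT() not called).
df.Filter("rdfentry_%2 == 0")\
  .Define("myEntry", "static unsigned int myEntry = 0; return myEntry++;")\
  .Display().Print()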

gives

+-----+---------+
| Row | myEntry | 
+-----+---------+
| 0   | 0       | 
+-----+---------+
| 2   | 1       | 
+-----+---------+
| 4   | 2       | 
+-----+---------+
| 6   | 3       | 
+-----+---------+
| 8   | 4       | 
+-----+---------+

Is this a bit along the lines you had in mind?

Best,
D
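
A possible alternative to the counter, sketched here in plain numpy and not tested against ROOT: keep rdfentry_ as the lookup key and instead expand the compact prediction array so that each value sits at its original entry number. The helper name scatter_to_original_entries, the toy cut and the fake predictions are all made up for illustration. In a real analysis, passed_entries would have to come from the filtered dataframe, for instance from a column defined as rdfentry_ and exported together with the inference inputs (an assumption, and one that presumes a single-threaded event loop).

```python
import numpy as np


def scatter_to_original_entries(prediction, passed_entries, n_total, fill=np.nan):
    """Place compact predictions at their original entry numbers.

    prediction     : 1D array with one value per entry that survived the cuts
    passed_entries : original entry numbers of those surviving entries
    n_total        : number of entries before any cut
    fill           : value stored for entries that were cut away
    """
    prediction = np.asarray(prediction, dtype=np.float64)
    passed_entries = np.asarray(passed_entries, dtype=np.int64)
    if prediction.shape != passed_entries.shape:
        raise ValueError("need exactly one prediction per surviving entry")

    full = np.full(n_total, fill, dtype=np.float64)
    full[passed_entries] = prediction  # scatter by original entry number
    return full


# Toy stand-in for the example in this thread: 16 entries, keep the even ones.
n_total = 16
all_entries = np.arange(n_total)
passed_entries = all_entries[all_entries % 2 == 0]

# Hypothetical inference output, one value per surviving entry.
prediction = 0.1 * np.arange(len(passed_entries))

full_prediction = scatter_to_original_entries(prediction, passed_entries, n_total)
print(full_prediction)
```

With an array laid out like full_prediction, a lookup by the original entry number returns the right value for every surviving entry, and the fill value marks entries that were removed by the cuts. The price is an array as long as the uncut dataset.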
