Return RDataFrame at end of function

Hello,

I am working with RDataFrame and PyROOT to analyze my data.
I want to write a function that adds some columns and performs some calculations. At the end, I want to return the processed DataFrame so I can use it elsewhere, for example to do some filtering in another function. How would I do that? I am thinking of something like this:

def DataFrameManipulation(rootfile):
      tree = somefunctionwichgetsthetree(rootfile)
      df = rt.RDataFrame(tree)
      df = df.Define("new_column", some operation....)
      return df

def Filtering(rootfile):
      df = DataFrameManipulation(rootfile)
      counts = df.Filter(some filtering which includes the new column defined in the above function)...

I know that the example does not work, and I am sure it is pretty naive, but you might get the idea of what I am trying to achieve. I would appreciate any help. Thanks!

Hi @Tim_Buktu ,
nice username!

I think the only problem there is that tree goes out of scope at the end of DataFrameManipulation, while the RDataFrame you return still needs it. You can instead do:

def DataFrameManipulation(treename, filenames): 
      df = rt.RDataFrame(treename, filenames)
      df = df.Define("new_column", some operation....)
      return df

or equivalent, so that RDF takes care of opening and closing the files and of creating and destroying the trees. If for some reason you need to remain in charge of extracting the trees from the files, you can keep the tree alive by returning it from the function together with the RDF that needs it:

def DataFrameManipulation(rootfile): 
      tree = somefunctionwichgetsthetree(rootfile)
      df = rt.RDataFrame(tree)
      df = df.Define("new_column", some operation....)
      return tree, df
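
For completeness, here is a minimal sketch of the whole flow with the first approach (tree name plus file names), including the filtering step in a second function. It assumes rt is the ROOT module, as in your snippet. The function names, the branch x, the Define expression and the cut are placeholders I made up for illustration, so adapt them to your tree:

```python
def make_dataframe(treename, filenames):
    """Build an RDataFrame from a tree name and file name(s), then add a column."""
    # Imported here so that loading this module does not itself require ROOT.
    import ROOT as rt

    # RDataFrame opens the files and manages the tree itself, so nothing
    # the returned object depends on is local to this function.
    df = rt.RDataFrame(treename, filenames)
    return add_columns(df)


def add_columns(df):
    """Return a new node with an extra column; 'x' is a hypothetical branch."""
    return df.Define("new_column", "x * 2.0")


def count_selected(df, cut="new_column > 5"):
    """Filter on the column defined upstream and return how many entries pass."""
    # Count() only books a lazy result; GetValue() triggers the event loop.
    return df.Filter(cut).Count().GetValue()


# Example usage (hypothetical tree and file names):
#   df = make_dataframe("tree", "data.root")
#   print(count_selected(df))
```

Since Define and Filter return new nodes rather than modifying the dataframe in place, the object returned by the first function can be passed to as many downstream functions as you like.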

I hope this helps!
Enrico

Thank you very much! That did the trick. Simpler than I thought! :)
