RDataFrame to access Mean/Min/Max of a new "Define" variable


I am experimenting with RDataFrame and have run into a problem. I have some vector<float*> branches in a TTree, and I want to extract the mean of individual indexed elements. My code so far is very simple (and uses PyROOT). I expected that I would need to define a new column to flatten the variable, e.g. rdf.Define("j0","jet_pt[0]"). This is allowed, but when I then try to calculate the mean, e.g. rdf.Mean("j0").GetValue(), I get the error TypeError: can not resolve method template call for 'Mean'. Do I need to cast the variable in Python when I create the new column, or do I need to call something to "activate" the new column? I can call rdf.Mean("jet_pt"), which returns a value computed over all the elements in the vector.

I am not sure how to continue and could not find a tutorial showing this.


ROOT Version: 6.14.04
Platform: SLC6
Compiler: gcc62

Welcome to the ROOT forum!

When you say vector<float*>, you mean vector<float>, right? I will assume so in what follows.


The call rdf.Define("j0","jet_pt[0]") should really be

rdf2 = rdf.Define("j0","jet_pt[0]")

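i.e. Define does not modify rdf; it returns a new dataframe, and j0 is only defined in dataframes "downstream" of the Define call. Mean("j0") therefore has to be called on rdf2, not on rdf. You can also just chain the calls: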

m = rdf.Define("j0","jet_pt[0]").Mean("j0").GetValue()

Note that I have not tested the code, but it should give you the idea.
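
To cover the Mean/Min/Max case from the title, here is a minimal sketch of the same idea wrapped in a helper. The function name first_element_stats and its parameters are made up for illustration, and the tree and file names in the usage comment are placeholders. Like the snippets above, it has not been run against a real ROOT installation. It only uses the Define, Mean, Min, Max and GetValue calls, and books all three actions on the node returned by Define.

```python
def first_element_stats(rdf, vector_column, new_column="j0"):
    """Mean, min and max of element 0 of a vector branch.

    `rdf` is any RDataFrame node; `vector_column` is e.g. "jet_pt".
    The helper name and arguments are hypothetical, not part of ROOT.
    """
    # Define leaves `rdf` untouched and returns a new node that has the column.
    node = rdf.Define(new_column, vector_column + "[0]")

    # Booking actions is lazy: nothing is computed at this point.
    mean = node.Mean(new_column)
    minimum = node.Min(new_column)
    maximum = node.Max(new_column)

    # The first GetValue() triggers the event loop; all booked results
    # are filled in that same pass over the data.
    return {
        "mean": mean.GetValue(),
        "min": minimum.GetValue(),
        "max": maximum.GetValue(),
    }


# Usage sketch (tree and file names are placeholders):
#   import ROOT
#   rdf = ROOT.RDataFrame("tree", "file.root")  # ROOT.ROOT.RDataFrame in older releases
#   print(first_element_stats(rdf, "jet_pt"))
```

This assumes every event has at least one entry in the vector; if some events can be empty, a Filter on the vector size before the Define would be needed.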

Unfortunately, PyROOT sometimes hides the proper error message that would be part of a C++ exception. The situation might be better with newer ROOT versions (in fact, please do not use RDataFrame with v6.14 if you can avoid it: the number of fixes and improvements since then has been enormous).

