Modifying an existing column in RDataFrame


ROOT Version: 6.16.00
Platform: lxplus


Dear ROOT experts,

I ran into a problem while converting my analysis code to RDataFrame. In my code I have a branch named “weight” in both the input and the output tree, the latter being a modified version of the former:

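from array import array
from ROOT import TTree

# inputTree, treeName and datasetWeight are defined elsewhere
outputTree = TTree(treeName, '')
weight = array('d', [0])  # one-element buffer backing the output branch
outputTree.Branch("weight", weight, "weight/D")

for event in inputTree:
    weight[0] = event.weight * datasetWeight  # rescale the input weight
    outputTree.Fill()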

Is there a way to achieve the same result with RDataFrame? I thought of making an intermediate column for the modified weights, deleting the original “weight” column and then copying the intermediate values to a new “weight” column, but I didn’t find a way to delete an existing column.

Thanks in advance.

Hi,

I think RDF implements a zero-copy policy. Modifying your weights in place, reading them through an RVec or through a string expression (which uses RVecs internally), could well do the job, even if I dislike the solution and am sure RDF will allow something better soon :)

p

Hi,
This is unfortunately not possible; the relevant ticket is ROOT-10165.

I’ve found a workaround for changing the name of the branch: make a column “weightModified”, snapshot it, open the file and then use TTree::SetAlias:

tree.SetAlias('weight', 'weightModified')
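
A rough sketch of this workaround, for illustration only. The helper names `snapshot_modified_weight` and `alias_weight` are hypothetical, `rdf` is assumed to be a ROOT.RDataFrame with a "weight" column, and `expression` is assumed to be a valid C++ expression string. As written, only the new column is stored in the output file, and the alias is set on the in-memory tree each time the file is opened.

```python
def snapshot_modified_weight(rdf, expression, tree_name, file_name):
    """Define 'weightModified' from a C++ expression string and snapshot it.

    `rdf` is assumed to behave like a ROOT.RDataFrame. The third argument
    of Snapshot is a column-name regular expression, so only the new
    column is written in this sketch.
    """
    return (rdf.Define("weightModified", expression)
               .Snapshot(tree_name, file_name, "weightModified"))


def alias_weight(tree):
    """Let 'weight' refer to 'weightModified' in TTree expressions."""
    tree.SetAlias("weight", "weightModified")
    return tree


# Possible usage with ROOT (file and tree names are placeholders):
#   import ROOT
#   rdf = ROOT.RDataFrame("inputTree", "input.root")
#   snapshot_modified_weight(rdf, "weight * 2.0", "outputTree", "output.root")
#   f = ROOT.TFile.Open("output.root")
#   tree = alias_weight(f.Get("outputTree"))
#   tree.Draw("weight")  # resolved through the alias
```
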

But now the problem is how to use a Python variable to multiply the weight in the ‘Define’ statement. The following code

datasetWeight = 100
rdf.Define('weightModified', 'weight * datasetWeight')

fails to run and outputs the following error:

input_line_82:2:15: error: use of undeclared identifier 'datasetWeight'
return weight*datasetWeight
              ^
Traceback (most recent call last):
  File "ConvertTreeRDFElab.py", line 105, in <module>
    outputTree = ConvertTree(inputTree, outputFileName, outputTreeName)
  File "ConvertTreeRDFElab.py", line 60, in ConvertTree
    .Define('weightModified', 'weight*datasetWeight')
Exception: ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
    Cannot interpret the following expression:
weight*datasetWeight

I’ve found these two threads, but it’s still not clear how to adapt their solutions to my problem.
https://root-forum.cern.ch/t/add-new-column-to-rdataframe/
https://root-forum.cern.ch/t/rdataframe-define-column-of-same-constant-value/

How can I use an already defined Python variable to multiply the values in an existing RDF column?

Thanks in advance.

Hi,
Work is in progress towards better integration of RDF with Python. Until then, Define expressions must be valid C++.

In your case, that could be either

datasetWeight = 100
rdf.Define('weightModified','weight * {}'.format(datasetWeight))

or

datasetWeight = 100
rdf.Define('weightModified', 'weight * int(TPython::Eval("datasetWeight"))')
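
The first option can be wrapped in a small helper, sketched below under a few assumptions: `scaled_column_expr` is a hypothetical name, and the factor is assumed to be a plain Python number. Using `repr` on a float keeps its full precision in the generated C++ literal, and non-finite values are rejected because "nan" and "inf" are not valid C++ literals. Note also that the int(...) cast in the second option would truncate a fractional weight.

```python
import math


def scaled_column_expr(column, factor):
    """Build a C++ expression string that multiplies `column` by `factor`.

    The Python value is substituted into the string, so the expression
    handed to Define contains only names that the C++ side knows about.
    """
    value = float(factor)
    if math.isnan(value) or math.isinf(value):
        raise ValueError("factor must be a finite number")
    # repr() of a float gives a full-precision literal such as 100.0 or 1e-05
    return "{} * {!r}".format(column, value)


# Possible usage, assuming `rdf` is a ROOT.RDataFrame with a "weight" column:
#   datasetWeight = 100
#   rdf = rdf.Define('weightModified', scaled_column_expr('weight', datasetWeight))
```
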

Hope this helps!
Enrico
