[RDataFrame] Inter-conversion between ROOT and numpy data for RVec type columns

Currently, ROOT.RDF.MakeNumpyDataFrame only seems to accept columns with simple data types. In practice, however, a lot of analysis data contains vector-type columns. If someone converts ROOT data to numpy arrays via rdf.AsNumpy, does some processing, and then tries to convert back to ROOT, the call fails if any column has a vector type:

import ROOT
ROOT.RDataFrame(10).Define("foo", "ROOT::VecOps::RVec<int>{1,2,3,4}").Snapshot("tree", "test.root")
array_data = ROOT.RDataFrame("tree", "test.root").AsNumpy()
ROOT.RDF.MakeNumpyDataFrame(array_data)

which gives the error

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [3], line 4
      2 ROOT.RDataFrame(10).Define("foo", "ROOT::VecOps::RVec<int>{1,2,3,4}").Snapshot("tree", "test.root")
      3 array_data = ROOT.RDataFrame("tree", "test.root").AsNumpy()
----> 4 ROOT.RDF.MakeNumpyDataFrame(array_data)

RuntimeError: Object not convertible: Dictionary entry foo is not convertible with AsRVec.

Is there a solution or work-around to this problem? Ideally, one would expect the inter-conversion to work at least for standard C++ types.

ROOT Version: 6.26, 6.27, 6.28
Platform: LCG

Hi @AlkaidCheng ,

Indeed, MakeNumpyDataFrame only supports simple types. There is ongoing work on reading awkward arrays into RDataFrame and exporting RDF data as awkward arrays, which would cover the case in your example. See the ACAT 2022 (23-28 October 2022) contribution "Awkward Arrays to RDataFrame and back" on Indico.
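
Until that lands, one possible stopgap (a rough sketch, not something confirmed in this thread) is to split each vector column into simple-typed columns before converting back. The helper name `flatten_vector_column`, the `<name>_n` / `<name>_<i>` column naming, and the padding value are all made up for illustration. The sketch assumes each element of the object array can be turned into a 1D array with `np.asarray`, and it uses a hand-built stand-in for the `AsNumpy` output so that it runs without ROOT.

```python
import numpy as np

def flatten_vector_column(name, column, fill=0):
    """Split a column of per-event sequences into simple 1D columns.

    Hypothetical helper: returns a dict with a count column "<name>_n"
    and one padded column "<name>_<i>" per element position.
    """
    rows = [np.asarray(v) for v in column]
    if not rows:
        raise ValueError("column is empty")
    counts = np.array([len(r) for r in rows], dtype=np.int32)
    dtype = rows[0].dtype  # assumes every row has the same element type
    out = {f"{name}_n": counts}
    for i in range(int(counts.max())):
        # Pad events that have fewer than i+1 elements with `fill`
        out[f"{name}_{i}"] = np.array(
            [r[i] if i < len(r) else fill for r in rows], dtype=dtype
        )
    return out

# Stand-in for array_data["foo"] as returned by AsNumpy:
# an object-dtype array holding one vector per event.
column = np.empty(3, dtype=object)
for event in range(3):
    column[event] = np.array([1, 2, 3, 4], dtype=np.int32)

flat = flatten_vector_column("foo", column)
print(sorted(flat))  # names of the simple-typed columns
```

The resulting dictionary contains only 1D arrays of a simple dtype, which is the kind of input MakeNumpyDataFrame is described as supporting above. The vector structure is lost in the process, so the columns would have to be recombined on the ROOT side if an RVec column is needed again.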

Cheers,
Enrico
