Fill RDataFrame with collections

jruizvid · March 7, 2022, 6:24pm

Hi. I need to transform a Python dictionary into a TTree. So far I’ve been using RDataFrames. I can do it successfully with simple data structures (dict1D) via MakeNumpyDataFrame, but it fails when the data contains collections (dictND). I tried two (unsuccessful) methods based on this and this example.

Below I provide some simplified data with the same structure as my real case.

# Dummy data
import numpy as np

dict1D = {'energy': np.array([11.,22.,33.]),
          'momentum': np.array([10.,20.,30.]),
          'nparticles': np.array([5.,15.,25.])}

dictND = {'position': np.array([list([1.,1.,2.]),list([2.,2.,3.]),list([4.,4.,5.])], dtype=object),
          'otherparticles': np.array([list([22.,33.]),list([44.,66.,77.,88.]),list([11.,99.])], dtype=object)}

# RDataframe with simple branches
import ROOT
rdf = ROOT.RDF.MakeNumpyDataFrame(dict1D)

# - - - - - - - - - - #
# Branches with arrays (FAILS)
# 1st try: evaluate dictND['position'][i] in Python via TPython::Eval for each entry
rdf = rdf.Define("position", '''auto to_eval = "dictND['position'][" + std::to_string(rdfentry_) + "]"; return RVec<float>(TPython::Eval(to_eval.c_str()));''')


# 2nd try (an alternative to the 1st: both define a column named "position")
@ROOT.Numba.Declare(['int'], 'RVec<float>')
def func(var1):
    return dictND['position'][var1]

rdf = rdf.Define("position", "Numba::func(rdfentry_)")
# - - - - - - - - - - #

rdf.Snapshot('DecayTree', 'mytree.root')

ROOT 6.24
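For reference, the difference between the two dictionaries can be made explicit with plain NumPy. In dict1D every column holds one number per entry, while the dictND columns are object arrays whose entries are lists (of varying length for 'otherparticles'). The sketch below uses hypothetical helper names (not ROOT or Awkward APIs) to split such a column into offsets plus one flat float64 array, the offsets-and-content layout that Awkward Array uses for jagged data.

```python
import numpy as np

# Same dummy data as in the question.
dict1D = {'energy': np.array([11., 22., 33.]),
          'momentum': np.array([10., 20., 30.]),
          'nparticles': np.array([5., 15., 25.])}

dictND = {'position': np.array([list([1., 1., 2.]), list([2., 2., 3.]), list([4., 4., 5.])], dtype=object),
          'otherparticles': np.array([list([22., 33.]), list([44., 66., 77., 88.]), list([11., 99.])], dtype=object)}


def is_flat_numeric(column):
    """True for a 1D array with a numeric dtype, i.e. one number per entry."""
    arr = np.asarray(column)
    return arr.ndim == 1 and np.issubdtype(arr.dtype, np.number)


def to_offsets_and_content(column):
    """Split a per-entry collection column into offsets and one flat float64 array.

    Entry i is content[offsets[i]:offsets[i + 1]].
    """
    rows = [np.asarray(row, dtype=np.float64).ravel() for row in column]
    counts = np.array([len(r) for r in rows], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(counts)])
    content = np.concatenate(rows) if rows else np.empty(0, dtype=np.float64)
    return offsets, content


offsets, content = to_offsets_and_content(dictND['otherparticles'])
```

Here offsets is [0, 2, 6, 8] and content holds the eight values back to back. Note that NumPy stores 'position' (equal-length lists with dtype=object) as a 2D object array, so neither dictND column is the flat numeric 1D array that the dict1D columns are.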

bellenot · March 8, 2022, 9:00am

Maybe @eguiraud or @etejedor can help

eguiraud · March 8, 2022, 9:13am

Hi @jruizvid ,
I am afraid this is not directly supported in RDF at the moment. We should get this feature as a consequence of AwkwardArray import/export; see RDataFrame integration · Issue #588 · scikit-hep/awkward-1.0 · GitHub .

You can use TTree directly and create TTree branches that contain the arrays. There should be several posts on the forum on this topic, maybe @etejedor can point to a good example.
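As an illustration of this suggestion, here is a minimal PyROOT sketch (not an official recipe). It assumes ROOT's Python bindings, whose TTree.Branch pythonization accepts both NumPy buffers with a leaf list and std::vector objects. The function names are hypothetical. Scalar columns become `/D` branches, and collection columns become std::vector<double> branches that are refilled for every entry.

```python
import numpy as np


def is_collection(column):
    """True if each entry holds several values (object dtype or a 2D array)."""
    arr = np.asarray(column)
    return arr.dtype == object or arr.ndim > 1


def n_entries(columns):
    """Common number of entries of all columns; raise if they disagree."""
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return lengths.pop()


def write_with_vectors(columns, filename, treename="DecayTree"):
    """Write a dict of columns to a TTree: scalars as '/D', collections as std::vector<double>."""
    import ROOT  # imported here so the helpers above work without ROOT installed

    nentries = n_entries(columns)
    kinds = {name: is_collection(col) for name, col in columns.items()}
    f = ROOT.TFile(filename, "RECREATE")
    tree = ROOT.TTree(treename, treename)

    # Branch buffers must stay alive (and keep their address) until the last Fill().
    buffers = {}
    for name in columns:
        if kinds[name]:
            buffers[name] = ROOT.std.vector("double")()
            tree.Branch(name, buffers[name])
        else:
            buffers[name] = np.zeros(1, dtype=np.float64)
            tree.Branch(name, buffers[name], name + "/D")

    for i in range(nentries):
        for name, col in columns.items():
            buf = buffers[name]
            if kinds[name]:
                # Replace the vector content with this entry's values.
                buf.clear()
                for x in np.asarray(col[i], dtype=np.float64).ravel():
                    buf.push_back(float(x))
            else:
                buf[0] = col[i]
        tree.Fill()

    tree.Write()
    f.Close()
```

With the data from the question, `write_with_vectors({**dict1D, **dictND}, "mytree.root")` would write all five columns to one tree. RDataFrame can read std::vector<double> branches back, for example as `RVec<double>` columns.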

Cheers,
Enrico

etejedor · March 8, 2022, 10:16am

Hello,

You can create branches from NumPy arrays using TTree::Branch from Python:

https://root.cern.ch/doc/master/classTTree.html

If you scroll down to the PyROOT section, you’ll find examples of how to do that, e.g.:

# Array branch - use NumPy array of length N
npa = np.array(N*[ 0. ])
t.Branch('nparrayb', npa, 'nparrayb[' + str(N) + ']/D')
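The snippet above covers a fixed length N. For per-entry lists of different lengths, TTree also supports variable-size arrays whose length is read from an integer counter branch declared earlier (leaf list `name[counter]/D`). Below is a minimal sketch for one jagged column. It assumes the same NumPy-buffer pythonization as above. The helper names and the `n<name>` counter naming are illustrative choices.

```python
import numpy as np


def pad_rows(column):
    """Return per-entry lengths (int32) and a zero-padded 2D float64 array."""
    rows = [np.asarray(row, dtype=np.float64).ravel() for row in column]
    counts = np.array([len(r) for r in rows], dtype=np.int32)
    width = int(counts.max()) if counts.size else 0
    padded = np.zeros((len(rows), width), dtype=np.float64)
    for i, r in enumerate(rows):
        padded[i, :len(r)] = r
    return counts, padded


def write_variable_length(name, column, filename, treename="DecayTree"):
    """Write one jagged column as a counter branch 'n<name>' plus '<name>[n<name>]/D'."""
    import ROOT  # imported here so pad_rows works without ROOT installed

    counts, padded = pad_rows(column)
    counter = "n" + name
    f = ROOT.TFile(filename, "RECREATE")
    tree = ROOT.TTree(treename, treename)

    n = np.zeros(1, dtype=np.int32)                       # length of the current entry
    buf = np.zeros(max(padded.shape[1], 1), dtype=np.float64)  # fits the longest entry
    tree.Branch(counter, n, counter + "/I")                # counter must be declared first
    tree.Branch(name, buf, f"{name}[{counter}]/D")

    for count, row in zip(counts, padded):
        n[0] = count
        buf[:padded.shape[1]] = row  # only the first `count` values are stored
        tree.Fill()

    tree.Write()
    f.Close()
```

For example, `write_variable_length('otherparticles', dictND['otherparticles'], 'mytree.root')`. Several jagged columns work the same way, with one counter each. RDataFrame reads such variable-size array branches as `RVec<double>` columns.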

jruizvid · March 8, 2022, 8:05pm

Hello, thank you for the suggestions. I started to work with TTree directly, but I was getting many errors and it felt like I was reinventing the wheel. I’m looking forward to the Awkward Array integration in RDF!

For now, I have found a rather simple solution with uproot:

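import awkward as ak
import uproot

# Merge the flat and the jagged columns into one dictionary
dictfull = {**dict1D, **dictND}

# Convert every column to an Awkward Array so that uproot can write
# the per-entry lists as variable-length (jagged) branches
for key in dictfull:
    dictfull[key] = ak.Array(dictfull[key])

# Write all columns as branches of a TTree named "tree"; the file is closed on exit
with uproot.recreate("example.root") as f:
    f["tree"] = dictfull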

system · March 22, 2022, 8:06pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.