Reading a vector branch from a .root file and converting it to a numpy array in PyROOT

K.H.Kim · March 25, 2021, 12:25am

Dear all,

I’m trying to use PyROOT to read a .root file (array-1-0.root, 105.3 KB) which contains one vector<double> branch and two double branches.

My Python script is:

import sys 
import os
import ROOT as rt
from ROOT import gROOT as grt 
from ROOT.VecOps import RVec
from sklearn.decomposition import PCA 
import pandas
import numpy as np

#variables = ["vector0", "vector1"]
   
if __name__ == "__main__":
    for filename, label in [["array-1-0.root", "signal"]]:
        print(">>> Extract the training and testing events for {} from the {} dataset.".format(label, filename))
        fullpath = os.path.abspath(filename)  # resolve relative to the current directory
        df = rt.RDataFrame("test", fullpath) # getting tree
        vec1 = df.AsNumpy(columns=["vector"])   # test to get "vector", vector<double> branch
        vec4 = rt.VecOps.RVec('double')(vec1) # does not work

        # Output test
        print("Test. {}".format(vec4))

I’d like to read the vector<double> branch from the file and convert it to a numpy array.

At first I simply tried vec1 = df.AsNumpy(columns=["vector"]).
In my script, the output of print("{}".format(vec1)) is:

{'vector': ndarray([<cppyy.gbl.ROOT.VecOps.RVec object at 0x7fca7e5d0540>,
<cppyy.gbl.ROOT.VecOps.RVec object at 0x7fca7e5d0568>,
<cppyy.gbl.ROOT.VecOps.RVec object at 0x7fca7e5d0590>,
<cppyy.gbl.ROOT.VecOps.RVec object at 0x7fca7e5d05b8>,
…

How can I take each vector and convert it to a numpy array?

Thank you.

Best regards,
KH Kim

ROOT Version: 6.22/09
Platform: WSL Ubuntu 18.04
Compiler: (I’m not sure exactly, but I only run a Python script.)

K.H.Kim · March 25, 2021, 3:34am

After posting my question, I changed my script to:

df = rt.TFile( fullpath )
test = df.Get('test')
vector_a = []

for event in test:
    vector_b = np.array(event.vector)  # np.array copies, so the values survive the next entry being read
    vector_a.append(vector_b)

This gives me a list of numpy arrays, one per event, for the vector branch.
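
If the per-event vectors all have the same length, the list built in the loop can be stacked into one 2D array of shape (n_events, n_elements), which is the layout that tools such as scikit-learn's PCA expect. The following is only a sketch: stack_events is a hypothetical helper name, and the values in vector_a are made-up stand-ins so the snippet runs without ROOT.

```python
import numpy as np

def stack_events(per_event):
    """Stack equal-length per-event vectors into a 2D array (n_events, n_elements)."""
    # np.array copies each vector, so the result does not alias the inputs.
    arrays = [np.array(v, dtype=np.float64) for v in per_event]
    lengths = {a.shape[0] for a in arrays}
    if len(lengths) > 1:
        # Ragged input cannot form a rectangular array; fail loudly instead.
        raise ValueError("events have different vector lengths: {}".format(sorted(lengths)))
    return np.stack(arrays)

# Hypothetical stand-in for the vector_a list filled in the loop above.
vector_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
matrix = stack_events(vector_a)
print(matrix.shape)
```

If the vector length varies from event to event, stacking is not possible and the list of arrays (or a flattened array plus per-event lengths) is probably the better representation.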

Also, please let me know if anyone has comments on my original question (using RDataFrame) or on my own answer (not using RDataFrame).

Best regards,
KH Kim

eguiraud · March 25, 2021, 10:19am

Hi @K.H.Kim ,
do you really need all elements of the vector for all events concatenated in a single one-dimensional numpy array? May I ask what your use case is (i.e. what you need to do with that numpy array)?

Given the vec1 dictionary as per your first snippet of code, you can get what you want this way:

numpy_v = np.concatenate([np.asarray(e) for e in vec1["vector"]])

The np.asarray(e) call converts the RVec holding the vector elements of each event into a numpy array, with zero copy, i.e. quickly.
The np.concatenate call takes the numpy arrays with the vector elements of each event and concatenates them into one long numpy array. This needs to allocate memory, and depending on the dataset size it might take a few seconds.
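
Concatenating discards the event boundaries, so it can help to keep per-event offsets next to the flat array. Below is a minimal sketch of that idea. It uses plain Python lists with made-up values as a stand-in for the RVec objects in vec1["vector"], so it runs without ROOT; the names per_event, offsets and event_slice are illustrative only.

```python
import numpy as np

# Stand-in for vec1["vector"]: one variable-length sequence per event.
# (Hypothetical values; with ROOT these would be RVec<double> objects.)
per_event = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]

# Convert each event's vector to a numpy array, then flatten across events.
arrays = [np.asarray(e, dtype=np.float64) for e in per_event]
flat = np.concatenate(arrays)

# Record where each event starts and ends inside the flat array.
lengths = np.array([len(a) for a in arrays])
offsets = np.concatenate(([0], np.cumsum(lengths)))

def event_slice(i):
    """Return the elements of event i as a view into the flat array."""
    return flat[offsets[i]:offsets[i + 1]]

print(flat.shape, offsets)
```

With flat and offsets together, the original per-event vectors can be recovered at any time, for example with event_slice(1) for the second event.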

Cheers,
Enrico
