VecOps.RVec to numpy array

Hi,

I have a tree whose events each contain arrays with multiple values. After some filtering with the RDataFrame functionalities, I’d like to make plots with matplotlib. I run into problems when I try to convert my data into numpy arrays.

So basically what I am trying to do:

filtered_data = df.Filter(....).AsNumpy(columns = ["mycolumn"])
print(filtered_data)

gives me

{'mycolumn': ndarray([<cppyy.gbl.ROOT.VecOps.RVec<float> object at 0x145885600>,
         <cppyy.gbl.ROOT.VecOps.RVec<float> object at 0x145885628>,
         <cppyy.gbl.ROOT.VecOps.RVec<float> object at 0x145885650>,
         ...............)

What I want:

filtered_data = [ [1,2,3,4,5.... ], [11,12,13,14... ], [100, 101, 102, 103... ], .....]

I have not found a way to accomplish that yet. I would appreciate some help, thank you!

Did you try to search the forum?
Maybe this one can help: Reading vector branch from .root file and converting it to numpy array on PyRoot
Otherwise I’m sure @eguiraud can give more details

Yes I found that thread. Somehow this does not work for me.

I get

ValueError: zero-dimensional arrays cannot be concatenated

Hi @Tim_Buktu ,
your data contains, for every event, collections of possibly different sizes. That cannot be described by a numpy array, which must be rectangular, so we return a numpy array of RVecs instead.

You can loop over that numpy array of RVecs and convert it to a list of lists; it should be just:

[numpy.array(v).tolist() for v in filtered_data]

Cheers,
Enrico

P.S.
the concatenation will not give the list of lists that you ask for, but rather a single flattened list of all the elements. I can’t tell where the error comes from, but you can run your code through the python debugger and see where the zero-dimensional numpy array comes from.
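To illustrate the difference between concatenating and building a list of lists, here is a minimal sketch. It uses plain numpy arrays as stand-ins for the per-event RVecs (an assumption, so that it runs without ROOT), and the values are made up. The last part shows one way the quoted error can arise in general, namely when the inputs to the concatenation are zero-dimensional arrays; whether that is what happens in the original code cannot be said from the thread.

```python
import numpy as np

# Stand-ins for three events with different numbers of entries (made-up values)
events = [np.array([1.0, 2.0, 3.0]), np.array([11.0, 12.0]), np.array([100.0])]

# Concatenation gives one flat array: the event boundaries are lost
flat = np.concatenate(events)

# A list comprehension keeps one inner list per event
nested = [e.tolist() for e in events]

print(flat.tolist())
print(nested)

# np.concatenate rejects zero-dimensional (scalar-like) arrays
try:
    np.concatenate([np.array(1.0), np.array(2.0)])
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```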


Hi, thanks for the reply. Your suggestion gives me a list with the column name as its only entry:

["mycolumn"]

Yes sorry, the code was just to give you an idea. As filtered_data is a dictionary, for v in filtered_data loops over the keys of the dictionary. The correct code should be (it might need minor adjustments, it’s just to give you the idea):

[numpy.array(v).tolist() for v in filtered_data["mycolumn"]]

Cheers,
Enrico
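For reference, the conversion above can be sketched end to end as follows. The dictionary here is only a stand-in for what AsNumpy returns: an object-dtype numpy array holding one collection per event, built from plain numpy arrays instead of RVecs so that the snippet runs without ROOT. The helper name `column_to_lists` and the values are hypothetical.

```python
import numpy as np


def column_to_lists(columns, name):
    """Convert one column of per-event collections into a list of lists."""
    return [np.array(v).tolist() for v in columns[name]]


# Stand-in for the AsNumpy result: one variable-length collection per event
column = np.empty(3, dtype=object)
column[0] = np.array([1, 2, 3], dtype=np.float32)
column[1] = np.array([11, 12], dtype=np.float32)
column[2] = np.array([100], dtype=np.float32)
filtered_data = {"mycolumn": column}

# Iterating over the dictionary itself only yields its keys
keys = list(filtered_data)

# Indexing with the column name gives the per-event collections
result = column_to_lists(filtered_data, "mycolumn")

print(keys)
print(result)
```

With a real RDataFrame, `filtered_data` would come from the `AsNumpy(columns=["mycolumn"])` call shown in the first post, and the helper would be applied in the same way.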

Thank you very much, it works now. I should have remembered how dicts work!

Maybe since we are at it:
How would I plot it using only root?

So I do my filtering and then I want to plot one of the filtered events using root. Let us assume each event consists of an array with 1000 elements. It should work with TGraph somehow, I guess.

Maybe with TTree::Draw()?

Or take a look at the Dataframes documentation
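For the TGraph idea, a minimal sketch could look like the following. It assumes that the values of one event are plotted against their element index, which is a guess at what is wanted. The helper names `graph_points` and `draw_event`, the canvas name and the output file name are hypothetical. ROOT is imported only inside the drawing function, so the array preparation can be tried without a ROOT installation.

```python
import numpy as np


def graph_points(values):
    """Return (x, y) as float64 arrays: x is the element index, y the values."""
    y = np.ascontiguousarray(values, dtype=np.float64)
    x = np.arange(len(y), dtype=np.float64)
    return x, y


def draw_event(values, filename="event.png"):
    """Draw one event as a TGraph and save the canvas (requires ROOT)."""
    import ROOT  # imported here so graph_points works without ROOT

    x, y = graph_points(values)
    canvas = ROOT.TCanvas("c", "one event")
    # TGraph takes the number of points and two arrays of doubles
    graph = ROOT.TGraph(len(y), x, y)
    graph.SetTitle("one event;element index;value")
    graph.Draw("AL")  # A: draw axes, L: connect the points with a line
    canvas.SaveAs(filename)
    return graph


# Array preparation only, with made-up values
x, y = graph_points([1.0, 2.0, 3.0])
print(x.tolist(), y.tolist())
```

One of the converted events, for example the first inner list of the list of lists from above, could then be passed to `draw_event`.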
