Hi,
I have a tree whose events contain arrays with multiple values. After some filtering with the RDataFrame functionality, I’d like to make plots with matplotlib. I run into problems when I try to convert my data into numpy arrays.
So basically, this is what I am trying to do:
filtered_data = df.Filter(....).AsNumpy(columns = ["mycolumn"])
print(filtered_data)
gives me
{'mycolumn': ndarray([<cppyy.gbl.ROOT.VecOps.RVec<float> object at 0x145885600>,
<cppyy.gbl.ROOT.VecOps.RVec<float> object at 0x145885628>,
<cppyy.gbl.ROOT.VecOps.RVec<float> object at 0x145885650>,
...............)
What I want:
filtered_data = [ [1,2,3,4,5.... ], [11,12,13,14... ], [100, 101, 102, 103... ], .....]
I have not found a way to accomplish that yet. I would appreciate some help, thank you!
Yes, I found that thread. Somehow this does not work for me.
I get
ValueError: zero-dimensional arrays cannot be concatenated
Hi @Tim_Buktu ,
Your data contains, for every event, a collection of possibly different size. That cannot be described by a numpy array, which must be rectangular, so we return a numpy array of RVecs instead.
You can loop over that numpy array of RVecs and convert it to a list of lists; it should be just:
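To illustrate the "rectangular" constraint, here is a minimal sketch using plain Python lists as stand-ins for the per-event collections (the values are made up, and no ROOT is involved). Equal-length rows give a regular 2-D array, while rows of different lengths can only be held as a 1-D array of objects, which is the same shape of result that AsNumpy returns here.

```python
import numpy as np

# Equal-length rows: numpy builds a regular 2-D array.
rect = np.array([[1.0, 2.0], [3.0, 4.0]])

# Rows of different lengths: store them as a 1-D array of Python objects.
# (Hypothetical stand-in for the array of RVecs; values are made up.)
ragged = np.empty(2, dtype=object)
ragged[0] = [1.0, 2.0, 3.0]
ragged[1] = [11.0, 12.0]

print(rect.shape)                   # two dimensions
print(ragged.shape, ragged.dtype)   # one dimension, dtype object
```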
[numpy.array(v).tolist() for v in filtered_data]
Cheers,
Enrico
P.S.
The concatenation will not give the list of lists that you ask for, but rather a single flattened list of all the elements. I can’t tell where the error comes from, but you can run your code through the Python debugger to see where the zero-dimensional numpy array comes from.
Hi, thanks for the reply. Your suggestion gives me a list with the column name as its only entry:
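As a rough sketch of the difference, using made-up values and plain numpy arrays in place of the RVecs: numpy.concatenate joins the per-event arrays into one flat array, whereas a list comprehension keeps one list per event. The last part shows one way to get the same ValueError (passing zero-dimensional arrays); whether that is what happened in your code is only a guess.

```python
import numpy as np

# Made-up per-event data of different lengths.
events = [np.array([1.0, 2.0, 3.0]), np.array([11.0, 12.0])]

# Concatenation flattens everything into a single 1-D array.
flat = np.concatenate(events)

# A list comprehension keeps the per-event structure.
nested = [e.tolist() for e in events]

# One possible source of the error: zero-dimensional (scalar-like) inputs.
try:
    np.concatenate([np.array(1.0), np.array(2.0)])
    message = None
except ValueError as err:
    message = str(err)

print(flat.tolist())
print(nested)
print(message)
```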
["mycolumn"]
Yes, sorry, the code was just to give you an idea. Since filtered_data is a dictionary, "for v in filtered_data" loops over the keys of the dictionary. The correct code should be (it might need minor adjustments):
[numpy.array(v).tolist() for v in filtered_data["mycolumn"]]
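Here is a small self-contained sketch of the difference between the two loops. It does not use ROOT: the dictionary below is a hypothetical stand-in for what AsNumpy returns, with plain Python lists in place of the RVecs and made-up values.

```python
import numpy

# Hypothetical stand-in for the AsNumpy result: a dict mapping the
# column name to a 1-D object array with one collection per event.
column = numpy.empty(3, dtype=object)
column[0] = [1.0, 2.0, 3.0]
column[1] = [11.0, 12.0]
column[2] = [100.0]
filtered_data = {"mycolumn": column}

# Iterating over the dict itself yields its keys, i.e. the column name.
from_keys = [numpy.array(v).tolist() for v in filtered_data]

# Iterating over the column yields one collection per event.
from_column = [numpy.array(v).tolist() for v in filtered_data["mycolumn"]]

print(from_keys)
print(from_column)
```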
Cheers,
Enrico
Thank you very much, it works now. I should have remembered how dicts work!
Maybe since we are at it:
How would I plot it using only ROOT?
So I do my filtering and then I want to plot one of the filtered events using ROOT. Let us assume every event consists of an array with 1000 elements. It should work with TGraph somehow, I guess.
Or take a look at the RDataFrame documentation.