Flattening a Complicated Data Model with RDataFrame

weissercn · November 2, 2018, 12:56am

I have ROOT files that have the following structure for each entry (=event):

vector<Double_t> primary_vertex_x, e.g. (1, 2)
vector<Double_t> primary_vertex_y, e.g. (-1.5, 1)
…
vector<Double_t> primary_vertex_id_for_track, e.g. (0, 1, 0)
vector<Double_t> track_px, e.g. (10, 13, 12)
vector<Double_t> track_py, e.g. (21, -22, 20)

I have, say, 2 primary vertices and 3 tracks in an event, and primary_vertex_id_for_track tells me, for each track, which primary vertex it is linked to.

I would like to read this tree and flatten it so that this event gives me 3 entries, one per track.

For example:
primary_vertex_x, primary_vertex_y, track_px, track_py
1, -1.5, 10, 21
2, 1, 13, -22
1, -1.5, 12, 20

I can do this using PyROOT by looping over each event. Is there a way to do this with RDataFrame in Python?
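
To make the target transformation concrete, here is a minimal sketch of the per-event loop in plain Python. It is not PyROOT code: the event is modeled as a dict of lists holding the example values above, and `flatten_event` is a hypothetical helper name. Since the vertex ids are stored as Double_t, the sketch assumes they must be cast to int before indexing.

```python
def flatten_event(event):
    """Return one (pv_x, pv_y, track_px, track_py) row per track in the event."""
    rows = []
    for vertex_id, px, py in zip(
        event["primary_vertex_id_for_track"],
        event["track_px"],
        event["track_py"],
    ):
        # Assumption: ids are stored as floating point, so cast before indexing.
        i = int(vertex_id)
        rows.append(
            (event["primary_vertex_x"][i], event["primary_vertex_y"][i], px, py)
        )
    return rows


# The example event from the post, modeled as a dict of lists.
event = {
    "primary_vertex_x": [1.0, 2.0],
    "primary_vertex_y": [-1.5, 1.0],
    "primary_vertex_id_for_track": [0.0, 1.0, 0.0],
    "track_px": [10.0, 13.0, 12.0],
    "track_py": [21.0, -22.0, 20.0],
}

for row in flatten_event(event):
    print(row)
```

This prints three rows, one per track, matching the table above.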

eguiraud · November 2, 2018, 8:32am

Hi,
See the discussion in "One to many transformation in RDataFrame". We had a ticket with a feature request that would solve this use case, but it was never implemented. As things are, I don’t think there is a way to do this in Python (with a reasonable syntax, at least).

beojan · November 2, 2018, 9:17am

I would suggest using RDataFrame in C++ to do the part where you index into the primary vertex vectors, and then write out a new tree in which all the columns are vectors with the same number of entries (one per track).
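
The indexing step is a gather: for every track, look up the value of its primary vertex. The sketch below shows only that logic in plain Python on the example event from the question; it is not RDataFrame code, and `gather_per_track`, `track_pv_x` and `track_pv_y` are hypothetical names chosen for illustration.

```python
def gather_per_track(vertex_values, vertex_id_for_track):
    """Return one vertex value per track, looked up by the track's vertex id."""
    # Assumption: ids are stored as floating point, so cast before indexing.
    return [vertex_values[int(i)] for i in vertex_id_for_track]


# Example event from the question.
primary_vertex_x = [1.0, 2.0]
primary_vertex_y = [-1.5, 1.0]
primary_vertex_id_for_track = [0.0, 1.0, 0.0]

# New per-track columns: same length as track_px and track_py.
track_pv_x = gather_per_track(primary_vertex_x, primary_vertex_id_for_track)
track_pv_y = gather_per_track(primary_vertex_y, primary_vertex_id_for_track)

print(track_pv_x)  # [1.0, 2.0, 1.0]
print(track_pv_y)  # [-1.5, 1.0, -1.5]
```

After this step every column in the event has three entries, so the event can be flattened by a tool that only needs to unroll equal-length vectors.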

Then you can use this new tree in Python. Uproot can import it and flatten things the way you want (as of uproot 3.2.9; see https://github.com/scikit-hep/uproot/issues/179).
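
Once every column holds one value per track, flattening amounts to concatenating each column across events. The following is a plain-Python sketch of that idea only; it does not use the uproot API. `flatten_column` is a hypothetical helper, and the second event is made-up data added purely to show more than one event being concatenated.

```python
from itertools import chain


def flatten_column(per_event_values):
    """Concatenate a list of per-event lists into one flat list."""
    return list(chain.from_iterable(per_event_values))


# First event: per-track columns from the thread's example.
# Second event: hypothetical values, for illustration only.
events_track_pv_x = [[1.0, 2.0, 1.0], [0.5]]
events_track_px = [[10.0, 13.0, 12.0], [7.0]]

flat_pv_x = flatten_column(events_track_pv_x)
flat_px = flatten_column(events_track_px)

# Rows stay aligned because every column has the same length within each event.
for pv_x, px in zip(flat_pv_x, flat_px):
    print(pv_x, px)
```

The alignment between columns relies entirely on the earlier indexing step having produced equal-length vectors per event.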
