Hi, thanks for looking at this again. I’m a little confused about this, and maybe I’m vastly overcomplicating things.
The data used here has a particular Event Data Model (EDM) that includes x photons, y leptons and z jets. I’m not trying to read in an EDM from a dataset here; I’m trying to write/extend one. However, I think there are two facets/approaches that I’m getting mixed up:
Dataset to ROOT trees.
If my simulation spits out 3 events each with an irregular length of objects:
event1: first(a=1,b=1,c=1), second(a=2,b=1,c=2)
event2: first(a=3,b=1,c=1)
event3: first(a=1,b=1,c=2), second(a=2,b=1,c=1), third(a=1,b=2,c=1)
how do I write that to a dataframe such that I can read it like:
df = ROOT.RDataFrame("tree", "my_file.root")
h = df.Filter('c==1').Histo1D('a.at(0)')
which gives a histogram filled with 1, 3, 2?
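To make the layout I have in mind concrete, here is a minimal pure-Python sketch (no ROOT involved) of the three events above stored as one variable-length list per field per event, which is how I understand RVec columns to behave. The helper names `to_columns` and `first_a_with_c` are made up for illustration, and the "filter" is applied per object rather than per event, which is my assumption about what the `c==1` selection should mean.

```python
# Toy events from the example above; each object is a dict with fields a, b, c.
events = [
    [dict(a=1, b=1, c=1), dict(a=2, b=1, c=2)],
    [dict(a=3, b=1, c=1)],
    [dict(a=1, b=1, c=2), dict(a=2, b=1, c=1), dict(a=1, b=2, c=1)],
]

def to_columns(event):
    """Turn one event (a list of objects) into parallel per-field lists."""
    return {key: [obj[key] for obj in event] for key in ("a", "b", "c")}

# One dict of parallel lists per event: the "branches" of that event.
columns = [to_columns(ev) for ev in events]

def first_a_with_c(cols, c_value=1):
    """Keep the entries of a whose c equals c_value; return the first one (or None)."""
    selected = [a for a, c in zip(cols["a"], cols["c"]) if c == c_value]
    return selected[0] if selected else None

hist_entries = [first_a_with_c(cols) for cols in columns]
print(hist_entries)  # [1, 3, 2]
```

The point of the sketch is only that the per-object selection has to be applied to the `a` list through a mask built from the `c` list, because the two lists are linked purely by position.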
Furthermore, how can I sort these events such that a.at(0) gives the object with the highest b value? As in event3: RVec a({1,2,1}), b({1,1,2}), c({2,1,1}). If I sort RVec b, then RVec a remains unchanged, no?
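A small pure-Python sketch of the issue, using event3: sorting `b` on its own leaves `a` and `c` untouched, so the usual workaround is to compute the sorting indices once and reorder every list with them. The variable names are only illustrative; for RVec I believe the corresponding helpers are `ROOT::VecOps::Argsort` and `ROOT::VecOps::Take`, but I have not checked that this is the recommended way.

```python
# event3 from above, stored as parallel lists
a = [1, 2, 1]
b = [1, 1, 2]
c = [2, 1, 1]

# Sorting b alone produces a new list; a and c keep their original order.
b_sorted_alone = sorted(b, reverse=True)

# Argsort: indices that order b from highest to lowest.
# Python's sort is stable, so ties keep their original relative order.
order = sorted(range(len(b)), key=lambda i: b[i], reverse=True)

# Apply the same index order to every list so the objects stay aligned.
a_by_b = [a[i] for i in order]
b_by_b = [b[i] for i in order]
c_by_b = [c[i] for i in order]

print(order)   # [2, 0, 1]
print(a_by_b)  # [1, 1, 2]
print(c_by_b)  # [1, 2, 1]
```

After this reordering, index 0 of each list refers to the object with the highest `b`.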
The struct from before allows us to associate the branches in an event with their respective elements, but it does not seem practical in terms of reading and writing the files.
Extending existing datasets
For example: in the tutorial here they use the index of the event to define a mask for goodphotons, whereas I’d simply like to make a branch is_good_photon with the same length as photon_pt etc. that I can use to create some more complex filters rather than combine masks.
The reason for this is sorting, e.g. if I want the invariant mass of each good_photon with the lepton that is closest to it in deltaR, and then filter my events so that the highest of these masses is close to a certain value. I have to create a vector of leptons with their deltaR to the first good_photon, sort this list, and set the lepton-photon invariant mass. I’d then sort my good_photons by this invariant mass and reverse it to get the highest one. So I’d have photons and good_photons; these classes could be fed a list of lepton objects to return the one with the smallest deltaR, giving lepton_merged_good_photons, which would then be sorted to give sorted_lepton_merged_good_photons.
This is a very pythonic way of thinking and I’m not sure it translates and possibly explains my question on efficiency.
With masks it would be: define good photons, define deltaR, define closest, and then define Max(InvariantMass(photons[goodphotons] & leptons[closestdeltaR])), which is fine; it’s just very difficult to parse mentally and to debug.
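For reference, here is a rough pure-Python sketch of the object-style version of that chain, just to show the logic I am trying to express. Everything in it is an assumption made for illustration: the field names `pt`, `eta`, `phi`, the `min_pt` cut that defines a "good" photon, the toy numbers, and the massless approximation used for the invariant mass.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi

def delta_r(p, q):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(p["eta"] - q["eta"], delta_phi(p["phi"], q["phi"]))

def inv_mass(p, q):
    """Invariant mass of two objects, treating both as massless (assumption)."""
    m2 = 2 * p["pt"] * q["pt"] * (
        math.cosh(p["eta"] - q["eta"]) - math.cos(p["phi"] - q["phi"])
    )
    return math.sqrt(max(m2, 0.0))

def max_photon_lepton_mass(photons, leptons, min_pt=20.0):
    """Highest mass among (good photon, closest lepton) pairs, or None."""
    good = [p for p in photons if p["pt"] > min_pt]  # hypothetical "good" cut
    if not good or not leptons:
        return None
    masses = []
    for p in good:
        closest = min(leptons, key=lambda lep: delta_r(p, lep))
        masses.append(inv_mass(p, closest))
    return max(masses)

# Toy event with invented values
photons = [
    {"pt": 50.0, "eta": 0.0, "phi": 0.0},
    {"pt": 10.0, "eta": 1.0, "phi": 1.0},  # fails the pt cut
    {"pt": 30.0, "eta": 1.0, "phi": 2.0},
]
leptons = [
    {"pt": 40.0, "eta": 0.1, "phi": 3.0},
    {"pt": 25.0, "eta": 1.0, "phi": 2.5},
]
best_mass = max_photon_lepton_mass(photons, leptons)
print(best_mass)
```

An event-level filter would then just compare `best_mass` to the target value. Each step here is a named function that can be tested on its own, which is the readability I am missing in the combined-mask expression.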