Creating and Sorting Vectors of Collections

Interestingly trying to parse your example into pyRoot also fails:

muon_pt = ROOT.RVec('double')((1.,2.,3.))
muon_eta = ROOT.RVec('double')((1.,2.,3.))
muon_phi = ROOT.RVec('double')((1.,2.,3.))
muon_m = ROOT.RVec('double')((1.,2.,3.))
my_leptons = ROOT.VecOps.Construct(ROOT.lepton)((muon_pt, muon_eta, muon_phi, muon_m))

with a:

TypeError: Template method resolution failed:
  Failed to instantiate "Construct(lepton)"

Ok this helps: as per the error message, RVec<lepton> needs a default constructor lepton().

The “Template method resolution failed” might or might not be a consequence of the problem above.

Beautiful! Very almost there now.

so I now have the following and it works

df = ROOT.RDataFrame(treename,filename)
ROOT.gInterpreter.Declare('''
struct lepton {
  float pt;
  float eta;
  float phi;
  float m;

  lepton(float init_pt=0.0, float init_eta=0.0, float init_phi=0.0, float init_m=0.0)
      : pt(init_pt), eta(init_eta), phi(init_phi), m(init_m) {}
};
''')
df2 = df.Define('leptons','ROOT::VecOps::Construct<lepton>(muon_pt, muon_eta, muon_phi, muon_m)')
c = ROOT.TCanvas()
hpt = df2.Filter("nmuon>0").Define("pt", "leptons.at(0).pt").Histo1D("pt")
hpt.Draw()
c.Draw()

ok. nice.

So last question(s):

  1. I know I can sort the collections using VecOps::Sort but how do I sort on elements? eg:
df3 = df2.Define('sorted_leptons','ROOT::VecOps::Reverse(ROOT::VecOps::Sort(leptons.pt))')

or

df3 = df2.Define('sorted_leptons','ROOT::VecOps::Reverse(ROOT::VecOps::Sort(leptons, [](lepton l){return l.pt}))')
  1. Above I use the .at(0) operator (as per your earlier advice) and have extended this now with a .at(0,lepton()) so the default constructor is useful. But in this case, I do a filter saying I need 2 leptons, then say that I also need these leptons to have a eta<2.4, but am now not sure that there are still two leptons? Can I define a filter that keeps checking? eg ROOT::VecOps::Take

  2. finally, is there any memory or performance issues associated with defining like 4 copies of filtered collections like this (leptons, filtered_leptons, sorted_leptons, leading_leptons)? Or is it better to write a single function that returns the 2 highest pt leptons that pass all selections?

  1. you can just try it out in a python or ROOT interpreter (also outside of a Define, simply playing with RVec<lepton> objects). I think the second alternative is what you want.

  2. sounds like you want to filter on eta before you filter on the number of leptons?

  3. a single function is probably going to be more performant, but make sure to not optimize prematurely

Okey dokey, I think I’ve almost got what I need but I ran into a few snags.

I made a notebook to demonstrate what I’m trying to do but my swan-fu has failed me today so here’s a pdf instead: Photon_PY.pdf (47.7 KB)

So what’s there. Ok making the structure is really nice and easy to do with the patent pending Guiraud method, but when it comes to snapshotting and using the files, our lovely objects are missing. I re-tried it using one of the public datasets to demonstrate and to check that it isn’t an issue with my data.

As for my other questions, I think I should be able to fix these myself (and probably should do that myself) but it will likely be inelegant solutions, which is my problem not yours, but will add them to my example as and when I manage.

Hi,
as per the error message, you need to generate dictionaries for your custom objects otherwise ROOT does not know how to write them, see e.g. here.

Cheers,
Enrico

Hi, thanks for looking at this again. I’m a little confused about this and maybe I’m vastly over complicating things.

The data used here has a particular Event Data Model (EDM) that includes x photons, y leptons and z jets. I’m not trying to read in an EDM from a dataset here, I’m trying to write/extend one. However I think there are two facets/approaches that I’m getting mixed up:

Data set to root trees.

If my simulation spits out 3 events each with an irregular length of objects:

event1: first(a=1,b=1,c=1), second(a=2,b=1,c=2)
event2: first(a=3,b=1,c=1)
event3: first(a=1,b=1,c=2), second(a=2,b=1,c=1), third(a=1,b=2,c=1)

how do I write that to a dataframe such that I I can read it like:

df =ROOT.RDataFrame("tree","my_file.root")
h = df.Filter('c==1').Histo1D('a.at(0)')

that gives a hist of 1,3,2.

furthermore how can I sort these events such that a.at(0) gives the object with the highest b value? as in event3: RVec a({1,2,1}), b({1,1,2}), c({2,1,1}) if I sort RVec b then RVec a remains unchanged no?

the struct from before allows us to associate the branches in an event to their respective elements but does not seem practical in terms of reading and writing the files.

Extending existing datasets

for example: in the tutorial here they use the index of the event to define a mask for goodphotons whereas I’d simply like to make a branch is_good_photon with the same length as photon_pt etc that I can use to create some more complex filters rather than combine masks.

The reason for this is sorting, eg if I want the invariant mass of each good_photon with the lepton that is closest to it in deltaR and then filter my events that the highest of these masses is close to a certain value. I have to create a vector of leptons with their deltaR to the first good_photon. Sort this list. and set the lepton-photon invariant mass. I’d then sort my good_photons by this invariant mass and reverse it to get the highest one. So Id have photons, good_photons, these classes could be fed a list of lepton objects to return the one with the smallest deltaR to give lepton_merged_good_photons and then sorted to give sorted_lepton_merged_good_photons This is a very pythonic way of thinking and I’m not sure it translates and possibly explains my question on efficiency.

With masks it would be define good photons, define deltaR, define closest, and then define Max(InvariantMass(photons[goodphotons] & leptons[closestdeltaR])) which is fine it’s just very difficult to parse mentally and to debug.

Hi Vince,
sorry I’m lost, does this still have to do with the I/O error of the previous post?
There are too many things going on, please help me help you by asking one specific question at a time :grinning_face_with_smiling_eyes:

In RDataFrame h = df.Filter('c==1').Histo1D('a.at(0)') is spelled h = df.Filter('c==1').Define("a0", "a.at(0)").Histo1D("a0").

Correct. I think you want: df.Define("a_sorted", "a[ArgSort(b)]") ` or similar. Docs are here.

You can always write a helper function or helper class that encapsulates this logic, and you can split the logic in multiple lines to make it more readable, using gdb or printouts to debug what’s going on.
You can have classes that do what you mention above.

Okey dokey,
So the I/O error coupled with your explanation of compiling the libraries suggested that writing a custom class/struct was more complex than I had anticipated.

I spent a good (embarrassingly) long while trying to use ACLiC or rootcling or ROOT.gInterpreter.GenerateDictionary() to work with this simple example but nothing I tried seems to allow me to snapshot these custom structures. The code I tried can be found below.

Given this complexity I’m tempted to abandon the idea of using this form of object based EDM and instead try to understand the construction of the trees within the RDF programming model. If so perhaps I open a new thread on how to create trees with the appropriate structure.

All the best and thanks again!

//photon_struct.h
#ifndef PHOTON_H
#define PHOTON_H
struct photon {
  float pt;
  float eta;
  float phi;
  float m;

  photon(float init_pt=-999.0, float init_eta=0.0, float init_phi=0.0, float init_m=0.0)
      : pt(init_pt), eta(init_eta), phi(init_phi), m(init_m) {}
};

#endif

and compiled

> root
root[0] .L photon_struct.h+

and then used in my code above

ROOT.gInterpreter.Declare('#include "photon_struct.h"')
df = df.Define('photons','ROOT::VecOps::Construct<photon>(photon_pt, photon_eta, photon_phi, photon_E)')

gives

TypeError: can not resolve method template call for 'Define'
error: use of undeclared identifier 'photon'

@pcanal little help :grinning_face_with_smiling_eyes: what’s the correct way to create and load a dictionary for the photon struct above and RVec<photon>?

and then used in my code above

Did you load the library?

Maybe use

ROOT.gSystem.Load("photon_struct_h.so"); # The ".so" part is optional but the underscore necessary

Hi Philippe,
yes after some work (my anaconda install of root refuses to compile macros it seems) I got to the same place as before. Now that I have compiled the header and loaded it, I can use it to create branches as before but still cannot snapshot them.

Photon_PY-2.pdf (49.8 KB)

Perhaps whilst we wait for @pcanal to get back on the shared library bit I can ask one of my others.

My idea for the classes was to try to make the programming logic simpler for me. To check that my logic is working correctly (in an event I want to loop over each jet and then each truth_jet to get the one with the smallest deltaR) I’d like to create a small data frame with my own events. I have the following test script in python:

### Helper functions and classes
class jet:
    def __init__(self,pt,eta,phi):
        self.pt = pt
        self.eta = eta
        self.phi = phi

    def set_match(self, match):
        self.matched = match
        
    def get_dr(self, seed):
        self.dr = dr(seed,self)

### (I know this isn't how dR works)
def dr(jet1,jet2):
    dphi = jet1.phi-jet2.phi
    deta = jet1.eta-jet2.eta
    return abs(dphi+deta)

### one event
reco_jets = [jet(1,1,1),jet(2,2,1)]
truth_jets = [jet(1,3,1),jet(2,1,1),jet(3,2,1)]
event1 = {'reco_jets':reco_jets,'truth_jets':truth_jets}

### analysis logic
#### for each jet
for j in event1['reco_jets']:
    #### find distances of this jet to all truth jets
    for k in event1['truth_jets']:
        k.get_dr(j)
    #### sort truth jets based on this value
    sorted_truth = sorted(event1['truth_jets'], key=lambda x: x.dr)

    #### if value is below a threshold return it
    if sorted_truth[0].dr < 2:
        j.set_match(sorted_truth[0])


### test logic worked
for j in event1['reco_jets']:
    print(j.matched.pt)

so I’d like to first know how to create a test data frame eg df = ROOT.RDataFrame([event1]) such that I can then work out how to create a vector of vectors needed to recreate the logic represented above.
(something along the lines of)

df0 = ROOT.RDataFrame(event1)
df = df.Define('drs',DeltaR(reco_eta,reco_phi,truth_eta,truth_phi))
df = df.Define('matched_truth','ArgSort(drs)')
df = df.Filter(drs<2)
df = df.Define('lead_matched','truth_pt[Take(matched_truth,1)]')
h = df.Histo1D('lead_matched')

…or whatever…

in any case my question is about building a simple data frame in order to test this logic and hopefully that gives me some insight into creating and sorting the members within.

Hi Philippe,
yes after some work (my anaconda install of root refuses to compile macros it seems) I got to the same place as before. Now that I have compiled the header and loaded it, I can use it to create branches as before but still cannot snapshot them.

[Photon_PY-2.pdf|attachment](https://root-forum.cern.ch/uploads/short-url/b94Rj1KDO7npR49b3PImGf3F6BM.pdf) (49.8 KB)

Maybe @eguiraud has a better answer.

It looks like you need to also generate the dictionary for RVec<photon> and/or vector<photon,ROOT::Detail::VecOps::RAdoptAllocator<photon> >

So adding to photon_struct.h:

#ifdef __ROOTCLING__
#pragma link C++ class vector<photon,ROOT::Detail::VecOps::RAdoptAllocator<photon> >+;
#pragma link C++ class ROOT::VecOps::RVec<photon>+;
#endif

and (of course) the corresponding headers:

#include "ROOT/RVec.hxx"
#include <vector>

should solve the problem.

Ah wonderful. Indeed now I don’t have any warnings when snapshotting the object however I do have issues when reading the outputted tree.

AttributeError: 'vector<photon,ROOT::Detail::VecOps::RAdoptAllocato' object has no attribute 'pt'

do I have to explicitly tell ROOT that this is an instance of my photon class?

image

Heres the full notebook as before.
Photon_PY.pdf (51.4 KB)

The intriguing part is that the class name is truncated. @etejedor Do you have any idea what’s going on?

Hi,

IIUC event.photons is a vector<photon>, and photon is what has the attribute pt. So I’d try e.g. for photon in event.photons: print(photon.pt).

1 Like

:man_facepalming:

ok yes, this worked. Obviously. Thanks @etejedor!

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.