Creating and Sorting Vectors of Collections

Howdy, quick question here (I think) I’m getting confused by my VecOps

I’m reading in a tree and creating a collection from it:

df1 = rdf.Define("l0", "ROOT::Math::PtEtaPhiMVector(lepton_pt.at(0,-999),lepton_eta.at(0,0),lepton_phi.at(0,0),0)")

as suggested in this older post

  1. First I’d like to be able to define a collection and push back multiple items into it, eg leptons containing l0, l1, l2 etc depending on the length of the RVec I’m reading from.

  2. Now from these RVec objects I’d like to make a selection without filtering the events. Eg, say in an event I have, l0(10,0,1,0) and l1(8,1,1,0) I’d like to plot the pt of the object with the highest pt in the event that also has a rapidity of l.Eta()>0 so 8 in this case. Can I sort the vector collection?

  3. Finally I’d like to extend the event data model, so rather than l0, and l1 being ROOT::Math::LorentzVectors they are lepton objects with members like l0.vector.Pt() = 0, l0.passesEtaSelection=False etc such that I can snapshot them.

I’m fairly sure each of these are possible but I’m trying to create a nice example and am struggling. I’ve written out a quick pseudo code in pyroot below.

Thanks again!
~/Vince

import ROOT

f = ROOT.TFile.Open('inputfile.root')

for event in f.tree:
    pt_hist = ROOT.TH1F("","",10,0,10)
    highest_pt = 0
    for lepton in event.leptons:
        if lepton.Eta()>0:
            if lepton.Pt()>highest_pt:
                highest_pt = lepton.Pt()
    pt_hist.Fill(highest_pt)

c = ROOT.TCanvas()
pt_hist.Draw()
c.SaveAs("pt_eta_passing.png")

and

import ROOT
rdf0 = ROOT.RDataFrame('tree', 'inputfile.root')
rdf1 = rdf0.Define('my_leptons','fill_leptons(leptons)')

#assuming 'fill_leptons' returns LorentzVectors
rdf2 = rdf1.Define('my_etaleptons','my_leptons.Eta() > 0')
#or with a custom flag
rdf2 = rdf1.Define('my_etaleptons','my_leptons.passesEtaSelection'

pt_hist = rdf2.Histo1D(("pt", "Lepton Collections", 16, 0, 4), "my_etaleptons.at(0).Pt()")
c = ROOT.TCanvas()
pt_hist.Draw()
c.SaveAs("pt_eta_passing.png")

Please read tips for efficient and successful posting and posting code

ROOT Version: 6.22/06
Platform: linuxx8664gcc
Compiler: anaconda


May be @eguiraud can help.

1 Like

Hi,

I can advice to write C++ function which will do all the complicated calculations and return you value what you want.

import ROOT
rdf0 = ROOT.RDataFrame('tree', 'inputfile.root')
rdf1 = rdf0.Define('my_leptons','fill_leptons(leptons)')

ROOT.gInterpreter.Declare('''
#include "Math/Vector3D.h"
#include "Math/Vector4D.h"
using namespace std;
using namespace ROOT::Math;
using namespace ROOT::VecOps;

double getMaxPt(const RVec <XYZTVector>& leptons){
    // find max Pt element with C++ vector method
    XYZTVector maxPtLepton = *max_element(leptons.begin(),
                                 leptons.end(),
                                 [](const XYZTVector& lep1, const XYZTVector& lep2) {return lep1.Pt() < lep2.Pt();} 
    return maxPtLepton.Pt();
}

''')

rdf2 = rdf1.Define('my_eta_selection', 'my_leptons.Eta() > 0')\
                .Define('my_new_leptons', my_leptons[my_eta_selection])
                .Define("max_pt_values", "getMaxPt(my_new_leptons)")

pt_hist = rdf2.Histo1D(("pt", "Lepton Collections", 16, 0, 4), "max_pt_values")
c = ROOT.TCanvas()
pt_hist.Draw()
c.SaveAs("pt_eta_passing.png")

Here are links for nice tutorial on how to work with C++ vectors with RDataFrame.
here1
here2

unfortunately I cant answer user-defined class part of the question

cheers

Hi @FoxWise, Thanks for this! I guess my question really wasn’t very clear…

Indeed I have some similar functions in there, borrowed exactly from the tutorials that you link (but rather I use them to find the index at which I extract the elements from the collection)

but…

What I really need is the function fill_leptons which you use here!

Hi Vince,
let’s tackle just point 1 first:

Is Construct what you are looking for?

Ok! Sorry for the late reply on this

yes ok Construct does seem to do what I want. Perfect.

df1 = df.Define('leptonTLVs','ROOT::VecOps::Construct<ROOT::Math::PtEtaPhiMVector>(lepton_pt, lepton_eta, lepton_phi, lepton_m)')
c = ROOT.TCanvas()
hpt = df1.Filter("nmuon==2").Define("pt", "leptonTLVs[0].Pt()").Histo1D("pt")
hpt.Draw()
c.Draw()

ok this gives me a list of the lepton 4 vectors. Perfect. Point one solved.

point 2 I think I can solve fairly easily in a number of ways, as demonstrated by @FoxWise. (Thanks again)

so then this takes me on to point 3. Which should be fairly trivial but unfortunately I’m not getting it to work.

ROOT.gInterpreter.Declare('''
class lepton
{
public:
  float pt;
  float eta;
  float phi;
  float m;

  lepton (float,float,float,float);
};


lepton::lepton(float init_pt, float init_eta, float init_phi, float init_m){
  pt = init_pt;
  eta = init_eta;
  phi = init_phi;
  m = init_m;
}
''')

gives me a simple lepton class (I’d like it with other decorators later but for now). Which works if I run:

l = ROOT.lepton(12.,1.,3.,0.2)
print(l.pt)

but not if I try to use it in conjunction with Construct:

df2 = df.Define('leptons','ROOT::VecOps::Construct<lepton>(muon_pt, muon_eta, muon_phi, muon_m)')
c = ROOT.TCanvas()
hpt = df2.Filter("nlepton>0").Define("pt", "leptons[0].pt").Histo1D("pt")
hpt.Draw()
c.Draw()

it fails when trying to run the graph. error: no matching constructor for initialization of 'lepton'

Probably a simple C++ error. Any *pointers?

lol xD

The compilation error doesn’t say what constructors were considered and why they were discarded?

This works (in C++) so I’m not sure what’s wrong in your case:

#include <ROOT/RVec.hxx>
using namespace ROOT::VecOps;

struct lepton {
  float pt;
  float eta;
  float phi;
  float m;

  lepton(float init_pt, float init_eta, float init_phi, float init_m)
      : pt(init_pt), eta(init_eta), phi(init_phi), m(init_m) {}
};

int main() {
  ROOT::RVec<float> muon_pt = {1.f, 2.f, 3.f};
  ROOT::RVec<float> muon_eta = {1.f, 2.f, 3.f};
  ROOT::RVec<float> muon_phi = {1.f, 2.f, 3.f};
  ROOT::RVec<float> muon_m = {1.f, 2.f, 3.f};
  auto leptons = Construct<lepton>(muon_pt, muon_eta, muon_phi, muon_m);

  return 0;
}

trying now with your definition of a lepton struct I get the same error:

In module 'ROOTVecOps':
/Users/vincentcroft/anaconda3/envs/root-env/include/ROOT/RVec.hxx:315:44: error: no matching constructor for initialization of 'lepton'
   RVec(pointer p, size_type n) : fData(n, T(), ROOT::Detail::VecOps::RAdoptAll...
                                           ^
/Users/vincentcroft/anaconda3/envs/root-env/include/ROOT/RDF/RColumnValue.hxx:205:18: note: in instantiation of member function 'ROOT::VecOps::RVec<lepton>::RVec' requested here
               T rvec(readerArrayAddr, readerArraySize);
                 ^
/Users/vincentcroft/anaconda3/envs/root-env/include/ROOT/RDF/RCustomColumn.hxx:73:67: note: in instantiation of function template specialization
      'ROOT::Internal::RDF::RColumnValue<ROOT::VecOps::RVec<lepton>
      >::Get<ROOT::VecOps::RVec<lepton>, 0>' requested here
      fLastResults[slot] = fExpression(std::get<S>(fValues[slot]).Get(entry)...);
                                                                  ^
/Users/vincentcroft/anaconda3/envs/root-env/include/ROOT/RDF/RCustomColumn.hxx:126:10: note: in instantiation of function template specialization
      'ROOT::Detail::RDF::RCustomColumn<__rdf::(lambda at input_line_122:2:16),
      ROOT::Detail::RDF::CustomColExtraArgs::None>::UpdateHelper<0>' requested here
         UpdateHelper(slot, entry, TypeInd_t(), ExtraArgsTag{});
         ^
/Users/vincentcroft/anaconda3/envs/root-env/include/ROOT/RDF/RCustomColumn.hxx:98:4: note: in instantiation of member function 'ROOT::Detail::RDF::RCustomColumn<__rdf::(lambda at
      input_line_122:2:16), ROOT::Detail::RDF::CustomColExtraArgs::None>::Update'
      requested here
   RCustomColumn(std::string_view name, std::string_view type, F expression, co...
   ^
/Users/vincentcroft/anaconda3/envs/root-env/include/ROOT/RDF/InterfaceUtils.hxx:424:11: note: in instantiation of member function 'ROOT::Detail::RDF::RCustomColumn<__rdf::(lambda at
      input_line_122:2:16),
      ROOT::Detail::RDF::CustomColExtraArgs::None>::RCustomColumn' requested here
      new NewCol_t(name, dummyType, std::forward<F>(f), cols, lm->GetNSlots(), ...
          ^
input_line_124:4:22: note: in instantiation of function template specialization
      'ROOT::Internal::RDF::JitDefineHelper<__rdf::(lambda at input_line_122:2:16) &>'
      requested here
ROOT::Internal::RDF::JitDefineHelper(__rdf::lambda4, {"leptons"}, "pt", reinter...
                     ^
input_line_119:2:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument,
      but 0 were provided
struct lepton {
       ^
input_line_119:2:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument,
      but 0 were provided
input_line_119:8:3: note: candidate constructor not viable: requires 4 arguments, but 0 were provided
  lepton(float init_pt, float init_eta, float init_phi, float init_m)

Interestingly trying to parse your example into pyRoot also fails:

muon_pt = ROOT.RVec('double')((1.,2.,3.))
muon_eta = ROOT.RVec('double')((1.,2.,3.))
muon_phi = ROOT.RVec('double')((1.,2.,3.))
muon_m = ROOT.RVec('double')((1.,2.,3.))
my_leptons = ROOT.VecOps.Construct(ROOT.lepton)((muon_pt, muon_eta, muon_phi, muon_m))

with a:

TypeError: Template method resolution failed:
  Failed to instantiate "Construct(lepton)"

Ok this helps: as per the error message, RVec<lepton> needs a default constructor lepton().

The “Template method resolution failed” might or might not be a consequence of the problem above.

Beautiful! Very almost there now.

so I now have the following and it works

df = ROOT.RDataFrame(treename,filename)
ROOT.gInterpreter.Declare('''
struct lepton {
  float pt;
  float eta;
  float phi;
  float m;

  lepton(float init_pt=0.0, float init_eta=0.0, float init_phi=0.0, float init_m=0.0)
      : pt(init_pt), eta(init_eta), phi(init_phi), m(init_m) {}
};
''')
df2 = df.Define('leptons','ROOT::VecOps::Construct<lepton>(muon_pt, muon_eta, muon_phi, muon_m)')
c = ROOT.TCanvas()
hpt = df2.Filter("nmuon>0").Define("pt", "leptons.at(0).pt").Histo1D("pt")
hpt.Draw()
c.Draw()

ok. nice.

So last question(s):

  1. I know I can sort the collections using VecOps::Sort but how do I sort on elements? eg:
df3 = df2.Define('sorted_leptons','ROOT::VecOps::Reverse(ROOT::VecOps::Sort(leptons.pt))')

or

df3 = df2.Define('sorted_leptons','ROOT::VecOps::Reverse(ROOT::VecOps::Sort(leptons, [](lepton l){return l.pt}))')
  1. Above I use the .at(0) operator (as per your earlier advice) and have extended this now with a .at(0,lepton()) so the default constructor is useful. But in this case, I do a filter saying I need 2 leptons, then say that I also need these leptons to have a eta<2.4, but am now not sure that there are still two leptons? Can I define a filter that keeps checking? eg ROOT::VecOps::Take

  2. finally, is there any memory or performance issues associated with defining like 4 copies of filtered collections like this (leptons, filtered_leptons, sorted_leptons, leading_leptons)? Or is it better to write a single function that returns the 2 highest pt leptons that pass all selections?

  1. you can just try it out in a python or ROOT interpreter (also outside of a Define, simply playing with RVec<lepton> objects). I think the second alternative is what you want.

  2. sounds like you want to filter on eta before you filter on the number of leptons?

  3. a single function is probably going to be more performant, but make sure to not optimize prematurely

Okey dokey, I think I’ve almost got what I need but I ran into a few snags.

I made a notebook to demonstrate what I’m trying to do but my swan-fu has failed me today so here’s a pdf instead: Photon_PY.pdf (47.7 KB)

So what’s there. Ok making the structure is really nice and easy to do with the patent pending Guiraud method, but when it comes to snapshotting and using the files, our lovely objects are missing. I re-tried it using one of the public datasets to demonstrate and to check that it isn’t an issue with my data.

As for my other questions, I think I should be able to fix these myself (and probably should do that myself) but it will likely be inelegant solutions, which is my problem not yours, but will add them to my example as and when I manage.

Hi,
as per the error message, you need to generate dictionaries for your custom objects otherwise ROOT does not know how to write them, see e.g. here.

Cheers,
Enrico

Hi, thanks for looking at this again. I’m a little confused about this and maybe I’m vastly over complicating things.

The data used here has a particular Event Data Model (EDM) that includes x photons, y leptons and z jets. I’m not trying to read in an EDM from a dataset here, I’m trying to write/extend one. However I think there are two facets/approaches that I’m getting mixed up:

Data set to root trees.

If my simulation spits out 3 events each with an irregular length of objects:

event1: first(a=1,b=1,c=1), second(a=2,b=1,c=2)
event2: first(a=3,b=1,c=1)
event3: first(a=1,b=1,c=2), second(a=2,b=1,c=1), third(a=1,b=2,c=1)

how do I write that to a dataframe such that I I can read it like:

df =ROOT.RDataFrame("tree","my_file.root")
h = df.Filter('c==1').Histo1D('a.at(0)')

that gives a hist of 1,3,2.

furthermore how can I sort these events such that a.at(0) gives the object with the highest b value? as in event3: RVec a({1,2,1}), b({1,1,2}), c({2,1,1}) if I sort RVec b then RVec a remains unchanged no?

the struct from before allows us to associate the branches in an event to their respective elements but does not seem practical in terms of reading and writing the files.

Extending existing datasets

for example: in the tutorial here they use the index of the event to define a mask for goodphotons whereas I’d simply like to make a branch is_good_photon with the same length as photon_pt etc that I can use to create some more complex filters rather than combine masks.

The reason for this is sorting, eg if I want the invariant mass of each good_photon with the lepton that is closest to it in deltaR and then filter my events that the highest of these masses is close to a certain value. I have to create a vector of leptons with their deltaR to the first good_photon. Sort this list. and set the lepton-photon invariant mass. I’d then sort my good_photons by this invariant mass and reverse it to get the highest one. So Id have photons, good_photons, these classes could be fed a list of lepton objects to return the one with the smallest deltaR to give lepton_merged_good_photons and then sorted to give sorted_lepton_merged_good_photons This is a very pythonic way of thinking and I’m not sure it translates and possibly explains my question on efficiency.

With masks it would be define good photons, define deltaR, define closest, and then define Max(InvariantMass(photons[goodphotons] & leptons[closestdeltaR])) which is fine it’s just very difficult to parse mentally and to debug.

Hi Vince,
sorry I’m lost, does this still have to do with the I/O error of the previous post?
There are too many things going on, please help me help you by asking one specific question at a time :grinning_face_with_smiling_eyes:

In RDataFrame h = df.Filter('c==1').Histo1D('a.at(0)') is spelled h = df.Filter('c==1').Define("a0", "a.at(0)").Histo1D("a0").

Correct. I think you want: df.Define("a_sorted", "a[ArgSort(b)]") ` or similar. Docs are here.

You can always write a helper function or helper class that encapsulates this logic, and you can split the logic in multiple lines to make it more readable, using gdb or printouts to debug what’s going on.
You can have classes that do what you mention above.

Okey dokey,
So the I/O error coupled with your explanation of compiling the libraries suggested that writing a custom class/struct was more complex than I had anticipated.

I spent a good (embarrassingly) long while trying to use ACLiC or rootcling or ROOT.gInterpreter.GenerateDictionary() to work with this simple example but nothing I tried seems to allow me to snapshot these custom structures. The code I tried can be found below.

Given this complexity I’m tempted to abandon the idea of using this form of object based EDM and instead try to understand the construction of the trees within the RDF programming model. If so perhaps I open a new thread on how to create trees with the appropriate structure.

All the best and thanks again!

//photon_struct.h
#ifndef PHOTON_H
#define PHOTON_H
struct photon {
  float pt;
  float eta;
  float phi;
  float m;

  photon(float init_pt=-999.0, float init_eta=0.0, float init_phi=0.0, float init_m=0.0)
      : pt(init_pt), eta(init_eta), phi(init_phi), m(init_m) {}
};

#endif

and compiled

> root
root[0] .L photon_struct.h+

and then used in my code above

ROOT.gInterpreter.Declare('#include "photon_struct.h"')
df = df.Define('photons','ROOT::VecOps::Construct<photon>(photon_pt, photon_eta, photon_phi, photon_E)')

gives

TypeError: can not resolve method template call for 'Define'
error: use of undeclared identifier 'photon'

@pcanal little help :grinning_face_with_smiling_eyes: what’s the correct way to create and load a dictionary for the photon struct above and RVec<photon>?

and then used in my code above

Did you load the library?

Maybe use

ROOT.gSystem.Load("photon_struct_h.so"); # The ".so" part is optional but the underscore necessary

Hi Philippe,
yes after some work (my anaconda install of root refuses to compile macros it seems) I got to the same place as before. Now that I have compiled the header and loaded it, I can use it to create branches as before but still cannot snapshot them.

Photon_PY-2.pdf (49.8 KB)