RDataFrame TLorentzVector Basic VecOps?

Howdy folks, I’ve been struggling with these for a while and figured it might be easier, and useful to others, to ask here.

Imagine I have a file with events that contain two containers, leptons and jets. There can be x leptons and y jets, and each container holds the inputs for a TLorentzVector (Pt, Eta, Phi, M),

accessed as:

for event in file.tree:
    nleptons = 0
    njets = 0  # reset both counters for every event
    for lepton in lepton_collection:
        if lepton.pt > 10:
            nleptons += 1
    for jet in jet_collection:
        if jet.pt > 30:
            njets += 1

I can get the invariant mass of all leptons with the super useful invariantmass(lepton.Pt,lepton.Eta,lepton.Phi,lepton.M) command. This is the functionality I see in the tutorials that I’d like to understand.

How do I define the pT of the summed TLorentzVector of the two highest-pT leptons? E.g. in the crappy non-RDataFrame way, if they are already pre-sorted, it would look like:

l0 = TLorentzVector(0,0,0,0)
l0.SetPtEtaPhiM(lepton[0].Pt,lepton[0].Eta,lepton[0].Phi,lepton[0].M)
l1 = TLorentzVector(0,0,0,0)
l1.SetPtEtaPhiM(lepton[1].Pt,lepton[1].Eta,lepton[1].Phi,lepton[1].M)
return (l0+l1).Pt()
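
Spelled out without ROOT, the number I am after is just the length of the summed transverse momentum, which only needs pt and phi of each lepton. A minimal plain-Python sketch (the function name pair_pt and the values are made up for illustration):

```python
import math

def pair_pt(pt0, phi0, pt1, phi1):
    """Transverse momentum of the vector sum of two particles."""
    # px = pt*cos(phi), py = pt*sin(phi); eta and mass do not enter the sum's pt
    px = pt0 * math.cos(phi0) + pt1 * math.cos(phi1)
    py = pt0 * math.sin(phi0) + pt1 * math.sin(phi1)
    return math.hypot(px, py)

# Hypothetical example values (GeV, radians), not taken from any real file.
back_to_back = pair_pt(40.0, 0.0, 40.0, math.pi)  # transverse momenta cancel
collinear = pair_pt(40.0, 0.3, 25.0, 0.3)         # transverse momenta add up
```

What I cannot figure out is how to express the same thing per event on the lepton arrays inside RDataFrame.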

This question extends to moving between containers, e.g. working out the angular distance between the highest-pT lepton and the second-highest-pT jet. And what about calculating all the possible angles between a lepton and the jets in the event in order to select the smallest one? I guess this question also includes defining TLorentzVectors as objects in the tree and being able to sort them… but you are the experts.

I think that if I can grasp this concept I would never use ROOT without RDF, but until then I end up searching the example directory daily and then either writing it as a loop inside my definitions or giving up and doing it the old-fashioned way.

Hi Vince,
RDataFrame is just the scheduler for the operations you want done in the event loop. When you do operations on arrays, like with invariantmass, you are leveraging RVec/VecOps (RDF reads arrays and standard vectors as RVecs, but in principle you can use each without the other).

I’m reading your question as: I have arrays in my tree, what is the nicest syntax you have to do common operations on them? The current answer is: with RVecs and their helper functions, and when that fails, by writing your own helper functions. Maybe one day we’ll have a proper grammar that is generic and nice enough that users almost never have to write their own helpers, as is often the case when using numpy arrays.

For example, among the RVec helper functions there is an InvariantMasses function that might help with your use case (if nothing else, as an example: the implementation is here). If there is no helper function that solves your use case, typically you’d write your own, stick it in a header and use it at will (and if you feel you are solving a common use case, you can open a pull request to add your helper function to VecOps for the benefit of everyone else).
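
As a rough sketch of the logic such a helper would implement for your smallest lepton-jet angle question, here it is in plain Python rather than in C++ over RVecs. The function names and the values are made up for illustration; the only physics assumed is the usual definition DeltaR = sqrt(deta^2 + dphi^2) with dphi wrapped into [-pi, pi).

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def min_delta_r(lep_eta, lep_phi, jet_etas, jet_phis):
    """Smallest DeltaR between one lepton and any jet; None if there are no jets."""
    distances = [delta_r(lep_eta, lep_phi, eta, phi)
                 for eta, phi in zip(jet_etas, jet_phis)]
    return min(distances) if distances else None

# Hypothetical event: one lepton at (eta, phi) = (0, 0) and three jets.
jet_etas = [0.5, -1.0, 2.0]
jet_phis = [3.0, 0.1, -3.0]
closest = min_delta_r(0.0, 0.0, jet_etas, jet_phis)  # the second jet is nearest
```

The empty-collection case is handled explicitly because events without jets would otherwise make min() raise; a helper used in an event loop needs some convention for that.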

Hope this helps!
Pinging @swunsch and @StephanH (although I think they are traveling at the moment) because they are usually interested in these kinds of programming model discussions.

Cheers,
Enrico

Hmmm yes indeedie do. I shall add an RVec/VecOps tag to the post.

I like the InvariantMasses function (as in my original post) but am struggling to understand the syntax. Could someone point me to an example of, e.g., finding the Pt of the two highest-Pt jets in VecOps land?
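
To make the request concrete, this is the selection I have in mind, written in plain Python with made-up values (leading_indices is just an illustrative name). I assume the VecOps counterparts would be something along the lines of Argsort, Reverse and Take, but that is exactly the syntax I am unsure about.

```python
def leading_indices(pts, n=2):
    """Indices of the n largest entries of pts, highest first."""
    order = sorted(range(len(pts)), key=lambda i: pts[i], reverse=True)
    return order[:n]

# Hypothetical unsorted jets for one event (pt in GeV).
jet_pt = [35.2, 80.1, 12.7, 55.0]
jet_eta = [0.4, -1.2, 2.1, 0.9]

idx = leading_indices(jet_pt)
# Reuse the same indices so pt and eta stay matched to the same jet.
leading_pt = [jet_pt[i] for i in idx]
leading_eta = [jet_eta[i] for i in idx]
```

Sorting indices rather than the pt values themselves is what lets me pick the matching eta, phi and mass of the same two jets.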

One final RDF question: what about defining TLorentzVectors that can be written out to a slimmer file? Currently I do:

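# Define takes the expression as a string and returns a new node, so reassign each time
rdf = rdf.Define("pt0", "jet[0].Pt()")
rdf = rdf.Define("eta0", "jet[0].Eta()")
rdf = rdf.Define("phi0", "jet[0].Phi()")
rdf = rdf.Define("m0", "jet[0].M()")
rdf = rdf.Define("pt1", "jet[1].Pt()")
rdf = rdf.Define("eta1", "jet[1].Eta()")
rdf = rdf.Define("phi1", "jet[1].Phi()")
rdf = rdf.Define("m1", "jet[1].M()")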

then add all these names to a vector and snapshot each and every one. It makes for very ugly code. Can I just define a TLV and snapshot that?

Yes, you can Define a column of Lorentz vector type! And you can Snapshot it!

But please consider using a ROOT::Math::LorentzVector instead of a TLorentzVector: that’s much faster. And if you select the right coordinate system (e.g. pt, eta, phi, m, as in ROOT::Math::PtEtaPhiMVector) it also stores with better compression.

Axel.

Do you have an example?

Tancredi