Do not recompute the pseudorapidity every time it is read

jndrf · December 3, 2018, 2:40pm

Hi everyone,

recently I profiled my code to optimize it and found the by far biggest consumer of computation time to be TVector3::PseudoRapidity(), which is called from TLorentzVector::Eta(), see attached screenshot.

(read: 55 % of the whole computation time is spent inside TVector3::PseudoRapidity, 9% of the whole computation time directly in the method, everything else in methods called from there)
This is because the pseudorapidity is recomputed every time from some properties of the vector with an expensive logarithm function.

I propose to calculate eta (as well as the other coordinates) only when coordinates are changed, e.g. via TLorentzVector::SetPtEtaPhi() and to save them as member variables, which are then directly returned by the getter methods.
This would increase the performance of the TVector3::PseudoRapidity() method by several orders of magnitude.

Since eta is one of the coordinates used in particle physics, all analyses will have a significant performance gain by this change.

Cheers,

jndrf

sbinet · December 3, 2018, 3:34pm

This is the usual dilemma between optimizing memory and optimizing CPU.

You seem to opt for CPU.
Everybody may not flock that way.
Especially b/c CPU is cheap, memory not so much.
Adding another data member will probably have an inpact on both the memory footprint of a bunch of data structures but also (and probably a bigger one) on cache lines, decreasing performances at runtime.

In ATLAS, to tackle this issue, it was decided to have a bunch of dedicated structures:

PxPyPzE
PtEtaPhiM
EEtaPhiM
IPtCotThPhiM

each of which may make more sense than the others for some use case.
see: http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/Event/FourMom/FourMom/

IMHO, it would make more sense for you to write a dedicated type – not unlike the EEtaPhiM one or PtEtaPhiM from ATLAS – to address your particular use case than trying to have TLorentzVector be everything to everyone.

just my 2 euro cents.

jndrf · December 3, 2018, 6:52pm

Interesting, I never thought about that level. As a workaround, I now use an overload of TLorentzVector::Eta() and SetPtEtaPhiM() that roughly doubles the throughput.
If the argument about the cache lines still holds, it should be explicitly mentioned in the documentation that TLorentzVector::Eta() is slow and one should rather use a custom class overloading the TLorentzVector.

Probably. But it should be optimized for the most common use case – and I can’t imagine that cartesian coordinates are the most common case.

sbinet · December 3, 2018, 7:53pm

IIRC, cartesian coordinates are mostly used by generators. At least it used to be the case.

There are so many ways to skin a cat

Axel · December 4, 2018, 12:57pm

Please consider using ROOT::Math::PtEtaPhiEVector, see https://root.cern/doc/master/LorentzVectorPage.html

jndrf · December 10, 2018, 10:28am

Hi Axel,

I did so. It turned out that I had to tell cmake explicitly to link GenVector, it was not included in ${ROOT_LIBRARIES}, I use ROOT 6.14.

Cheers,

jndrf