TVector3 performance question

goettf · December 12, 2012, 1:50pm

Hi there,
We were wondering whether TVector3 algebra is faster than written-out calculations in plain Python, and were astonished to find that the first part of the attached script is ~6x faster than the second part. Can anybody tell us why? Is calling the ROOT.TVector3() constructor really that expensive?

Cheers
Luisa & Tobias
test_TVector3.py (1.08 KB)

wlav · December 12, 2012, 7:48pm

Hi,

so I thought it was the tracking of objects (to allow deletes in one language to propagate to the other), but that’s not it: it’s relatively expensive, yes, but it comes nowhere near explaining the 7x (which is what I see, not the 6x you report).

Same goes for the checking whether the returned object is a known object: relatively expensive in itself, but not the whole story.

Same goes for “ROOT.TVector3” which causes a lookup in ROOT every time.

Having no better ideas, and still only down to 5x, I ran oprofile, and it’s indeed a bundle of small(ish) contributions in everything from the malloc/free of the temporaries, to the dictionary overhead, to the TObject destructor.

Thus, cutting down the number of temporaries appears to be the most important:

    r = ROOT.TVector3()
    v = ROOT.TVector3()
    for vec in vectors:
        r.SetXYZ(vec[0], vec[1], vec[2])
        r.SetMag(1.)
        v.SetXYZ(vec[3], vec[4], vec[5])
        v.SetMag(1.)
        z = r.Cross(v)

There’s still a temporary in z, and the call overhead from the various function calls, but at least on my machine, this is 3x slower than the plain Python version, not 7x.
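For reference, the written-out plain-Python side of such a comparison might look roughly like the sketch below. The attached test_TVector3.py isn't reproduced in this thread, so the function name `unit_cross`, the sample data, and the six-component layout (x, y, z of r followed by x, y, z of v, as in the snippet above) are assumptions, not the original script.

```python
import math
import timeit


def unit_cross(vec):
    """Normalise r = vec[0:3] and v = vec[3:6], then return r x v as a tuple.

    Hypothetical stand-in for the "written-out" calculation; it raises
    ZeroDivisionError if either input vector has zero length.
    """
    rx, ry, rz, vx, vy, vz = vec

    # Same effect as SetMag(1.): scale each vector to unit length.
    rmag = math.sqrt(rx * rx + ry * ry + rz * rz)
    rx, ry, rz = rx / rmag, ry / rmag, rz / rmag
    vmag = math.sqrt(vx * vx + vy * vy + vz * vz)
    vx, vy, vz = vx / vmag, vy / vmag, vz / vmag

    # Cross product written out component by component.
    return (ry * vz - rz * vy,
            rz * vx - rx * vz,
            rx * vy - ry * vx)


if __name__ == "__main__":
    # Made-up input rows, only to have something to time.
    vectors = [(1.0 + i, 2.0, 3.0, 4.0, 5.0 + i, 6.0) for i in range(10000)]
    elapsed = timeit.timeit(lambda: [unit_cross(vec) for vec in vectors], number=10)
    print("plain Python: %.3f s for 10 passes" % elapsed)
```

Timing the TVector3 loop the same way (same `vectors`, same number of passes) gives the ratio being discussed here; the absolute numbers will of course depend on machine and ROOT version.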

What I should really do, and am planning for cppyy (I didn’t realize it was also this important for CPython), is to make full Python objects for some important small classes such as TVector3, TLorentzVector, etc. In PyPy, the object-by-value returns are expensive for the same reasons, but relatively even more so: the difference becomes 30x!

Cheers,
Wim