So I thought it was the tracking of objects (which allows deletes in one language to propagate to the other), but that’s not it: it’s relatively expensive, yes, but it comes nowhere near explaining the 7x (which is what I see, not 6x like you do).
Same goes for the check of whether the returned object is a known object: relatively expensive in itself, but not the whole story.
Same goes for “ROOT.TVector3”, which causes a lookup in the ROOT module every time.
Having no better ideas, and still only down to 5x, I ran oprofile, and it’s indeed a bundle of small(ish) contributions in everything from the malloc/free of the temporaries, to the dictionary overhead, to the TObject destructor.
Thus, cutting down the number of temporaries appears to matter most:
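```python
import ROOT

# Create the two vectors once, outside the loop, and refill them each iteration
r = ROOT.TVector3()
v = ROOT.TVector3()
for vec in vectors:
    r.SetXYZ(vec, vec, vec)
    v.SetXYZ(vec, vec, vec)
    z = r.Cross(v)  # Cross still returns a new TVector3 by value
```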
There’s still a temporary for z, plus the overhead of the various function calls, but at least on my machine this is now 3x slower rather than 7x.
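To look at the effect of reusing objects versus creating temporaries in isolation, here is a minimal sketch using a hypothetical pure-Python `Vec3` that mimics only the `SetXYZ` and `Cross` methods of TVector3. It is not PyROOT, so the absolute timings will not match the 7x/3x figures above; it only shows how to structure such a comparison with `timeit`.

```python
import timeit


class Vec3:
    """Hypothetical pure-Python stand-in mimicking TVector3's SetXYZ/Cross."""
    __slots__ = ("x", "y", "z")

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def SetXYZ(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def Cross(self, other):
        # Returns a new object, like a by-value return of TVector3::Cross.
        return Vec3(self.y * other.z - self.z * other.y,
                    self.z * other.x - self.x * other.z,
                    self.x * other.y - self.y * other.x)


def with_temporaries(pairs):
    out = None
    for a, b in pairs:
        # Two fresh objects per iteration, plus the result of Cross.
        out = Vec3(*a).Cross(Vec3(*b))
    return out


def with_reuse(pairs):
    r, v = Vec3(), Vec3()  # allocated once, outside the loop
    out = None
    for a, b in pairs:
        r.SetXYZ(*a)
        v.SetXYZ(*b)
        out = r.Cross(v)  # still one temporary per iteration
    return out


pairs = [((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))] * 5000

if __name__ == "__main__":
    for fn in (with_temporaries, with_reuse):
        t = timeit.timeit(lambda: fn(pairs), number=20)
        print(f"{fn.__name__}: {t:.3f} s")
```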
What I should really do, and am planning for cppyy (I didn’t realize it was also this important for CPython), is to make full Python objects for some important small classes such as TVector3, TLorentzVector, etc. In PyPy, returning objects by value is expensive for the same reasons, but relatively even more so: the difference becomes 30x!
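The post doesn’t spell out what such a “full Python object” would look like, so the following is only an assumption: a pure-Python immutable value type whose arithmetic runs entirely in Python, with no C++ allocation, dictionary dispatch, or TObject destructor behind each returned value. `PyVector3` is a hypothetical name, not part of ROOT or cppyy; the method names merely mirror TVector3’s, and a real version would also need conversions to and from the C++ type when passing it into ROOT.

```python
import math
from typing import NamedTuple


class PyVector3(NamedTuple):
    """Hypothetical pure-Python value type; NOT part of ROOT or cppyy."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

    def Cross(self, other: "PyVector3") -> "PyVector3":
        # Returns a new immutable value, analogous to a by-value C++ return,
        # but without any C++ object or proxy bookkeeping behind it.
        return PyVector3(self.y * other.z - self.z * other.y,
                         self.z * other.x - self.x * other.z,
                         self.x * other.y - self.y * other.x)

    def Dot(self, other: "PyVector3") -> float:
        return self.x * other.x + self.y * other.y + self.z * other.z

    def Mag(self) -> float:
        return math.sqrt(self.Dot(self))
```

Immutability keeps the semantics close to a C++ by-value return; the trade-off is that there is no in-place `SetXYZ`, so the object-reuse trick doesn’t apply to this design.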