Python with ROOT.Math libraries is very slow on Mac

ROOT Version: 6.24/04
Platform: OSX 20.26.7, Catalina
Compiler: precompiled installation

I noticed that my Python script with ROOT libraries runs much slower on Mac (10.16.7, Catalina) than on RedHat7.
I traced it to the use of the ROOT.Math library and wrote a simple script to test it.

#!/usr/local/bin/python3
import math
import ROOT
from ROOT import gBenchmark

n = 0
gBenchmark.Start('lund')
for i in range(50000):
    n = n + 1
    # Uncomment at most one of the next two lines per timing run:
    # sqrt3 = math.sqrt(2.)       # timing run 2
    # sqrt2 = ROOT.Math.sqrt(2.)  # timing run 3
    if n % 10000 == 0:
        print('n # ', n)
gBenchmark.Show('lund')

There are two statements in the loop, each of which just computes sqrt(2.).
I made three timing measurements using gBenchmark:

  1. With neither statement: 0.01 seconds
  2. Only sqrt3=math.sqrt(2.): very fast, ~0.01 seconds
  3. Only sqrt2=ROOT.Math.sqrt(2.): very slow, ~12 seconds, about 1200 times slower than #2.

The result doesn't change much whether I install ROOT via Homebrew, MacPorts, or directly from the root.cern website.
I have an old Mac laptop with ROOT 6.24/00 and python 2.7.16, where the same script runs very fast.
I contacted some IT experts. They could not tell me much, except that there is definitely some problem.

Maybe @etejedor can help.

Hello,

Is this reproducible also in C++?

Random guess, but as I expect math.sqrt and ROOT.Math.sqrt to both be optimized to death, I think what you might be measuring is the overhead of calling C++ functions via PyROOT compared to calling standard Python C extension modules (I don’t know why that might be slower on Mac than on RedHat).

You can try importing sqrt from ROOT.Math and calling it directly from the loop:

import ROOT
from ROOT.Math import sqrt as root_sqrt

for i in range(50000):
    r = root_sqrt(2.)  # no ROOT.Math lookup inside the loop


I did not test it myself, but a friend did and said that the C++ version runs fast.

It is not the case. I have another laptop with python2.7 and ROOT v6.14.
There it runs very fast:
math.sqrt(2.)        0.02 sec
ROOT.Math.sqrt(2.)   0.2 sec

As I wrote, with python3.9.6 and ROOT v6.24:
math.sqrt(2.)        0.02 sec
ROOT.Math.sqrt(2.)   12.8 sec

I tried it but got this error message:
sqrt3=root.sqrt(2.)
NameError: name 'root' is not defined

In my case, adding the line
from ROOT import TFile
does nothing; the CPU time is the same.

Yes, I understand. My point was that it's probably not ROOT's implementation of sqrt that is slower, but the machinery required to call the C++ sqrt function from Python. Sorry if that's obvious.

It should be root_sqrt, the same name you imported the function as.

Sorry for the typo.
Yes, this helps: 0.06 seconds instead of 13 seconds.
This is an interesting suggestion and observation.
But sqrt was just an example. In the real program I also use other ROOT.Math classes, such as
ROOT.Math.PxPyPzMVector
Do I need to import all of them one by one?

So it seems that the lookups (ROOT.Math and ROOT.Math.sqrt) were what consumed the time. It might then be good to import just what you need before using it in a hot loop.
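The effect of hoisting a lookup out of a hot loop can be seen without ROOT in a minimal pure-Python sketch. `LookupCountingFacade` is a hypothetical stand-in for a facade that resolves every attribute access from scratch; it only counts resolutions, so it says nothing about how expensive each one is in PyROOT (and it models a single lookup level, whereas ROOT.Math.sqrt involves two).

```python
import math


class LookupCountingFacade:
    """Hypothetical facade that resolves every attribute access anew (no caching)."""

    def __init__(self, backend):
        self._backend = backend
        self.resolutions = 0

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for names living in the backend
        self.resolutions += 1
        return getattr(self._backend, name)


def loop_with_repeated_lookup(facade, n):
    total = 0.0
    for _ in range(n):
        total += facade.sqrt(2.0)  # one facade resolution per iteration
    return total


def loop_with_hoisted_lookup(facade, n):
    sqrt = facade.sqrt  # resolve once, outside the hot loop
    total = 0.0
    for _ in range(n):
        total += sqrt(2.0)
    return total


facade = LookupCountingFacade(math)
loop_with_repeated_lookup(facade, 50000)
print(facade.resolutions)  # 50000

facade = LookupCountingFacade(math)
loop_with_hoisted_lookup(facade, 50000)
print(facade.resolutions)  # 1
```

Both loops compute the same sum; only the number of name resolutions differs, which is the cost the root_sqrt import removes.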

@valkuba Yes: in the old ROOT.py, lookup results on the facade were cached, whereas in the “new” PyROOT, successful top-level lookups no longer are, hence the difference you observe between those two ROOT versions. That is, it's a design choice (arguably a bug) in the new approach, not a Mac thing (or even a Python thing).
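A minimal sketch of the caching difference described above, assuming a facade built on Python's `__getattr__` hook. This is an illustration only, not the actual code of ROOT.py or ROOT/_facade.py; `resolve` is a hypothetical stand-in for the expensive work of finding a C++ entity. The caching variant stores the result on the instance with `setattr`, so later accesses are satisfied by normal attribute lookup and never reach `__getattr__` again.

```python
import math


class Facade:
    """Hypothetical facade; resolve() stands in for an expensive C++ name lookup."""

    def __init__(self, backend, cache):
        self._backend = backend
        self._cache = cache
        self.resolve_calls = 0

    def resolve(self, name):
        self.resolve_calls += 1
        return getattr(self._backend, name)

    def __getattr__(self, name):
        # Only reached when normal lookup (instance dict, then class) fails
        value = self.resolve(name)
        if self._cache:
            # Store on the instance: the next access finds it directly,
            # so __getattr__ and resolve() are skipped from then on
            setattr(self, name, value)
        return value


uncached = Facade(math, cache=False)
cached = Facade(math, cache=True)
for _ in range(1000):
    uncached.sqrt(2.0)
    cached.sqrt(2.0)

print(uncached.resolve_calls)  # 1000
print(cached.resolve_calls)    # 1
```

The usual trade-off of this kind of caching is that a stored entry is not refreshed if the underlying name is later redefined.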

To get the old speed back, however, you only ever need to do 'from ROOT import Math': since 'Math' is bound by cppyy, which does cache successful lookups, Math.sqrt will be fast (even though, yes, a bare sqrt will be faster still, but the same is true for a bare sqrt imported from math versus math.sqrt).

(As an aside: on my Linux box, using **0.5 outperforms all of these, and cppyy.gbl.std.sqrt has an additional slowdown of 25% because it is a templated function, which results in an extra internal lookup.)
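A sketch for reproducing this kind of comparison with the standard-library `timeit` module instead of gBenchmark. The pure-Python variants always run; the two ROOT variants are added only if `ROOT` can be imported, using the same spellings discussed in this thread. Absolute timings depend on the machine, Python and ROOT versions, so none are quoted here. The operand is a variable `x` rather than a literal so that `x ** 0.5` is not constant-folded by the compiler.

```python
import math
import timeit

N = 50_000
x = 2.0
sqrt = math.sqrt  # pre-bound: no module attribute lookup inside the timed statement

statements = {
    "math.sqrt(x)": "math.sqrt(x)",
    "sqrt(x), pre-bound": "sqrt(x)",
    "x ** 0.5": "x ** 0.5",
}

try:
    # Optional: only when ROOT is installed
    import ROOT
    from ROOT.Math import sqrt as root_sqrt
    statements["ROOT.Math.sqrt(x)"] = "ROOT.Math.sqrt(x)"
    statements["root_sqrt(x), pre-bound"] = "root_sqrt(x)"
except ImportError:
    pass

for label, stmt in statements.items():
    # globals=globals() lets the timed statement see x, sqrt, math, ROOT, root_sqrt
    seconds = timeit.timeit(stmt, number=N, globals=globals())
    print(f"{label:28s} {seconds:.4f} s for {N} calls")
```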

The remaining performance differences between Math.sqrt and math.sqrt you’re left with are:

  1. wrapper generation for cppyy on first call
  2. Math.sqrt being an overloaded function whereas math.sqrt only works on double
  3. For Python 3: math.sqrt benefits from an optimized call API, which is not available (in an alternate form) to cppyy until Python 3.8 and is only actually used as of cppyy 2.1.0.
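Point 2 can be illustrated by analogy with `functools.singledispatch` from the standard library: a call through an overloaded entry point must first select an implementation from the argument type, which is extra per-call work that a single-signature function such as math.sqrt does not do. This is only an analogy; cppyy's overload resolution is a different mechanism and is not shown here, and the int overload below is hypothetical.

```python
import math
import timeit
from functools import singledispatch


@singledispatch
def overloaded_sqrt(value):
    # Fallback when no registered overload matches the argument type
    raise TypeError(f"no overload for {type(value).__name__}")


@overloaded_sqrt.register
def _(value: float):
    return math.sqrt(value)


@overloaded_sqrt.register
def _(value: int):
    # Hypothetical integer overload: convert, then reuse the float path
    return math.sqrt(float(value))


print(overloaded_sqrt(2.0))  # 1.4142135623730951
print(overloaded_sqrt(4))    # 2.0

# Same result, but every dispatched call first looks up the implementation by type
t_direct = timeit.timeit("math.sqrt(2.0)", number=100_000, globals=globals())
t_dispatch = timeit.timeit("overloaded_sqrt(2.0)", number=100_000, globals=globals())
print(f"direct: {t_direct:.4f} s, via dispatch: {t_dispatch:.4f} s")
```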

And yes, sqrt is my favorite function when trying to understand call overhead :)


Should we fix this in ROOT.py?

You mean to make old PyROOT equally slow by removing the caching? Personally, I would leave the old ROOT.py well alone and instead opt to add caching to the new ROOT/_facade.py, as a way to equalize old and new behavior. Two orders of magnitude for a common use case is nothing to sneeze at, and fixing ROOT/_facade.py is certainly easier than asking folks everywhere to change a decade and a half's worth of legacy code. Besides, recommendations favoring from X import Y over import X; X.Y have varied over time. Personally, I used to like the former but now prefer the latter because of Jupyter notebooks.

Thank you very much for your suggestion.
It really works. But I am still wondering why I don't need such a trick
on my old laptop, where I have python2.7 and ROOT v6.14.

Thanks again for your help,
Valery

It is a very impressive explanation. I cannot say that I understood all the details, since my background is in physics, not computer science.

Unfortunately, this does not actually work for me. What really helps is

from ROOT.Math import sqrt as root_sqrt

and then you have to edit all calls to ROOT.Math functions in your script.
As you understand, sqrt is just an example; in reality the program contains many other calls to ROOT.Math.

I meant ROOT/_facade.py instead of ROOT.py in my comment above, of course… :)

Thanks for the explanation!