Saving ROOT.Numba.Declare callables in Python?



ROOT Version: 6.22.06
Platform: Debian 10
Compiler: Installed from Anaconda


To improve the performance of ROOT.RDataFrame in Python, I have been using a lot of JIT compilation of Python callables with numba. A simple example would be:

import numpy
import ROOT

@ROOT.Numba.Declare(['RVec<double>', 'RVec<double>'], 'RVec<double>')
def get_rapidity(pz, E):
    return 0.5 * numpy.log((E + pz) / (E - pz))

df = df.Define('rapid', 'Numba::get_rapidity(pz, ene)')

The only problem is that there are many of these Numba callables, so every time I run the script, a significant portion of the time is spent, I guess, “compiling” them. Is there a way to “save” the compiled callables so that they do not have to be recompiled the next time I run my script?

Thanks.

I guess @eguiraud can help you.

Hi @Fanurs ,
not exactly what you are asking for, but the closest thing would be to write the helper functions in a C++ file that you import with gSystem.CompileMacro("functions.cpp", "O") – this will skip recompilation when unneeded, and in a lot of cases the C++ code will look pretty much the same as the Python code thanks to RVec. This is similar to compiling Cython extensions or C extensions ahead of time, which is often done to speed up performance-sensitive paths of Python applications.
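As a rough sketch of what that could look like for the get_rapidity example above: the snippet below writes a C++ counterpart to functions.cpp and compiles it through ACLiC. The helper names write_helpers and compile_helpers are made up for illustration, the C++ body assumes that RVec's element-wise log and arithmetic operators behave like their numpy equivalents, and none of it has been tested against 6.22.06.

```python
from pathlib import Path

# C++ counterpart of the Numba get_rapidity example (assumption: RVec's
# element-wise log and operators mirror the numpy expression).
CPP_SOURCE = r'''
#include "ROOT/RVec.hxx"
#include <cmath>

using ROOT::VecOps::RVec;

// Element-wise rapidity, mirroring the Python/numpy version.
RVec<double> get_rapidity(const RVec<double> &pz, const RVec<double> &E) {
    return 0.5 * log((E + pz) / (E - pz));
}
'''


def write_helpers(path="functions.cpp"):
    """Write the C++ helper source to disk and return its path."""
    path = Path(path)
    path.write_text(CPP_SOURCE.lstrip())
    return path


def compile_helpers(path):
    """Compile and load the helpers through ACLiC; needs a working ROOT install."""
    import ROOT  # imported lazily so write_helpers works without ROOT

    # "k" keeps the shared library on disk, "O" turns on optimisation.
    return ROOT.gSystem.CompileMacro(str(path), "kO")
```

If compile_helpers(write_helpers()) returns 1, the function should be callable from a Define string without the Numba:: prefix, e.g. df.Define('rapid', 'get_rapidity(pz, ene)').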

Cheers,
Enrico

Hi @eguiraud,

Thank you for your kind reply. Unfortunately, I do not have much experience dealing with libraries, so I may need some help to proceed further. I have included what I have made so far.


First, I create a file named “functions.cpp”:

#include "ROOT/RVec.hxx"
auto myfunc(ROOT::VecOps::RVec<double> &x) {
    return x * x;
}

To compile it, I enter a python session and do the following:

Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> ROOT.gSystem.CompileMacro('functions.cpp', 'O')
Info in <TUnixSystem::ACLiC>: creating shared library /home/fanurs/learn/functions_cpp.so
1
>>> 

I believe everything is fine so far because I don’t see any errors. Also, two files have been created in the current directory, namely

  • “functions_cpp_ACLiC_dict_rdict.pcm” and
  • “functions_cpp.d”.

This is where I got stuck. How can I invoke myfunc() from another Python script? I tried the following in a new Python file, “test.py”:

import ROOT
ROOT.gSystem.Load('functions_cpp.d')
rdf = ROOT.RDataFrame('tree', 'data.root') # just some test file
rdf = rdf.Define('x_new', 'myfunc(x)')

And this is when I got an error that I couldn’t resolve:

input_line_66:2:62: error: use of undeclared identifier 'myfunc'
auto lambda0 = [](ROOT::VecOps::RVec<Double_t>& var0){return myfunc(var0)
                                                             ^
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    rdf = rdf.Define('x_new', 'myfunc(x)')
cppyy.gbl.std.runtime_error: Template method resolution failed:
  ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, basic_string_view<char,char_traits<char> > expression) =>
    runtime_error: 
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
 All RDF objects that have not run an event loop yet should be considered in an invalid state.

Thanks,
Fanurs

Hi @Fanurs ,
argh sorry this did not work out of the box – it should! I can reproduce the problem, somehow CompileMacro does not produce functions_cpp.so. I will investigate.

In the meantime, we’ll have to be a bit more “manual”. Does this unblock you?

# compile functions.cpp into a shared library
$ g++ -shared -fPIC -o functions.so functions.cpp $(root-config --cflags --libs)
$ python
>>> import ROOT
>>> ROOT.gInterpreter.Declare('#include "functions.cpp"')
>>> ROOT.gSystem.Load("functions.so")
>>> print(ROOT.myfunc) # should print something reasonable

Cheers,
Enrico

Update: I forgot you need to tell CompileMacro that you want the shared library to stay around, with the “k” option:

ROOT.gSystem.CompileMacro("functions.cpp", "kO")

and then:

>>> import ROOT
>>> ROOT.gInterpreter.Load("functions_cpp.so")
0
>>> ROOT.myfunc
<cppyy.CPPOverload object at 0x7f02450e3970>

ROOT.gInterpreter.Declare('#include "functions.cpp"') should not be needed as long as the location of functions.cpp does not change between generating the shared library and using it: ROOT knows where to find the source code corresponding to the shared library thanks to the other files it writes out, functions_cpp_ACLiC_dict_rdict.pcm and functions_cpp.d.
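Putting the two steps together, an analysis script could do something along these lines. This is only a sketch: aclic_library_path, needs_rebuild and load_helpers are hypothetical helper names, and the "_cpp.so" naming is simply what ACLiC produced on Linux in this thread.

```python
from pathlib import Path


def aclic_library_path(source):
    """Map a source file to the library name ACLiC gave it here (functions.cpp -> functions_cpp.so)."""
    source = Path(source)
    return source.with_name(f"{source.stem}_{source.suffix.lstrip('.')}.so")


def needs_rebuild(source):
    """True if the library is missing or older than the source file."""
    source = Path(source)
    lib = aclic_library_path(source)
    return not lib.exists() or lib.stat().st_mtime < source.stat().st_mtime


def load_helpers(source="functions.cpp"):
    """Load the compiled helpers, compiling them first only when necessary."""
    import ROOT  # deferred so the path helpers above work without ROOT

    if needs_rebuild(source):
        # "k" keeps the .so on disk for later runs, "O" enables optimisation.
        if ROOT.gSystem.CompileMacro(str(source), "kO") != 1:
            raise RuntimeError(f"could not compile {source}")
    elif ROOT.gSystem.Load(str(aclic_library_path(source))) < 0:
        # TSystem::Load returns 0 on success, 1 if already loaded, negative on error.
        raise RuntimeError(f"could not load library for {source}")
```

In test.py, calling load_helpers() before rdf.Define('x_new', 'myfunc(x)') should then be enough. Since CompileMacro already skips recompilation when the library is up to date, the timestamp check is mostly there to make the logic explicit.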

Cheers,
Enrico

Thanks a lot! It now works perfectly. I really appreciate your help. :)
