Libraries with distributed RDFs and Spark


I would like to use custom C++ libraries on Spark workers to compute values for an RDF. In my EOS area I have the .so files and the dictionaries that I created in ROOT with .L, and locally I would do something along the lines of:


function_code = """
float myJittedFunc(float a) {
myClass b; // this class would be defined in the libraries
return b.evaluate(a)


df.Define("b", "myJittedFunc(a)")

So essentially, how should I give information about the path to my files, load the library, and then use JIT-compiled code that is based on the loaded library on the Spark workers?

ROOT Version: 6.26/08
Platform: SWAN/LCG: 102b/K8s
Compiler: gcc11

Hi @toicca,

In the reply below, I’m assuming that Spark executors are running on different worker nodes.

I think that for this to work, the generated .so file (and possibly the headers, if they are used directly in your interpreted code) should be reachable from the filesystem on the worker nodes. If your Spark workers mount a common network filesystem (e.g., NFS), that is the best location to place these files, since they will be reachable by all the nodes. Then, your code should reference those absolute paths.

If you are not mounting a network filesystem on each worker node, probably your only solution is to manually copy those files to the same location on every node, e.g. under /opt/your/files/ or /home/you/.local/.

I’m also pinging @vpadulan so that he is aware of this topic.


There is experimental support for shipping the shared libraries to the workers. It assumes that the environments are coherent across all machines, on the client as well as on all the workers. Fortunately, running on SWAN should already ensure that this is the case.
I will update you with a self-contained example.
