Example app using TPySelector and rootpy

Hello,
Since it took me quite a while to get everything to work I thought that I put out an example how one may use the TPySelector (running python programs with proof-lite) locally.
Admittedly, the example does not run since it requires root files with ALICE data where I am not sure whether I can put them online. However, I think the code is clear enough to see what is going on. I hope it may save some people a lot of time.
Its all here: https://github.com/chrisboo/TPyselector_example

Everything works very nice actually, I only wonder if anyone has experience with pyroot and cython?
For example if one would like to replace some for loops like the following with cython:

for phi, eta, zvtx in tracks:
    a = array('d', [phi, eta, zvtx]) 
    self.trigger_hist.Fill(a)

where ‘tracks’ is a list of tuples with (phi, eta, zvtx). The problem is how to efficiently call the (python) function to fill the histogram?

Cheers!

Hi,

thanks for posting!

About connecting with cython: am I right to assume that trigger_hist is a pure C++ ROOT histo? C++ support in cython has been growing significantly over the past couple of years, so passing the pointer and compiling in the call might be an option?

Cheers,
Wim

Thanks for your prompt reply!
Well, in the code posted trigger_hist is a normal histogram from pyroot, in this case THnSparseD. My problem is that I do not know how to declare that histogram type in cython. Naively, I would do something like this (I should say that I have never worked with cython, though):

Create fill_histogram.pyx in the same folder as selector.py:

# fill_histogram.pyx
from ROOT import THnSparseD
cimport numpy as np

cdef fill_histogram(THnSparseD trigger_hist, tracks):
    cdef double phi, eta, zvtx
    cdef np.ndarray[np.float64, ndim=1] a = np.empty(3, dtype=np.float64)
    for phi, eta, zvtx in tracks:
        a[0] = phi
        a[1] = eta
        a[2] = zvtx
        trigger_hist.Fill(a)
#### selector.py  ####
import pyximport; pyximport.install()
from fill_histogram import fill_histogram
....
def Process(self, entry):
    # do stuff....
    fill_histogram(self.trigger_hist, tracks)

But the type THnSparseD is unknown to cython. I’d be really grateful if you could provide a few more details on how this could be done.

Thanks a lot!

Hi,

the histogram type needs to be known as such a type to cython, which is more work (see here). I have never tried this, but the inheritance hierarchies of ROOT are pretty straightforward, so I’d assume that it works.

Re-reading it, the documentation does not actually talk about functions that just take a pointer type to a now known C++ class. If that doesn’t work, then another idea could be to use a void* or other simple argument, then cast explicitly once the type is known to cython.

(You can use ROOT.AddressOf() to turn the histogram into an addressable buffer or index that buffer directly to get a long value representing the memory address. There is also ROOT.AsCObject() to create a PyCObject that points to the histogram.)

Cheers,
Wim

P.S. Of course, what I should really do is make TPySelector work in PyPy, but I’ve been having still unsolved problems (dead-locks and crashes) with callbacks.

Thanks again Wim,
pypy would be really great of cause! But for the time being, could something like the following work:
(Copied together from http://stackoverflow.com/questions/18154361/cython-c-example-fails-to-recognize-c-why and http://stackoverflow.com/questions/16993927/using-cython-to-link-python-to-a-shared-library)

# setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize(
    "fill_histogram.pyx",            # our Cython source
    sources=["path_to_THn_source_files"],  # additional source file(s)
    language="c++",          # generate C++ code
    libraries=["Hist"],      # refers to "libHist.so" or whatever is appropriate
    extra_compile_args=["-fopenmp", "-O3"],  # set complier options here
    extra_link_args=["-DSOME_DEFINE_OPT", "-L./some/extra/dependency/dir/"]
))
# fill_histogram.pyx
cdef extern from "THnSparseD.h" namespace "histograms":
cdef cppclass THnSparseD:
    int Fill(double)

cdef fill_histogram(THnSparseD *trigger_hist, tracks):
    cdef double phi, eta, zvtx
    cdef np.ndarray[np.float64, ndim=1] a = np.empty(3, dtype=np.float64)
    for phi, eta, zvtx in tracks:
        a[0] = phi
        a[1] = eta
        a[2] = zvtx
        trigger_hist.Fill(a)

I would not have a clue about the compiler options, libraries and sourcefiles in setup.py, though…

Cheers,
Christian

Hi,

the needed include dirs, compiler flags, etc. can be easily picked up from “root-config” (do a “root-config --help” for the options; you can call it programmatically using e.g. the “commands” module). I would expect that cython adds the needed options for Python (if not, you can find them from module “distutils.sysconfig”).

I don’t think you need to link with libHist (or any other ROOT libs), though, as that should already by loaded at the point where the cython extension gets loaded. (Wouldn’t think it hurts, though.)

Cheers,
Wim

I feel like I am almost there! I started of trying with TH1D for simplicities sake. The following compiles and runs, but I do not know how to cast the python histogram to the cpp histogram. Here is what I have got:

# distutils: language = c++
# fill_histogram.pyx

from ROOT import AddressOf
import cython

cdef extern from "TH1D.h":# namespace "TH1D":
    cdef cppclass TH1D:
        TH1D()
        int Fill(double)
        char* GetName()

def fill_histogram(hist_in):
    # hist = AddressOf(hist_in)  # <-- Does not work
    print "Finished fill_histogram"
import os
import shutil
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# clean previous build
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        if (name.startswith("fill_histogram")
                and not(name.endswith(".pyx") or name.endswith(".pxd"))):
            os.remove(os.path.join(root, name))
    for name in dirs:
        if (name == "build"):
            shutil.rmtree(name)


setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension(
        "fill_histogram",                        # name of the extension
        sources=["fill_histogram.pyx"],  # additional source file(s)
        language="c++",          # generate C++ code
        include_dirs=["/home/christian/root/include", ],
        library_dirs=["/home/christian/root/lib", ],
        libraries=["python2.7", "Core", "Cint", "RIO", "Net", "Hist", "Graf",
                   "Graf3d", "Gpad", "Tree", "Rint", "Postscript",
                   "Matrix", "Physics", "MathCore", "Thread",
                   "pthread", "m", "dl"],
        # set complier options here:
        extra_compile_args=["-I/home/christian/root/lib"],
    )])
# main.py
# do
# $ export LD_LIBRARY_PATH=path/to/root/lib
# befor running with:
# $ python main.py
from fill_histogram import fill_histogram
from ROOT import TH1F, AddressOf

hist = TH1D()
ptr = AddressOf(hist)  # of type PyLongBuffer or somthing
fill_histogram(hist)

The problem now is in the fill_histogram.pyx (first snippet).
How do make the cython histogram ´hist´ point to the python object ´hist_in´? The commented line

hist = AddressOf(hist_in) did not work. Nor the cython version of it called cython.address()

Any hints? I think getting those loops into pure C++ code (via cython) would potentially deliver a massive speed up, or am I mistaken?

Christian,

this line: cdef TH1D *hist = new TH1D()would not work, as it allocates a new histogram. I was thinking that something along these lines should work (but haven’t tried):h = ROOT.TH1D() cy_h = <TH1D> ROOT.AddressOf(h)[0]
Alternatively (and again haven’t tried), cython may be able to deal with a PyCObject, from:ROOT.AsCObject(h)or another equivalent void* type from AddressOf(h)[0]. (It’s not clear to me from the code what type fill_histogram is takes.)

Cheer,
Wim

Thanks yet again for the prompt reply Wim!
I cleaned up my former post since it was inconsitently using TH1D and F as well as including debug code. I reduced the code to the bare minimum, so in the following, the hist_in is not actuall used at all. I try to make it compile with your suggestions first:

def fill_histogram(hist_in):
    h = ROOT.TH1D()
    cy_h = <TH1D> ROOT.AddressOf(h)[0]
def fill_histogram(hist_in):
    h = ROOT.TH1D()
    cy_h = <TH1D> ROOT.AsCObject(h)

However, both of them fail with the same compilation error (full error at the end):

.....
error: no matching function for call to ‘TH1D::TH1D(PyObject*&)’
   __pyx_v_cy_h = ((TH1D)__pyx_t_3);
                         ^
fill_histogram.cpp:762:25: note: candidates are:
In file included from /home/christian/root/include/TH1D.h:25:0,
                 from fill_histogram.cpp:337:
/home/christian/root/include/TH1.h:581:4: note: TH1D::TH1D(const TH1D&)
    TH1D(const TH1D &h1d);
    ^
/home/christian/root/include/TH1.h:581:4: note:   no known conversion for argument 1 from ‘PyObject* {aka _object*}’ to ‘const TH1D&’
....

Is there any possibility to forcefully make a PyObject* (yes, even the ROOT.AsCObject(h) yield this error) to a TH1D*?

Cheers,
Christian

-*- mode: compilation; default-directory: "~/cython_test/" -*-
Compilation started at Wed Jan 29 01:19:53

python setup.py build_ext --inplace
running build_ext
cythoning fill_histogram.pyx to fill_histogram.cpp
building 'fill_histogram' extension
creating build
creating build/temp.linux-x86_64-2.7
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/christian/root/include -I/usr/include/python2.7 -c fill_histogram.cpp -o build/temp.linux-x86_64-2.7/fill_histogram.o -I/home/christian/root/lib
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
fill_histogram.cpp: In function ‘PyObject* __pyx_pf_14fill_histogram_fill_histogram(PyObject*, PyObject*)’:
fill_histogram.cpp:762:25: error: no matching function for call to ‘TH1D::TH1D(PyObject*&)’
   __pyx_v_cy_h = ((TH1D)__pyx_t_3);
                         ^
fill_histogram.cpp:762:25: note: candidates are:
In file included from /home/christian/root/include/TH1D.h:25:0,
                 from fill_histogram.cpp:337:
/home/christian/root/include/TH1.h:581:4: note: TH1D::TH1D(const TH1D&)
    TH1D(const TH1D &h1d);
    ^
/home/christian/root/include/TH1.h:581:4: note:   no known conversion for argument 1 from ‘PyObject* {aka _object*}’ to ‘const TH1D&’
/home/christian/root/include/TH1.h:580:4: note: TH1D::TH1D(const TVectorD&)
    TH1D(const TVectorD &v);
    ^
/home/christian/root/include/TH1.h:580:4: note:   no known conversion for argument 1 from ‘PyObject* {aka _object*}’ to ‘const TVectorD& {aka const TVectorT<double>&}’
/home/christian/root/include/TH1.h:579:4: note: TH1D::TH1D(const char*, const char*, Int_t, const Double_t*)
    TH1D(const char *name,const char *title,Int_t nbinsx,const Double_t *xbins
    ^
/home/christian/root/include/TH1.h:579:4: note:   candidate expects 4 arguments, 1 provided
/home/christian/root/include/TH1.h:578:4: note: TH1D::TH1D(const char*, const char*, Int_t, const Float_t*)
    TH1D(const char *name,const char *title,Int_t nbinsx,const Float_t  *xbins
    ^
/home/christian/root/include/TH1.h:578:4: note:   candidate expects 4 arguments, 1 provided
/home/christian/root/include/TH1.h:577:4: note: TH1D::TH1D(const char*, const char*, Int_t, Double_t, Double_t)
    TH1D(const char *name,const char *title,Int_t nbinsx,Double_t xlow,Double_
    ^
/home/christian/root/include/TH1.h:577:4: note:   candidate expects 5 arguments, 1 provided
/home/christian/root/include/TH1.h:576:4: note: TH1D::TH1D()
    TH1D();
    ^
/home/christian/root/include/TH1.h:576:4: note:   candidate expects 0 arguments, 1 provided
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Compilation exited abnormally with code 1 at Wed Jan 29 01:19:57

Christian,

ah, that was not what I expected the code to do … If the cast only receives a PyObject*, then that’ll never work as such (in fact now that I see the generated code, the cast should also not be to a TH1D, but to a TH1D*, as copying should be avoided). Just for reference, the pointer to the histogram (when bound by PyROOT), lives right after PyObject_HEAD, so in theory, following the cython docs on how to use C extension types by defining such a type with as first member a TH1D* should work, too. However, I think that even at that point, the casting problem will remain (but just one of casting to the extension type).

I’ve been looking through the cython documentation, but did not find if it is possible to provide user-defined casts. All that is needed really is a cython-defined function that takes a void* or long, and does:TH1D* cast_th1d(long h) { return (TH1D*)h; }Maybe just defining such a helper to cython and calling it explicitly would do?

Another idea is this:cdef void* p p = ROOT.AddressOf(h)[0] # assuming here that long -> void* is fine # and does not just take the PyObject* address cy_h = <TH1D*>pDunno whether it will like that last cast. If the former fails to properly initialize p, but the cast works, then perhaps “cdef long p” would work instead.

Cheers,
Wim

It works, thanks a lot!
The cdef long p did the trick, here is the entire fill_histogram.pyx:

# distutils: language = c++
# fill_histogram.pyx

from ROOT import AddressOf
import ROOT
import cython


cdef extern from "TH1D.h":# namespace "TH1D":
    cdef cppclass TH1D:
        TH1D()
        int Fill(double)

def fill_histogram(hist_in):
    cdef long p = ROOT.AddressOf(hist_in)[0]
    cy_h = <TH1D*> p
    cdef i
    for i in range(100):
        cy_h.Fill(1.0)

And here are some the results for looping over the Fill function 100 times, once in cython, once in python:

# main.py
# do
# $ export LD_LIBRARY_PATH=/home/christian/root/lib
# befor running with:
# $ python main.py
from fill_histogram import fill_histogram
from ROOT import TH1F
import timeit

def dumb_fill():
    hist = TH1F("test", "test", 10, 0, 10)
    for i in range(100):
        hist.Fill(1.0)
    #print hist.Integral()

def smart_fill():
    hist = TH1F("test", "test", 10, 0, 10)
    fill_histogram(hist)
    #print hist.Integral()

if __name__ == '__main__':
    print 'Pure python:'
    print timeit.timeit("dumb_fill()", number=10,
                        setup="from __main__ import dumb_fill")
    print 'With cython:'
    print timeit.timeit("smart_fill()", number=10,
                        setup="from __main__ import smart_fill")
Pure python:
0.00605702400208
With cython:
0.000319957733154

I would say this is a very neat improvement! Now I just have to make it work with the slightly more complicated case of THnSparseD when I find time.

Thanks again Wim for the help!

Hi Christian,

That’s a nice speed improvement. If you want to see other code that uses ROOT and cython you might find some of the source code in https://github.com/rootpy/root_numpy useful.

(specifically in https://github.com/rootpy/root_numpy/tree/master/root_numpy/src)

cheers!
Noel

Thanks for the tip, Noel! (and thanks for rootpy!)
Unfortunately, I am rather swamped right now and have to admit that performance improvements of my code have a rather low priority at the moment. That being said, I can only recommend using the TPySelector to everyone running a local analysis! Especially if one were to keep possible cython optimization in mind right from the start (eg. generators do not work in cython).