C++ 2D Array in PyROOT

jfcaron · June 9, 2014, 8:11pm

Hi, I found a few posts indicating that this was difficult, but I didn’t find a standard solution.

I have a Geom.C file that defines a large const Double_t[][2] like this:

namespace Geom{
  const Double_t FieldPos[][2] = { 
    {-0.7,-0.0},{-0.7,-1.4},{-0.7,-2.1},{-0.7,-2.8},
    ...
    };
}

I can access this in PyROOT by doing ROOT.gROOT.ProcessLine(".L Geom.C+"), then ROOT.Geom.FieldPos. Unfortunately it is shown as being a PyDoubleBuffer with no information about the length or dimensionality. In fact, the len() function tells me it’s of length 177, which is patently false (it should be even by construction).

Is there a way to obtain a Pythonic 2D object from this? Do I have to manually count how many values there are, then hand-feed this into a numpy array? Alternatively, is there a reliable way to iterate over the values? I could just use %2 to select between the pairs of coordinates.

Jean-François

jfcaron · June 9, 2014, 8:31pm

The length of 177 turns out to be the length of the first dimension (declared as [] in the .C file). So I should have 177x2 entries if the conversion to PyROOT just “flattened” the array, but it seems like that’s not the case. Rather it seems that the second element is just ignored entirely.

Perhaps the solution is to add a function to Geom.C that takes an index 0-176 and returns an std::pair with the coordinates from Geom::FieldPos? Is there a more Pythony way?

Jean-François

wlav · June 12, 2014, 5:48am

Jean-François,

yes, one of them thingies … I don’t think there are any technical limitations with Cling and p2.6 or later. “Just” need to find the time to implement (or a volunteer).

For now, to get a full flat, one-dimensional view, you can adjust the size of the PyDoubleBuffer with its SetSize() method.
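A minimal sketch of regrouping such a flat view into coordinate pairs on the Python side. It assumes only that the buffer supports len() and integer indexing; the helper name pairs_from_flat is made up for illustration, and a plain array('d') stands in for the PyROOT buffer so the snippet runs without ROOT:

```python
from array import array


def pairs_from_flat(flat, ncols=2):
    """Group a flat sequence of numbers into tuples of ncols values each."""
    n = len(flat)
    if n % ncols != 0:
        raise ValueError("flat length %d is not a multiple of %d" % (n, ncols))
    # Only len() and integer indexing are used, so any flat buffer-like
    # object with those two operations should work here.
    return [tuple(flat[i + j] for j in range(ncols)) for i in range(0, n, ncols)]


# Stand-in for the flattened view of Geom::FieldPos (first three pairs only).
# With PyROOT one would instead do something like (untested assumption):
#   buf = ROOT.Geom.FieldPos
#   buf.SetSize(177 * 2)   # 177 rows of 2 doubles, as described in this thread
flat = array('d', [-0.7, -0.0, -0.7, -1.4, -0.7, -2.1])
field_pos = pairs_from_flat(flat)
```

The result is an ordinary list of (x, y) tuples, which can be handed to numpy or iterated over directly.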

HTH,
Wim

jfcaron · June 13, 2014, 5:28am

I’m game with trying to contribute some code that could expose native C++ 2d (or other dimensional) arrays as PyROOT numpy arrays (or a list of standard arrays) or something, but I don’t really know where to start. I know a lot of the bindings are auto-generated, so this would be writing python code to auto-generate C code to talk to the C++ arrays? Is there some comparable code that I could see (e.g. for TString to python string or something?)

Jean-François

wlav · June 13, 2014, 10:22pm

Jean-François,

thanks …

There is no code-generation as such (there is some in Cling, where wrappers are generated, but that’s part of ROOT/meta, not PyROOT). Rather, PyROOT creates the python classes/functions/etc. at run-time, using the python C-API. There is no fundamental difference between any of these approaches, but there are many practical ones, and the PyROOT one scales well.

There are two end-points that pre-exist (the location of the relevant TDataMember/TGlobal and python’s new buffer interface). What is needed is to connect the two.

First, on one end, a global variable living in a namespace, as you have, is made available by ROOT as a datamember of that namespace. In PyROOT, this data member is then represented as a “PropertyProxy”, see: $ROOTSYS/bindings/pyroot/src/PropertyProxy.h and PropertyProxy.cxx.

This class has two Set() methods, one for TDataMember’s (which is what you need) and one for TGlobal’s (which would be the case if FieldPos lived in the global namespace).

I’d argue to start with PyROOT::PropertyProxy::Set( TDataMember* dm ) in PropertyProxy.cxx, as that would solve your immediate issue. Afterwards, the code for TGlobal will be pretty much the same.

In that Set() function, you can see that figuring out the array dimensions is pretty straightforward. At the moment, only the first dimension is used and passed to the CreateConverter() call. What is needed is a utility struct that contains the dimensions and that gets passed by pointer (can be to a struct on the stack, as the converter can copy the values later) through the second parameter of that call. For convenience (i.e. not to have to deal with other code breaking), I’d start prototyping with a CreateArrayConverter() call, or by writing an overload for CreateConverter() that takes the struct with dimensions by const ref.

Next then is CreateConverter() in $ROOTSYS/bindings/pyroot/src/Converters.cxx. Based on the class name (double*), it will create a DoubleArrayConverter. Look for the PYROOT_IMPLEMENT_ARRAY_CONVERTER macro definition. Currently, the size goes to the constructor and is simply stored as a data member (fSize; see Converters.h and look for PYROOT_DECLARE_ARRAY_CONVERTER). This needs to change, so that all the needed info is copied over from the struct passed in from Set() before. For convenience, you can also create a new set of classes that take the utility struct as a data member, but that may be more work.

Now the relevant call is the FromMemory() call defined in PYROOT_IMPLEMENT_ARRAY_CONVERTER, which uses BufFac_t::Instance()->PyBuffer_FromMemory, which you find in TPyBufferFactory.h/.cxx. This function receives the address FieldPos, and currently it receives fSize. It should be modified to receive more details about the dimensions.
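To make "more details about the dimensions" concrete, here is a small sketch of the layout information a row-major C array implies: the flat element count and the byte strides per dimension. The function name c_contiguous_layout is hypothetical, and an 8-byte double is assumed:

```python
def c_contiguous_layout(shape, itemsize):
    """Return (element_count, strides_in_bytes) for a row-major C array."""
    strides = []
    step = itemsize
    # Walk the dimensions from last to first: the last index moves by one
    # item, each earlier index moves by the size of everything after it.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    count = step // itemsize
    return count, tuple(reversed(strides))


# FieldPos is declared as Double_t[177][2]; sizeof(double) == 8 is assumed.
count, strides = c_contiguous_layout((177, 2), 8)
```

For the array in this thread that gives 354 doubles in total, with 16 bytes between rows and 8 bytes between the two coordinates of a pair.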

TPyBufferFactory is a horror that was hacked together for python 2.0, which really only allowed char buffers, or copies of buffers. The NumPy folks have since pushed for better support, and that finally made it into Python as PyMemoryView objects (as of p2.7, not p2.6 as I was thinking) that provide access to new-style Py_Buffer objects (which carry a ‘shape’ a la numpy). Since this is what NumPy is using, if PyROOT uses the same, the interoperation should be really sweet.

So that is the other end that needs to tie up, and the doc is here: https://docs.python.org/2/c-api/buffer.html as well as this PEP: http://legacy.python.org/dev/peps/pep-3118/.

So, fill in the buffer struct with the pointer, shape, type, and all other required info; wrap it into a memory view; return that. Should do the trick … To be sure, since that change is so localized, I’d do that in FromMemory() rather than bothering with modifying TPyBufferFactory.
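As a rough picture of what a shaped memory view looks like from the Python side, here is a sketch using the built-in memoryview. It is not the PyROOT implementation: it relies on memoryview.cast(), which only exists in Python 3.3 and later (multi-dimensional indexing needs 3.5+), whereas the thread targets p2.7, and an array('d') stands in for the C++ memory:

```python
from array import array

# Stand-in for the raw memory behind Geom::FieldPos (first three pairs only).
flat = array('d', [-0.7, -0.0, -0.7, -1.4, -0.7, -2.1])
rows, cols = len(flat) // 2, 2

# cast() only reshapes to or from a byte format, so go through 'B' first.
# No data is copied: the view shares memory with `flat`.
view = memoryview(flat).cast('B').cast('d', shape=[rows, cols])

shape = view.shape          # dimensions carried by the buffer itself
second_y = view[1, 1]       # y coordinate of the second pair
as_lists = view.tolist()    # nested lists, one inner list per pair
```

Because the view carries its own shape and strides, consumers that speak the buffer protocol can pick up the 2D layout without being told the dimensions separately.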

Of course, after that, there’s a bunch of other things that will break (Utility::GetBuffer() for one, when passing the memory view through a function), but if you are a bit careful (e.g. keep fSize in place as a one-dimensional number filled in the same way), these can be addressed piecemeal and as-needed.

Also, since p2.5 and p2.6 are still in use, it’s best to #ifdef new code that uses Py_Buffer on PY_VERSION_HEX being 0x02070000 or greater.
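The same threshold can be checked from Python, since sys.hexversion uses the same encoding as PY_VERSION_HEX. A small sketch, with the constant and function names invented for illustration:

```python
import sys

# Python 2.7.0, the threshold named in this thread for Py_Buffer-based code.
MIN_MEMORYVIEW_HEX = 0x02070000


def has_new_buffer_view(hexversion=sys.hexversion):
    """True if the given interpreter version is 2.7.0 or newer."""
    return hexversion >= MIN_MEMORYVIEW_HEX


supported_here = has_new_buffer_view()
```

The comparison works as a plain integer test because the major, minor and micro numbers occupy successively less significant bytes of the value.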

Thanks,
Wim