STL vector and numpy types in PyROOT

Hello,

It seems ROOT does not recognise at least some numpy types when they are being pushed to STL vector:

import ROOT
import numpy as np

a = ROOT.vector("unsigned int")()

b = np.array([1], dtype=np.uint32)

a.push_back(b)

The problem is, that in general python, as far as I know, there is no type for unsigned int. So if we want to be precise with types, for example, when writing to TTrees, we need to use some more specific variables, for example those of numpy type. These work fine with TTrees, but do not seem to work with vector, as shown above.

Is there any workaround? At the moment I can convert values to standard python types when I detect them in my code, but that’s quite inefficient…

Hi @LeWhoo,

You can use python array module to specify C++ types for STL vector.
Here is an example:

import ROOT
import array

a = ROOT.vector("unsigned int")()
b = array.array('I', [42])
a.push_back(b[0])

Here is documentation of all the types:

Thanks, indeed that’s the way :slight_smile:

However, perhaps handling of numpy directly could be added to vector? It exists in other places of PyROOT and numpy arrays are far more popular than array.array. If it is just matter of some translation matrix, I could try to add it myself, if you point me to the file in the ROOT sourcode.

I also did some benchmarking. Assuming that I have a numpy array of np.uint16, what is the fastest way to add it to a vector now?

setup_string = 'import array; import numpy as np; import ROOT; v=ROOT.vector("unsigned short")(); na = np.array([1,2,3], dtype=np.uint16);'

# Direct conversion of numpy.array to array.array
timeit.timeit(stmt='a = array.array("H", na); v.clear(); v+=a', setup=setup_string, number=10000000)
27.444113552999625

# Numpy.tolist() -> array.array
timeit.timeit(stmt='a = array.array("H", na.tolist()); v.clear(); v+=a', setup=setup_string, number=10000000)
18.957313061000605

# Elementwise int() on numpy.tolist()
timeit.timeit(stmt='a = [int(el) for el in na.tolist()]; v.clear(); v+=a', setup=setup_string, number=10000000)
18.77936904699891

# Change of numpy dtype, then tolist()
timeit.timeit(stmt='a = na.astype(int).tolist(); v.clear(); v+=a', globals=globals(), number=10000000)
20.10536060999948

# With numpy.tobytes()
timeit.timeit(stmt='a = array.array("H", na.tobytes()); v.clear(); v+=a', setup=setup_string, number=10000000)
18.04331880800237

So initialising array.array from ndarray.tobytes() seems to be the fastest, but from ndarray.tolist() is not much slower. And surprisingly a list comprehension where each element is converted to int separately is as fast. Maybe it would be different on larger arrays. Direct conversion of numpy array to array.array is much less efficient. Also, I am surprised that numpy.astype().tolist() is slower than most of the others.

Interesting!
Although, I am not a developer, just a regular user, so I am not so fluent in the source code.

Maybe experts can comment here

cheers

So I’ve actually just discovered another limitation of array.array(). It can only be 1D. So for multidimensional vectors one either has a strange list of array.arrays with specific type or a list of lists or numpy array without the ability to specify unsignedness, etc. Another argument to enable use of numpy arrays :slight_smile:

So, it’s something for the ROOT team … @Axel @vpadulan

Hi,

You can directly convert a numpy array to a std.vector if it is of the same type"

import ROOT
import numpy as np


b = np.array([1], dtype=np.uint32)

a =  ROOT.std.vector("unsigned int")(b)

Lorenzo

Thanks, this works, but push_back or += do not. And += is, from what I understand, the main method to assign an array to an empty vector. This is useful when filling a TTree with a vector, when one creates a vector, then a branch pointing to that vector, then reuse this vector with .clear() and += for every fill() of the TTree.

Arguably, push_back (and so also +=) is rather inefficient as it run in a Python loop. To be sure, initializing std::vector from a numpy array isn’t terribly better unless that array is 1D, but if you start out from an empty vector, it’s better to use swap(). You can copy the one you’re taking the contents from if need be. Examples:

import cppyy
import numpy as np

vec = cppyy.gbl.std.vector['unsigned int']
val = 2**32-1

a = np.array([val], dtype=np.uint32)
assert a[0] == val

v1 = vec(a)
assert v1[0] == val

v2 = vec()
v2.swap(vec(a))
assert v2[0] == val

v3 = vec()
v3.swap(vec(v1))

assert v1[0] == val
assert v3[0] == val

I am getting a little bit lost with my own replies. Swap() may indeed be the faster, but either I am doing something wrong, or vector can’t be initialised during creation from a numpy array that has more than 1 dimension:

import ROOT
import numpy as np

b = np.arange(2*2, dtype=np.uint32).reshape(2,2)
a =  ROOT.std.vector("vector<unsigned int>")(b)

The above crashes. It works if in vector init we replace b with list(b), and despite list(b) being rather slow, we could use it with vector::swap(). However, the above crashes for a 3D list event with the conversion to list:

import ROOT
import numpy as np

b = np.arange(2*2*2, dtype=np.uint32).reshape(2,2,2)
a =  ROOT.std.vector("vector<vector<unsigned int>>")(list(b))

It can, as long as the memory is flat, otherwise the operation is meaningless.

Maybe it does in ROOT, I don’t know, but it’s fine in cppyy.

I thought that these are equivalent. However, it crashed here also in this case:

In [1]: import cppyy
   ...: import numpy as np
   ...: 
   ...: b = np.arange(2*2, dtype=np.uint32).reshape(2,2)
   ...: a = cppyy.gbl.std.vector["vector<unsigned int>"](b)
 *** Break *** segmentation violation

I guess I have cppyy from my ROOT, 6.28.00

That’s your problem, then. :slight_smile: Try cppyy · PyPI

Thanks, however, my project is ROOT based, and adding another dependency to it would be difficult. I hope we can find out why it crashes in ROOT.

We will see what’s causing the crash - if you don’t hear back from us next week then please ping us!

Thanks!

So I am pinging you :slight_smile:

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Sorry, found the ping back. I will check tomorrow (if releasing 6.28/04 allows) Next week I’ll be at a conference - I should have enough time to look at it :slight_smile: