It seems ROOT does not recognise at least some numpy types when they are being pushed to STL vector:
import ROOT
import numpy as np
a = ROOT.vector("unsigned int")()
b = np.array([1], dtype=np.uint32)
a.push_back(b)
The problem is, that in general python, as far as I know, there is no type for unsigned int. So if we want to be precise with types, for example, when writing to TTrees, we need to use some more specific variables, for example those of numpy type. These work fine with TTrees, but do not seem to work with vector, as shown above.
Is there any workaround? At the moment I can convert values to standard python types when I detect them in my code, but that’s quite inefficient…
However, perhaps handling of numpy directly could be added to vector? It exists in other places of PyROOT and numpy arrays are far more popular than array.array. If it is just matter of some translation matrix, I could try to add it myself, if you point me to the file in the ROOT sourcode.
I also did some benchmarking. Assuming that I have a numpy array of np.uint16, what is the fastest way to add it to a vector now?
setup_string = 'import array; import numpy as np; import ROOT; v=ROOT.vector("unsigned short")(); na = np.array([1,2,3], dtype=np.uint16);'
# Direct conversion of numpy.array to array.array
timeit.timeit(stmt='a = array.array("H", na); v.clear(); v+=a', setup=setup_string, number=10000000)
27.444113552999625
# Numpy.tolist() -> array.array
timeit.timeit(stmt='a = array.array("H", na.tolist()); v.clear(); v+=a', setup=setup_string, number=10000000)
18.957313061000605
# Elementwise int() on numpy.tolist()
timeit.timeit(stmt='a = [int(el) for el in na.tolist()]; v.clear(); v+=a', setup=setup_string, number=10000000)
18.77936904699891
# Change of numpy dtype, then tolist()
timeit.timeit(stmt='a = na.astype(int).tolist(); v.clear(); v+=a', globals=globals(), number=10000000)
20.10536060999948
# With numpy.tobytes()
timeit.timeit(stmt='a = array.array("H", na.tobytes()); v.clear(); v+=a', setup=setup_string, number=10000000)
18.04331880800237
So initialising array.array from ndarray.tobytes() seems to be the fastest, but from ndarray.tolist() is not much slower. And surprisingly a list comprehension where each element is converted to int separately is as fast. Maybe it would be different on larger arrays. Direct conversion of numpy array to array.array is much less efficient. Also, I am surprised that numpy.astype().tolist() is slower than most of the others.
So I’ve actually just discovered another limitation of array.array(). It can only be 1D. So for multidimensional vectors one either has a strange list of array.arrays with specific type or a list of lists or numpy array without the ability to specify unsignedness, etc. Another argument to enable use of numpy arrays
Thanks, this works, but push_back or += do not. And += is, from what I understand, the main method to assign an array to an empty vector. This is useful when filling a TTree with a vector, when one creates a vector, then a branch pointing to that vector, then reuse this vector with .clear() and += for every fill() of the TTree.
Arguably, push_back (and so also +=) is rather inefficient as it run in a Python loop. To be sure, initializing std::vector from a numpy array isn’t terribly better unless that array is 1D, but if you start out from an empty vector, it’s better to use swap(). You can copy the one you’re taking the contents from if need be. Examples:
import cppyy
import numpy as np
vec = cppyy.gbl.std.vector['unsigned int']
val = 2**32-1
a = np.array([val], dtype=np.uint32)
assert a[0] == val
v1 = vec(a)
assert v1[0] == val
v2 = vec()
v2.swap(vec(a))
assert v2[0] == val
v3 = vec()
v3.swap(vec(v1))
assert v1[0] == val
assert v3[0] == val
I am getting a little bit lost with my own replies. Swap() may indeed be the faster, but either I am doing something wrong, or vector can’t be initialised during creation from a numpy array that has more than 1 dimension:
import ROOT
import numpy as np
b = np.arange(2*2, dtype=np.uint32).reshape(2,2)
a = ROOT.std.vector("vector<unsigned int>")(b)
The above crashes. It works if in vector init we replace b with list(b), and despite list(b) being rather slow, we could use it with vector::swap(). However, the above crashes for a 3D list event with the conversion to list:
import ROOT
import numpy as np
b = np.arange(2*2*2, dtype=np.uint32).reshape(2,2,2)
a = ROOT.std.vector("vector<vector<unsigned int>>")(list(b))