I just found out that the workaround I’ve been using for some time (until recently, PyROOT could not add the contents of multidimensional numpy arrays to a multidimensional std::vector) is leaking memory.
It seems that something is not dereferenced after adding a list of array.array objects to a vector. While creating the test code below, I also discovered that the same problem applies to adding numpy arrays directly. There is no problem when adding pure 2D Python lists. Also, for a big array like the one in the example, the array.array and numpy array cases are much slower than the pure-list case.
import numpy as np
import ROOT
import array

v = ROOT.vector("vector<double>")()
for i in range(1000):
    print(i)
    a = np.arange(200000*2, dtype=np.float64).reshape(200000, 2)
    a = [list(array.array('d', el)) for el in a]
    #a = [array.array('d', el) for el in a]
    v.clear()
    v += a
To test the code above, please compare running it as is, running it with a = [list(array.array('d', el)) for el in a] commented out, and running it with that line commented out and a = [array.array('d', el) for el in a] uncommented. The two latter versions should leak memory and be much slower than the former.
After 9 iterations the leaking version consumes 2.2 GB here, while the non-leaking one stays below 500 KB.
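For anyone who wants to reproduce the numbers, one way to measure the growth without an external profiler is to sample the process’s peak resident set size between iterations. A minimal sketch, not using ROOT (the list comprehension is just a stand-in workload, and ru_maxrss units are assumed to follow the Linux convention of kB; on macOS they are bytes):

```python
import resource

def peak_rss_kb():
    """Return this process's peak resident set size.

    Assumption: Linux convention, where ru_maxrss is in kB
    (on macOS the same field is in bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Sample before and after a workload to spot growth across iterations.
before = peak_rss_kb()
data = [list(range(1000)) for _ in range(1000)]  # stand-in workload
after = peak_rss_kb()
print(f"peak RSS grew by {after - before} kB")
```

Printing the delta once per loop iteration in the original script makes the leaking variants obvious after a handful of iterations.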
Thanks for reaching out! Before going forward with debugging your issue, let me try to understand your problem a bit better.
until recently, PyROOT was not able to add the contents of multidimensional numpy arrays to a multidimensional std vector
To what feature are you referring here, exactly? Could you provide a snippet of code to show me what you mean?
It seems that something is not dereferenced after adding a list of array.array to a vector
I do not understand this statement; can you elaborate? What should be “dereferenced”, in your opinion?
What is the use case you are trying to solve here? Why do you need to append an array.array into a ROOT.std.vector[ROOT.std.vector["double"]]? What happens in your application after this step?
and there was a ticket for it. If you want, I can try to find it. Basically, initialising a vector from, or using += with, a numpy array of dimension >= 2 crashed. This seems to be fixed now, at least for 2D numpy arrays; I haven’t tried 3D yet.
I understand that the garbage collector in Python cleans up objects to which there are (nearly) no references. Here there is clearly a memory leak, so something is not cleaned up by the garbage collector. My feeling is that the contents of the array.array are copied into the vector, but the array.array object itself is somehow still being held after this copying, i.e. never dereferenced.
This is also described in the post I quoted at the beginning. Basically, std::vector was not accepting numpy arrays with types such as uint32; it was crashing. It was, however, accepting array.array types. Thus, in this line I use array.array to convert the numpy array into something acceptable to a ROOT std::vector, as a workaround. In the example I gave I used double, which I think worked OK, but in my code I deal with unsigned ints or unsigned shorts, which were crashing. Perhaps this conversion works well now (I need to check), but it did not at the time I wrote the post I referred to. Anyway, thanks to this workaround, I discovered the reported memory leak and slowdown.