Memory leak and poor performance in PyROOT when adding arrays to a vector

Dear ROOTers,

I just found out that the workaround I’ve been using for some time (until recently, PyROOT could not add the contents of multidimensional numpy arrays to a multidimensional std::vector) leaks memory.

It seems that something is not dereferenced after adding a list of array.array objects to a vector. While writing the test code below, I also discovered that the same problem applies to adding numpy arrays directly. There is no problem when adding pure 2D Python lists. Also, for a big array like the one in the example, the array.array and numpy cases are much slower than the pure-list one.

import numpy as np
import ROOT
import array

v = ROOT.vector("vector<double>")()

for i in range(1000):
    print(i)
    a = np.arange(200000 * 2, dtype=np.float64).reshape(200000, 2)
    a = [list(array.array('d', el)) for el in a]  # no leak, fast
    #a = [array.array('d', el) for el in a]       # leaks, slow
    v.clear()
    v += a

To test the code above, please compare three runs: as is; with the line `a = [list(array.array('d', el)) for el in a]` commented out; and with that line commented out and `a = [array.array('d', el) for el in a]` uncommented instead. The latter two versions leak memory and are much slower than the first.

After 9 iterations, a leaking version consumes 2.2 GB here, while the non-leaking one stays below 500 KB.
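In case it helps reproduce the numbers, here is a minimal stdlib sketch (my assumption of how one could track the growth; not part of the original test) that reads the process’s peak resident set size from inside the loop. On Linux, `ru_maxrss` is reported in KiB:

```python
import resource

def rss_kib():
    """Peak resident set size of this process, in KiB on Linux."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = rss_kib()
# ... run one iteration of the vector-filling loop here ...
growth = rss_kib() - before  # stays near 0 for the list version,
                             # climbs steadily when the leak is present
```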


ROOT Version: 6.30/04
Platform: Fedora 38


Dear @LeWhoo ,

Thanks for reaching out! Before going forward with debugging your issue, let me try to understand your problem a bit better.

until recently, PyROOT was not able to add the contents of multidimensional numpy arrays to a multidimensional std vector

What feature are you referring to here, exactly? Could you provide a snippet of code to show me what you mean?

It seems that something is not dereferenced after adding a list of array.array to a vector

I do not understand this statement, can you elaborate? What should be “dereferenced” in your opinion?

What is the use case you are trying to solve here? Why do you need to append an array.array object to a ROOT.std.vector[ROOT.std.vector["double"]]? What would happen in your application after this step?

Cheers,
Vincenzo

It is described in this topic:

and there was a ticket for it. If you want, I can try to find it. Basically, initialising a vector from, or using += with, a numpy array of 2 or more dimensions crashed. That seems to be fixed now, at least for 2D numpy arrays; I haven’t tried 3D yet.

I understand that the garbage collector in Python cleans up objects to which there are (nearly) no references. Here there is clearly a memory leak, so something is not being cleaned up by the garbage collector. My feeling is that the contents of each array.array are copied into the vector, but the array.array object itself is somehow still held after the copy, i.e. not dereferenced.

This is also described in the post I quoted at the beginning. Basically, std::vector was not accepting numpy arrays with types such as uint32; it was crashing. It was, however, accepting array.array types. Thus, in that line I use array.array to convert the numpy array into something acceptable to a ROOT std::vector, as a workaround. In the example I used doubles, which I think worked fine, but in my code I deal with unsigned ints or unsigned shorts, which were crashing. Perhaps this conversion works well now (I need to check), but it did not at the time of the post I refer to. In any case, thanks to this workaround I discovered the reported memory leak and slowdown.
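For concreteness, the conversion step of the workaround looks roughly like this (a sketch with a hypothetical `TYPECODES` mapping I made up for illustration; only the numpy-to-array.array part is shown, no ROOT involved):

```python
import array
import numpy as np

# Hypothetical dtype -> array.array typecode mapping for the
# types mentioned in this thread; extend as required.
TYPECODES = {np.dtype(np.uint16): 'H',
             np.dtype(np.uint32): 'I',
             np.dtype(np.float64): 'd'}

def rows_as_arrays(a):
    """Convert a 2D numpy array into a list of array.array rows,
    which (unlike raw uint32 numpy rows) std::vector accepted."""
    code = TYPECODES[a.dtype]
    return [array.array(code, row.tolist()) for row in a]

a = np.arange(6, dtype=np.uint32).reshape(3, 2)
rows = rows_as_arrays(a)  # [array('I', [0, 1]), array('I', [2, 3]), ...]
```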

I confirm the bug also exists in ROOT 6.36.06.

It is a critical bug for my collaboration: it basically prevents filling big TTrees with ND vectors from PyROOT:

  1. The memory leak kills the system when using numpy arrays or array.array
  2. A list of lists does not work for types such as unsigned short

No PyROOT workaround comes to mind. I’ll try to write some kind of vector filler in C++ to be called from PyROOT. I hope that is possible.
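A C++ filler along these lines might work; the sketch below only prepares the C++ source as a string (the function name `fill_vector2d` and the whole approach are my assumptions, untested against ROOT). With PyROOT one would register it via `ROOT.gInterpreter.Declare(cpp_filler)` and then call `ROOT.fill_vector2d(v, a, nrows, ncols)` with a C-contiguous numpy array:

```python
# Hypothetical C++ helper: fill a 2D std::vector from a flat buffer
# in one pass, avoiding per-element Python-to-C++ conversions.
cpp_filler = r"""
#include <vector>
#include <cstddef>

void fill_vector2d(std::vector<std::vector<unsigned short>>& v,
                   const unsigned short* data,
                   std::size_t nrows, std::size_t ncols) {
    v.clear();
    v.reserve(nrows);
    for (std::size_t i = 0; i < nrows; ++i)
        v.emplace_back(data + i * ncols, data + (i + 1) * ncols);
}
"""

# Intended PyROOT usage (untested sketch):
#   import ROOT, numpy as np
#   ROOT.gInterpreter.Declare(cpp_filler)
#   a = np.arange(200000 * 2, dtype=np.uint16).reshape(200000, 2)
#   v = ROOT.std.vector[ROOT.std.vector["unsigned short"]]()
#   ROOT.fill_vector2d(v, a, a.shape[0], a.shape[1])
```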

Should I report the problem as a ROOT bug?