pyROOT and std.vector - dramatic RAM usage

Hello everyone.

I’m looping over a tree with PyROOT and I need to write its content to another tree with modifications. I want to have vectors of ints as branches, and for that I’m using
ROOT.std.vector(ROOT.int)()
But this dramatically increases my RAM usage. Even if I just declare the vector and never use it, memory usage keeps growing while I’m looping through the tree; if I don’t declare it, everything looks fine. What could be the problem here?

I attached the files to the topic: test.py is the main file, Hit.py is a class I want to have, and there is a file with the tree. Line 17 of test.py is the one causing trouble. I also added a function getMem() to see memory usage.
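For readers without access to the attachment, a memory probe of this kind can be sketched as below. This is not the getMem() from test.py (which is not shown here); the name get_mem_mb and the use of the standard resource module are assumptions. It reports the peak resident set size, which for a steadily growing process is close to the current usage.

```python
import resource
import sys


def get_mem_mb():
    """Return the peak resident set size of this process in megabytes.

    Hypothetical helper, not the getMem() from the attached test.py.
    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print("peak memory: %.1f MB" % get_mem_mb())
```

Calling it every few thousand entries inside the event loop is enough to see whether the number keeps climbing.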
cernbox.cern.ch/index.php/s/yPbW2j5EQsRkZE4

Hello,

Ok thanks for reporting, I will have a look.

Just a quick question before looking into it: you iterate over the tree but not over the vector itself? I’m asking because we recently fixed a memory leak when iterating over STL collections:

Hello,

Yes, I just declared a vector but never used it. I’m looping over a tree and it makes a very big difference whether the vector is declared or not.

Thanks, I was able to reproduce, I will investigate and get back to you!

Interestingly enough, it has to be a vector of ints (does not happen e.g. with vector of string or float).

I don’t know the cause yet, but I found a workaround (in case it can be useful for you).

The problem comes down to instantiating an std::vector<int> before the “layer” branch is read (which is also of type std::vector<int>).

If you make sure you read the branch at least once before you instantiate the vector, that should be enough. Were you creating a vector to read the branch into it? If so, you don’t need to do that from Python: accessing tree.branchname gives you the branch content for the current event, in a loop like this:

for event in tree:
    # l should be a vector<int>
    l = event.layer
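To make the ordering of the workaround concrete, here is a minimal sketch, assuming the output vector is only needed for writing a new branch. The function name copy_layers, the file paths, the tree name and the "+ 1" modification are placeholders and are not taken from the attached files. The only point is that the “layer” branch is read once before the first std::vector<int> is instantiated from Python.

```python
def copy_layers(in_path, tree_name, out_path):
    """Copy the 'layer' branch to a new tree, reading it once up front.

    Hypothetical sketch of the workaround; names and the modification
    applied to each value are placeholders.
    """
    import ROOT  # imported here so the sketch can be loaded without ROOT

    in_file = ROOT.TFile.Open(in_path)
    tree = in_file.Get(tree_name)

    # Workaround: read the branch at least once BEFORE creating the vector.
    if tree.GetEntries() > 0:
        tree.GetEntry(0)
        _ = tree.layer

    out_file = ROOT.TFile.Open(out_path, "RECREATE")
    out_tree = ROOT.TTree(tree_name, "copy with modified layers")

    # Only now instantiate the std::vector<int> used for the output branch.
    out_layers = ROOT.std.vector("int")()
    out_tree.Branch("layer", out_layers)

    for event in tree:
        out_layers.clear()
        for value in event.layer:
            out_layers.push_back(value + 1)  # placeholder modification
        out_tree.Fill()

    out_file.Write()
    out_file.Close()
    in_file.Close()
```

Whether this fully avoids the growth in a given ROOT version is something to check with a memory probe; it is only meant to illustrate the order of operations described above.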

Thank you! For now I switched to ROOT.RDataFrame, so I no longer need to loop over the tree, which was causing the memory to increase over time.
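For anyone landing here with the same problem, the RDataFrame route might look roughly like the sketch below. It is not the code I actually ended up with: the function name rewrite_with_rdf, the column name layer_mod and the "layer + 1" expression are placeholders, and it assumes the input tree has a vector<int> branch called layer.

```python
def rewrite_with_rdf(in_path, tree_name, out_path):
    """Write a modified copy of the tree without an explicit Python loop.

    Hypothetical sketch; the new column name and the expression are
    placeholders for whatever modification is actually needed.
    """
    import ROOT  # imported here so the sketch can be loaded without ROOT

    df = ROOT.RDataFrame(tree_name, in_path)

    # The expression is just-in-time compiled C++; vector branches are
    # exposed as RVec there, so element-wise arithmetic is available.
    df = df.Define("layer_mod", "layer + 1")

    # Snapshot runs the event loop in C++ and writes all columns.
    df.Snapshot(tree_name, out_path)
```

The event loop then runs inside ROOT instead of in Python, so no vector has to be declared on the Python side at all.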

Very good decision to switch to RDF in any case!

After some debugging, I opened this issue with an explanation of the problem and a reproducer in C++:

I suggest we follow it there!

Thank you for that, I will follow it there!
