Huge memory usage in pyROOT

RobS · September 11, 2017, 9:42pm

My pyROOT analysis code is using huge amounts of memory. I have reduced the problem to the example code below:

from ROOT import TChain, TH1D

# Load file, chain
chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)

nentries = chain.GetEntries()

# Declare histograms
h_nTracks = TH1D("h_nTracks", "h_nTracks", 16, -0.5, 15.5)
h_E = TH1D("h_E","h_E",100,-0.1,6.0)
h_p = TH1D("h_p", "h_p", 100, -0.1, 6.0)
h_ECLEnergy = TH1D("h_ECLEnergy","h_ECLEnergy",100,-0.1,14.0)

# Loop over entries
for jentry in range(nentries):
   # Load entry
   entry = chain.GetEntry(jentry)

   # Define variables
   cands = chain.__ncandidates__
   nTracks = chain.nTracks
   E = chain.useCMSFrame__boE__bc
   p = chain.useCMSFrame__bop__bc
   ECLEnergy = chain.useCMSFrame__boECLEnergy__bc

   # Fill histos
   h_nTracks.Fill(nTracks)
   h_ECLEnergy.Fill(ECLEnergy)

   for cand in range(cands):
      h_E.Fill(E[cand])
      h_p.Fill(p[cand])

where someFile.root is a root file with 700,000 entries and multiple particle candidates per entry.

After some investigation I have identified two problems:

Problem 1

When I run this script it uses ~600 MB of memory. If I remove the line

h_p.Fill(p[cand])

it uses ~400 MB.

If I also remove the line

h_E.Fill(E[cand])

it uses ~150 MB.

If I also remove the lines

h_nTracks.Fill(nTracks)
h_ECLEnergy.Fill(ECLEnergy)

there is no further reduction in memory usage.

It seems that for every extra histogram that I fill of the form

h_variable.Fill(variable[cand])

(i.e. histograms that are filled once per candidate per entry, as opposed to histograms that are just filled once per entry) I use an extra ~200 MB of memory. This becomes a serious problem when I have 10 or more histograms because I am using GBs of memory and I am exceeding the limits of my computing system. Does anybody have a solution?

Problem 2

You will see in the above script that I renamed my variables for convenience because the original variable names in someFile.root are a bit long, e.g.:

ECLEnergy = chain.useCMSFrame__boECLEnergy__bc

During my investigations I discovered that if I replace

   # Fill histos
   h_nTracks.Fill(nTracks)
   h_ECLEnergy.Fill(ECLEnergy)

   for cand in range(cands):
      h_E.Fill(E[cand])
      h_p.Fill(p[cand])

with

   # Fill histos
   h_nTracks.Fill(chain.nTracks)
   h_ECLEnergy.Fill(chain.useCMSFrame__boE__bc)

   for cand in range(chain.__ncandidates__):
      h_E.Fill(chain.useCMSFrame__boE__bc[cand])
      h_p.Fill(useCMSFrame__bop__bc[cand])

(i.e. if I use the original variable names, rather than the simpler ones I created) then the script uses ~1.3 GB of memory! I suppose I have already inadvertently “solved” this problem by using the simpler names, but it is very strange behaviour. Does anybody understand why this happens?

Any help, especially with problem 1, would be much appreciated

Thanks,
Rob

RobS · September 12, 2017, 8:05pm

Update: I think this is a python3 problem.

If I take the script in my original post (above) and run it using python2 the memory usage is ~200 MB, compared to ~600 MB with python3. Even if I try to replicate Problem 2 by using the long variable names, the job still only uses ~200 MB of memory with python2, compared to ~1.3 GB with python3.

During my Googling I came across a few other accounts of people encountering memory leaks when using pyROOT with python3. It seems this is still an issue as of Python 3.6.2 and ROOT 6.08/06, and that for the moment you must use python2 if you want to use pyROOT.

So, using python2 appears to be my “solution” for now, but it’s not ideal. If anybody has any further information or suggestions I’d be grateful to hear from you!

Axel · September 13, 2017, 6:24am

Pinging our Python experts, past and present: @mato @Danilo @etejedor @wlav - do you have some hints where to look at? Is this “simply” a garbage collection issue?

daritter · September 13, 2017, 11:26am

Hi Rob,

Can you try using traditional SetBranchAddress() style like in C++? e.g. something like

import numpy as np
E = np.zeros(MAXCANDIDATES, dtype=np.float) # or np.double, depends
chain.SetBranchAddress("useCMSFrame__boECLEnergy__bc", E)

and so forth? It could be that the chain.branchname has some problems with its reference counting in python3 so not referencing the branches as attributes might help

Cheers,

wlav · September 13, 2017, 4:48pm

@axel: it can’t be a ref-counting problem in the user code as the for-loop body overwrites the references. Of course, there can be a ref-counting problem somewhere else (i.e. leaks), but the TTree branch lookup through getattr is the same for both versions in PyROOT, so that would not be suspect to my mind.

Anyway, for the rest, it is clear that this indexing:

E[cand]

(and similarly for ‘p’) is the trouble maker and it’s not clear to me at all what it does, as the code for “useCMSFrame_bo[Ep]__bc” is not given.

RobS · September 13, 2017, 5:43pm

Hi @Axel,

I also thought it might be a garbage collection issue. I tried

import gc

and then

gc.collect()

at the end of each loop over jentry. This had no effect on the memory usage and it significantly increased the runtime of the script. If anybody has a suggestion for a more sophisticated garbage collection fix I’m more than willing to give it a go.

RobS · September 13, 2017, 6:13pm

Hi @daritter,

Maybe I’m not doing this right, but I just get an error:

import numpy as np
E = np.zeros(chain.__ncandidates__, dtype=np.float)
chain.SetBranchAddress("useCMSFrame__boE__bc", E)

gives me

Traceback (most recent call last):
  File "pySelector.py", line 31, in <module>
    chain.SetBranchAddress("useCMSFrame__boE__bc", E)
TypeError: none of the 3 overloaded methods succeeded. Full details:
  int TChain::SetBranchAddress(const char* bname, void* add, TBranch** ptr, TClass* realClass, EDataType datatype, bool isptr) =>
    takes at least 6 arguments (2 given)
  int TChain::SetBranchAddress(const char* bname, void* add, TClass* realClass, EDataType datatype, bool isptr) =>
    takes at least 5 arguments (2 given)
  int TChain::SetBranchAddress(const char* bname, void* add, TBranch** ptr = 0) =>
    could not convert argument 2
*** glibc detected *** python3: munmap_chunk(): invalid pointer: 0x000000000479df00 ***

RobS · September 13, 2017, 6:22pm

Hi @wlav,

useCMSFrame_boE__bc is just a float indexed by __ncandidates__:

root [4] someTree->GetListOfLeaves()->Print()
Collection name='TObjArray', class='TObjArray', size=21
OBJ: TLeafI	__ncandidates__	__ncandidates__
OBJ: TLeafF	useCMSFrame__boE__bc	useCMSFrame__boE__bc[__ncandidates__]

(and similarly for ‘p’). Physically, it’s just the energy of a particle in the CM frame, the weird name is just a by-product of the software used in my experiment to produce root files. So for each entry in the root file, there are several particles (__ncandidates__), and each particle has a value for E (energy), p (momentum), etc.

So if I want to plot the energy of each particle in a given entry I have to loop over __ncandidates__ (or cand as I renamed it) and fill my histogram with E[cand].

wlav · September 13, 2017, 8:24pm

Okay, that could point to a difference between p2 and p3.

If you do not store a reference to the array outside the internal loop, you get a lookup for every indexing which includes the creation of a buffer object (and a temporary converter to create that in the first place). That’ll be horribly slow, regardless of memory use trouble.

p2 and p3 also handle buffers completely differently. Older p3’s had a leak, but that’s long gone in p3.6 (the problem was that the view held a pointer to the buffer, which thus must be given the same live span, but the view did not control that). That said, I don’t think anyone looked at PyROOT + p3.6, so something may have subtly changed.

Graipher · September 14, 2017, 7:11am

Try telling it explicitly the structure of the array (that it is an array, what its length is and what type of values it contains), by using the third parameter of SetBranchAddress:

import numpy as np
E = np.zeros(chain.__ncandidates__, dtype=np.float)
chain.SetBranchAddress("useCMSFrame__boE__bc", E, "useCMSFrame__boE__bc[__ncandidates__]/F")

You have to be careful with this, if the length of E needs to be different for each event. In this case you need to hardcode some size which is large enough for all events.

RobS · September 14, 2017, 2:50pm

Hi @Graipher,

Trying

chain.SetBranchAddress("useCMSFrame__boE__bc", E, "useCMSFrame__boE__bc[chain.__ncandidates__]/F")

gives me

Traceback (most recent call last):
  File "pySelector.py", line 32, in <module>
    chain.SetBranchAddress("useCMSFrame__boE__bc", E, "useCMSFrame__boE__bc[chain.__ncandidates__]/F")
TypeError: none of the 3 overloaded methods succeeded. Full details:
  int TChain::SetBranchAddress(const char* bname, void* add, TBranch** ptr, TClass* realClass, EDataType datatype, bool isptr) =>
    takes at least 6 arguments (3 given)
  int TChain::SetBranchAddress(const char* bname, void* add, TClass* realClass, EDataType datatype, bool isptr) =>
    takes at least 5 arguments (3 given)
  int TChain::SetBranchAddress(const char* bname, void* add, TBranch** ptr = 0) =>
    could not convert argument 3

daritter · September 14, 2017, 3:06pm

@RobS taking the code above (and using the example for creating candidate based ntuples we have) I came up with the following:

import numpy as np
from ROOT import TChain, TH1D

# Load file, chain
chain = TChain("tree")
chain.Add("analysis/examples/VariableManager/VariablesToEventBasedTree.root")

nentries = chain.GetEntries()
# find out how many candidates we will have as a maximum. For large chains this
# might be slow then one can set max_candidates to any large enough number
max_candidates = int(chain.GetMaximum("__ncandidates__"))

h_ncand = TH1D("h_ncand", "", 100, 0, 100)
h_E = TH1D("h_E", "h_E", 100, -0.1, 6.0)

ncandidates = np.zeros(1, dtype=np.int32)
E = np.zeros(max_candidates, dtype=np.float)
chain.SetBranchAddress("__ncandidates__", ncandidates)
chain.SetBranchAddress("E", E)

for entry in range(nentries):
    chain.GetEntry(entry)
    # ncandidates is an array of length 1 so we need to index it :/
    h_ncand.Fill(ncandidates[0])
    # and when looping over E we have to make sure to only look at the first
    # ncandidates[0] elements
    for e in E[:ncandidates[0]]:
        h_E.Fill(e)
    # the above is equivalent to
    for icand in range(ncandidates[0]):
        h_E.Fill(E[icand])

This works fine for me on my small file without significant memory consumption. Could you try if this works for you?

RobS · September 19, 2017, 3:20pm

Hi @daritter,

I tried your solution. It runs, and suffers no memory leaks, but there appears to be something wrong with either

E = np.zeros(max_candidates, dtype=np.float)
chain.SetBranchAddress("E", E)

or

for icand in range(ncandidates[0]):
        h_E.Fill(E[icand])

Compare:

Old method:

from ROOT import TChain, TH1D
import numpy as np

chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)

nentries = chain.GetEntries()

for jentry in range(nentries):
   chain.GetEntry(jentry)

   E_old = chain.useCMSFrame__boE__bc

   if jentry < 10:
      print('\n\nentry:', jentry)
      print('cands: ' + str(chain.__ncandidates__))
      for icand in range(chain.__ncandidates__):
         print('E_old[' + str(icand) + ']: ' + str(E_old[icand]))

Output:

entry: 0
cands: 2
E_old[0]: 1.0149850845336914
E_old[1]: 1.688098669052124


entry: 1
cands: 4
E_old[0]: 0.538535475730896
E_old[1]: 0.8976801633834839
E_old[2]: 1.799019694328308
E_old[3]: 2.196112871170044

# etc...

New method:

from ROOT import TChain, TH1D
import numpy as np

chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)

nentries = chain.GetEntries()
max_candidates = int(chain.GetMaximum("__ncandidates__"))

ncandidates = np.zeros(1, dtype=np.int32)
chain.SetBranchAddress("__ncandidates__", ncandidates)

E_new = np.zeros(max_candidates, dtype=np.float)
chain.SetBranchAddress("useCMSFrame__boE__bc", E_new)

for jentry in range(nentries):
   chain.GetEntry(jentry)

   if jentry < 10:
      print('\n\nentry:', jentry)
      print('cands: ' + str(ncandidates[0]))
      for icand in range(ncandidates[0]):
         print('E_new[' + str(icand) + ']: ' + str(E_new[icand]))

Output:

entry: 0
cands: 2
E_new[0]: 0.37619739725
E_new[1]: 0.0


entry: 1
cands: 4
E_new[0]: 0.0026618805644
E_new[1]: 3.56890344545
E_new[2]: 0.0
E_new[3]: 0.0

# etc...

The output from E_new is completely different to E_old, and in many cases returns 0.0 for some candidates. cands matches for every entry, so it looks like

ncandidates = np.zeros(1, dtype=np.int32)
chain.SetBranchAddress("__ncandidates__", ncandidates)

works correctly. I tried fixing the E_old vs E_new issue but with no luck, I have no experience with numpy so I don’t know if there is some issue there. Do you have any idea what is happening?

daritter · September 21, 2017, 7:59am

Hi @RobS,

I’m sorry, my mistake. You need

E = np.zeros(max_candidates, dtype=np.float32)
chain.SetBranchAddress("E", E)

The float32 is important as the branch are C floats (32bit wide) and not double (64bit wide) which is the default when using np.float. I should have checked more closely.

RobS · September 22, 2017, 6:38pm

Hi @daritter,

This solves the problem, thanks very much for all your help with this!

daritter · September 26, 2017, 2:06pm

As this seems to be a more general problem I created a JIRA issue for it: https://sft.its.cern.ch/jira/browse/ROOT-9025

system · October 10, 2017, 2:06pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.