PyROOT loop over files, huge memory leak

I have a script looping over ROOT files. Each file contains a TFolder, which in turn contains TFolders with histograms. I want to grab a few numbers from each histogram and put them in a database.

# Called by the loop in the main function:

def process_file(f):
    # Open the ROOT file (path is assumed to be defined elsewhere in the script)
    rootFile = TFile(path + "/" + f, "READ")
    print("Processing file " + f + " ... ")

    for key in rootFile.GetListOfKeys():
        print(key.GetClassName())

    # Global directory
    directory = rootFile.FindObjectAny("histos")

    rootFile.Close()
    del directory

After each pass over a file, my memory usage increases by a few hundred MB, so I can't loop over the few thousand files I have.

I also tried directory.Clear() and gc.collect(). It's frustrating that a simple loop over ROOT files blows up the memory usage.

Hi @Frederik_Wauters,

Could I have access to at least one of these files to try to reproduce?

Is the code you posted the minimal reproducer for the issue?

Thank you,

Enric

Hi Enric,

This is a working minimal reproducer:

#!/usr/bin/python

# Imports
from ROOT import TFile, TFolder, TH1I

def process_file(f):
    rootFile = TFile(f, "READ")
    histos_folder = rootFile.Get("histos")

    # Process histograms
    # h = histos_folder.FindObjectAny("hTriggerStats")

    # histos_folder.Clear()  # doesn't help
    # del histos_folder      # doesn't help
    rootFile.Close()

def main():
    files = ['his05000_Reanalysis_1.root', 'his05001_Reanalysis_1.root', 'his05002_Reanalysis_1.root']
    for f in files:
        process_file(f)

if __name__ == "__main__":
    main()

This is a link to 3 input files:

[https://drive.google.com/open?id=1CUReKcz4v1h1YiwhTLN71XoqqZGLsqV7]

best

Frederik

Hi,

I observe the same behaviour in an equivalent C++ macro:

void process_file(const char* fname) {
  TFile rootFile(fname, "READ");
  TFolder *histos_folder;
  rootFile.GetObject("histos", histos_folder);
  rootFile.Close();

  delete histos_folder;

  MemInfo_t memInfo;
  gSystem->GetMemInfo(&memInfo);
  cout << memInfo.fMemUsed << endl;
}

void test() {
  for ( auto &s : { "his05000_Reanalysis_1.root", "his05001_Reanalysis_1.root", "his05002_Reanalysis_1.root" }) {
    process_file(s);
  }
}

Whether or not the histos folder is deleted makes no difference.
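To watch the growth from the Python side as well, one option is the standard-library `resource` module. The sketch below is only an illustration and not part of the original reproducer: `peak_rss_mb` and `report_growth` are hypothetical helper names, and the workload in the demo is a stand-in for `process_file`. It works on Unix only, and it reports the peak resident set size of the current process (kilobytes on Linux, bytes on macOS), so a leak shows up as a peak that keeps climbing from file to file.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MB (Unix only)."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return rss / divisor


def report_growth(items, process):
    """Call process(item) for each item and return (item, growth in MB) pairs."""
    growth = []
    for item in items:
        before = peak_rss_mb()
        process(item)
        growth.append((item, peak_rss_mb() - before))
    return growth


if __name__ == "__main__":
    # Stand-in workload; in the reproducer this would be process_file.
    def allocate(n):
        return bytearray(n)

    for name, delta in report_growth([10**6, 10**7], allocate):
        print("%s: +%.1f MB" % (name, delta))
```

In the reproducer, this would presumably be called as `report_growth(files, process_file)`. Because the value is a peak, it never decreases, so it can show that memory keeps growing but not that memory was released.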

This is now being followed in this JIRA ticket:

https://sft.its.cern.ch/jira/browse/ROOT-9275

Most likely this is because the folder is not the owner of its contents:

root [0] 
Attaching file his05000_Reanalysis_1.root as _file0...
(TFile *) 0x5647bd577dc0
root [1] auto ff =(TFolder*)_file0->Get("histos")
(TFolder *) 0x5647beb83600
root [2] ff->IsOwner()
(bool) false

The subfolders also need the owner flag set.

@Frederik_Wauters, can you add these lines to the end of your process_file function?

  histos_folder.SetOwner(True)
  for fo in histos_folder.GetListOfFolders():
    fo.SetOwner(True)
  histos_folder.Clear()
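
If the folders are nested more than one level deep, the same idea could be applied recursively. The following is a minimal sketch under assumptions, not the fix confirmed in this thread: `own_recursively` is a hypothetical helper name, it treats any object that has a `GetListOfFolders` method as a folder, and it uses only the `SetOwner` and `GetListOfFolders` calls from the lines above. `StubFolder` is a made-up stand-in so the sketch can run without ROOT installed.

```python
def own_recursively(folder):
    """Set the owner flag on folder and every nested folder; return the count."""
    folder.SetOwner(True)
    count = 1
    # The list may be empty (or a null pointer in PyROOT), hence the "or ()".
    for item in folder.GetListOfFolders() or ():
        # Assumption: only folder-like objects expose GetListOfFolders;
        # histograms and other leaves are skipped.
        if hasattr(item, "GetListOfFolders"):
            count += own_recursively(item)
    return count


class StubFolder(object):
    """Stand-in for TFolder so the sketch can run without ROOT."""

    def __init__(self, children=()):
        self.children = list(children)
        self.owner = False

    def SetOwner(self, flag=True):
        self.owner = flag

    def GetListOfFolders(self):
        return self.children


if __name__ == "__main__":
    leaf = object()  # plays the role of a histogram
    inner = StubFolder([leaf])
    top = StubFolder([inner, StubFolder()])
    print(own_recursively(top))  # prints 3
```

With a real file, this would be called as `own_recursively(histos_folder)` followed by `histos_folder.Clear()`, in place of the explicit loop above.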

Cheers,
Enric

Yes, this works! I hadn't yet tried setting the ownership on all the subfolders. Thanks!

Somebody else using the same DAQ, which generates ROOT files structured this way, had the same issue and the same solution: [Reading root-file causes memory leak]
