Hi everybody,
I am experiencing a problem where, when “walking” through the objects in a ROOT-file, retrieving each object takes longer the further you have walked.
Consider the following code, which walks through a ROOT-file containing many TEfficiency objects. For convenience I use the rootpy.io.file.walk function to do the walking; however, as I elaborate below, it is not the cause of the problem. You can find the script and the corresponding test ROOT-file scurves.root
here:
#!/usr/bin/env python
from rootpy.io import root_open
import rootpy.ROOT as ROOT
import time

_print_interval = 1000
_last_time = time.time()
_obj_cnt = 0

def print_time_past():
    global _print_interval, _last_time, _obj_cnt
    _obj_cnt += 1
    if _obj_cnt % _print_interval == 0:
        ms_past = (time.time() - _last_time) * 1000.
        print(f"Took {ms_past:.0f}ms for the last {_print_interval} objects.")
        _last_time = time.time()

def walk():
    fname = "scurves.root"
    with root_open(fname, "READ") as root_file:
        for path, dirs, objects in root_file.walk(""):
            for obj_name in objects:
                obj = root_file[path + "/" + obj_name]  # This line is critical!
                # Do something with the object
                print_time_past()

walk()
You will notice, when running it, that each successive batch of 1000 objects takes longer and longer to walk through, producing output like this:
Took 3816ms for the last 1000 objects.
Took 2334ms for the last 1000 objects.
Took 2174ms for the last 1000 objects.
Took 2253ms for the last 1000 objects.
Took 2355ms for the last 1000 objects.
Took 2582ms for the last 1000 objects.
Took 2583ms for the last 1000 objects.
Took 2652ms for the last 1000 objects.
Took 2750ms for the last 1000 objects.
Took 2909ms for the last 1000 objects.
Took 2778ms for the last 1000 objects.
Took 2902ms for the last 1000 objects.
Took 3109ms for the last 1000 objects.
Took 3269ms for the last 1000 objects.
Took 3412ms for the last 1000 objects.
Took 3410ms for the last 1000 objects.
Took 3880ms for the last 1000 objects.
Took 4391ms for the last 1000 objects.
Took 4919ms for the last 1000 objects.
Took 4758ms for the last 1000 objects.
Took 4143ms for the last 1000 objects.
Took 4400ms for the last 1000 objects.
Took 4303ms for the last 1000 objects.
Took 4446ms for the last 1000 objects.
Took 4438ms for the last 1000 objects.
Took 5050ms for the last 1000 objects.
Took 4827ms for the last 1000 objects.
Took 5275ms for the last 1000 objects.
Took 6279ms for the last 1000 objects.
Took 5203ms for the last 1000 objects.
Took 5862ms for the last 1000 objects.
Took 5982ms for the last 1000 objects.
Took 6300ms for the last 1000 objects.
Took 5883ms for the last 1000 objects.
Took 6926ms for the last 1000 objects.
Took 8219ms for the last 1000 objects.
Took 7514ms for the last 1000 objects.
Took 6585ms for the last 1000 objects.
Took 6371ms for the last 1000 objects.
Took 6994ms for the last 1000 objects.
Took 8177ms for the last 1000 objects.
Took 8137ms for the last 1000 objects.
Took 7065ms for the last 1000 objects.
Took 7374ms for the last 1000 objects.
Took 7479ms for the last 1000 objects.
Took 7564ms for the last 1000 objects.
Took 8076ms for the last 1000 objects.
Took 9441ms for the last 1000 objects.
Took 8773ms for the last 1000 objects.
Took 10552ms for the last 1000 objects.
Took 9495ms for the last 1000 objects.
Took 10253ms for the last 1000 objects.
Took 13477ms for the last 1000 objects.
Took 16660ms for the last 1000 objects.
Took 18195ms for the last 1000 objects.
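To quantify the trend, one can fit a straight line through the per-batch times; a clearly positive slope means each additional batch of 1000 objects takes longer than the previous one, i.e. the per-object cost grows with the number of objects already read. A small self-contained sketch (the `batch_ms` values are the first few numbers from the output above, skipping the first batch, which includes file-open overhead):

```python
def linear_fit(ys):
    """Return (slope, intercept) of an ordinary least-squares line
    through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Per-batch times in ms (1000 objects each), copied from the output above.
batch_ms = [2334, 2174, 2253, 2355, 2582, 2583, 2652, 2750, 2909,
            2778, 2902, 3109, 3269, 3412, 3410, 3880, 4391]

slope, intercept = linear_fit(batch_ms)
print(f"Each batch takes roughly {slope:.0f}ms longer than the one before it.")
```

For the numbers shown, the slope is clearly positive, consistent with retrieval cost growing roughly linearly in the number of objects already retrieved.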
I believe it is somehow related to memory usage when actually retrieving the objects from the ROOT-file: when I comment out the line

obj = root_file[path + "/" + obj_name]  # This line is critical!

looping through batches of 1000 objects takes equally long no matter how far you “walk”, which also means that the rootpy.io.file.walk function itself is innocent.
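The shape of that slowdown can be illustrated without ROOT at all: if every retrieved object ends up in an in-memory list that is scanned linearly on each retrieval, per-batch time grows just like the output above. The sketch below is a toy model only; `FakeDirectory` and its behavior are invented for illustration and are an assumption about the mechanism, not a description of what ROOT or rootpy actually does internally.

```python
import time

class FakeDirectory:
    """Toy model: keeps every retrieved object in a list and scans that
    list linearly on each lookup, so lookups get slower as more objects
    accumulate in memory."""

    def __init__(self):
        self._objects = []

    def get(self, name):
        # Linear scan over everything retrieved so far: O(n) per lookup.
        for stored_name, stored_obj in self._objects:
            if stored_name == name:
                return stored_obj
        obj = object()
        self._objects.append((name, obj))
        return obj

d = FakeDirectory()
batch_times = []
last = time.perf_counter()
for i in range(5000):
    d.get(f"obj_{i}")
    if (i + 1) % 1000 == 0:
        now = time.perf_counter()
        batch_times.append(now - last)
        last = now

# Later batches take longer than earlier ones, because each lookup
# scans an ever-longer list of previously retrieved objects.
print([f"{t * 1000:.1f}ms" for t in batch_times])
```

If something like this growing-list effect is at play, explicitly freeing each object after use should keep the per-batch time flat.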
Tested with:
Python 3.6.5
rootpy 1.0.1
ROOT 6.14/04
Looking very much forward to your ideas and suggestions. Thanks in advance!