Hello,
to follow up on and clarify the performance-enhancing drugs, I ran some tests myself with a simple tree.
In your discussion some things were a bit confusing, because Karol optimized the access to a single integer value while Wim was talking about an array.
In my example I have an array Jet_match2all_DeltaPt and an integer Jet_match2all_N which contains the array’s length.
Here is the code for my test (which just loops over all Jet_match2all_DeltaPt values and does nothing with them):
#!/usr/bin/python
from ROOT import *
import time
from array import *
def method1():
    f=TFile("5200.root_100.root")
    t=f.Get("commonTree")
    t.LoadTree(0)
    start=time.time()
    for entry in xrange(1000):
        t.GetEntry(entry)
        nr=t.Jet_match2all_N
        for i in xrange(nr):
            v=t.Jet_match2all_DeltaPt[i]
    print "method 1",time.time()-start
    del(t)
    f.Close()
    del(f)
def method2():
    f=TFile("5200.root_100.root")
    t=f.Get("commonTree")
    Jet_match2all_DeltaPt=t.Jet_match2all_DeltaPt
    t.LoadTree(0)
    start=time.time()
    for entry in xrange(1000):
        t.GetEntry(entry)
        nr=t.Jet_match2all_N
        for i in xrange(nr):
            v=Jet_match2all_DeltaPt[i]
    print "method 2",time.time()-start
    del(t)
    f.Close()
    del(f)
def method3():
    f=TFile("5200.root_100.root")
    t=f.Get("commonTree")
    Jet_match2all_N=array("i",[0])
    t.SetBranchAddress("Jet_match2all_N",Jet_match2all_N)
    t.LoadTree(0)
    start=time.time()
    for entry in xrange(1000):
        t.GetEntry(entry)
        nr=Jet_match2all_N[0]
        for i in xrange(nr):
            v=t.Jet_match2all_DeltaPt[i]
    print "method 3",time.time()-start
    del(t)
    f.Close()
    del(f)
def method4():
    f=TFile("5200.root_100.root")
    t=f.Get("commonTree")
    Jet_match2all_N=array("i",[0])
    t.SetBranchAddress("Jet_match2all_N",Jet_match2all_N)
    Jet_match2all_DeltaPt=array("f",[0.0]*10)
    t.SetBranchAddress("Jet_match2all_DeltaPt",Jet_match2all_DeltaPt)
    t.LoadTree(0)
    start=time.time()
    for entry in xrange(1000):
        t.GetEntry(entry)
        nr=Jet_match2all_N[0]
        for i in xrange(nr):
            v=Jet_match2all_DeltaPt[i]
    print "method 4",time.time()-start
    del(t)
    f.Close()
    del(f)
method4()
method3()
method2()
method1()
method1 uses the approach described in the PyROOT tutorial, which accesses the variables in the tree directly.
method2 caches the array t.Jet_match2all_DeltaPt (in reality it seems to be a “read-write buffer ptr”) in Jet_match2all_DeltaPt before the loop, after which it can be used like a normal array. The integer variable is not optimized, because you cannot “cache” its value before the loop (it would be set to zero and stay at the constant value 0). This is what Wim described (but Karol’s variable seems to be an integer, so this does not work for Karol’s example).
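Why the array can be cached but the integer cannot can be illustrated without ROOT. The sketch below is only an analogy, assuming the tree hands out a reference to a buffer that is overwritten in place on each GetEntry, while the integer attribute is returned as a plain value; the names buffer_values, count, cached_buffer and cached_count are made up for this illustration and are not part of PyROOT.

```python
from array import array

# Stand-ins for the tree's per-entry storage (an assumption for
# illustration only, not PyROOT's real internals).
buffer_values = array("f", [0.0, 0.0, 0.0])
count = 0

# "Cache" both before the loop, as method2 does for the array.
cached_buffer = buffer_values   # a reference to the same mutable object
cached_count = count            # a snapshot of the current value (0)

# Simulate reading a new entry: the buffer is overwritten in place,
# while the counter name is rebound to a new integer object.
buffer_values[0] = 1.5
buffer_values[1] = 2.5
count = 2

# The cached buffer reflects the new entry; the cached integer does not.
seen_through_cache = list(cached_buffer)
```

After the simulated read, cached_buffer shows the new values while cached_count is still 0, which matches the behaviour seen with the cached integer in the tree.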
method3 is what Karol described: you set the branch address to an array.array (written this way to make clear that the array from the Python module array is meant). For a single integer you have to set the address to an integer array.array of length 1. Since Jet_match2all_N is now an array.array, you have to access it as Jet_match2all_N[0] (which is a bit of overhead for accessing a single integer). The array Jet_match2all_DeltaPt is accessed as in method1.
method4 sets the branch address for both the integer and the array. Here Jet_match2all_DeltaPt is an array.array of a fixed length that you must set before the loop, i.e. you must know the maximum number of array elements (here I set it to 10, which seems to be safe for me).
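Instead of guessing the buffer length, one option might be to ask the tree for the largest value of the counter branch before allocating. The sketch below assumes that TTree.GetMaximum(branch_name) returns that maximum for Jet_match2all_N (it scans the branch once, which costs some time up front). FakeTree and allocate_float_buffer are hypothetical helpers so that the example runs without ROOT; with a real tree you would pass t instead.

```python
from array import array

class FakeTree(object):
    """Hypothetical stand-in for a TTree so the sketch runs without ROOT."""
    def __init__(self, counts):
        self._counts = counts

    def GetMaximum(self, branch_name):
        # Mimics TTree.GetMaximum: largest value in the column, as a float.
        return float(max(self._counts))

def allocate_float_buffer(tree, count_branch, margin=0):
    """Allocate a float buffer large enough for the longest array in the tree."""
    capacity = int(tree.GetMaximum(count_branch)) + margin
    # Keep at least one slot so the buffer is never empty.
    return array("f", [0.0] * max(capacity, 1))

tree = FakeTree([3, 7, 2])
buf = allocate_float_buffer(tree, "Jet_match2all_N")
```

The resulting buffer would then be passed to SetBranchAddress in place of the fixed array("f",[0.0]*10).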
The execution time (measured only over the loop) on my computer is:
method 4 0.63s
method 3 1.35s
method 2 0.82s
method 1 1.32s
Here are the same measurements with psyco.profile() optimization, running the whole sequence twice (the first four lines are the first pass, the last four the second):
method 4 2.02s
method 3 1.13s
method 2 0.75s
method 1 1.21s
method 4 0.54s
method 3 1.14s
method 2 0.74s
method 1 1.22s
Here are the same measurements with psyco.full() optimization, again running the whole sequence twice:
method 4 3.31s
method 3 1.79s
method 2 0.75s
method 1 1.14s
method 4 0.56s
method 3 1.07s
method 2 0.75s
method 1 1.14s
The statistics are limited, but the numbers show that method4 is the fastest and that method1 cannot reach the same performance, even when it is optimized with psyco. method2, which avoids the unnatural “[0]” access to an integer, also gives an acceptable performance improvement.
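To get somewhat better statistics, each method could be timed several times and summarized. The following is a minimal sketch of such a harness; time_repeated, summarize and dummy_loop are names invented here, and dummy_loop only stands in for one of the methods above. Note that, unlike the measurements in this post, it times the whole call, including opening the file.

```python
import time

def time_repeated(func, repeats=5):
    """Call func() several times and return the wall-clock durations."""
    durations = []
    for _ in range(repeats):
        start = time.time()
        func()
        durations.append(time.time() - start)
    return durations

def summarize(durations):
    """Return min, median (upper median for even counts) and max."""
    ordered = sorted(durations)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }

def dummy_loop():
    # Placeholder workload standing in for method1 ... method4.
    total = 0
    for i in range(10000):
        total += i
    return total

stats = summarize(time_repeated(dummy_loop, repeats=5))
```

Looking at the minimum or median rather than a single run also reduces the effect of the slow first pass seen with psyco.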
I hope this helps those who are unsure about the best way to optimize their loop-reading code.
Cheers,
Duc