Fastest way to fill histogram with for loop

Hi there,

I’m trying to optimize my Python code because it feels quite slow.

In principle I am just looping over a ROOT file with a TTree containing several branches (10 GB).
If I draw a 1D histogram of “ADCEnergy” (an array of size 2) in the TBrowser, it takes 4-5 seconds; the total number of entries is 3.5 million.

In my small Python script I’m basically doing the same, except for some additional cuts, but it takes 77 seconds.

import ROOT

file = ROOT.TFile.Open('SomeFile.root', 'read')
Tree = file.Get('tree')
hist1D = ROOT.TH1F('Name', 'Time; ADC Energy [keV]; Events',200, 0, 10000)
canv= ROOT.TCanvas()

Tree.SetBranchStatus('*', 0)
Tree.SetBranchStatus('Par1', 1)
Tree.SetBranchStatus('Par2', 1)
Tree.SetBranchStatus('Par3', 1)
Tree.SetBranchStatus('ADCEnergy', 1)

for event in Tree:
    Par1 = event.Par1
    if Par1 <= 0.1 or Par1 >= 0.2:
        continue
        
    Par2 = event.Par2
    if Par2[0] == 0 and Par2[2] == 0 and Par2[3] == 0:
    
        Par3 = event.Par3[0]
        if Par3 > 0 and Par3 != 11:
            Energy = event.ADCEnergy[0]
            hist1D.Fill(Energy)

outputfile = ROOT.TFile('SomeOutputFile.root', 'RECREATE')
hist1D.Write()
hist1D.Draw()
canv.SaveAs('SomePDFFile.pdf','pdf')

Profiling the code leads to this:
ncalls tottime percall cumtime percall filename:lineno(function)
1774819 38.554 0.000 38.554 0.000 ROOT.py:263(_TTree__iter__)
1 37.310 37.310 77.711 77.711 testfile.py:3(<module>)
1 1.435 1.435 1.460 1.460 ROOT.py:509(__finalSetup)

Is there any way to do this faster? Going from 5 s to 77 s feels somehow wrong (even the TTree iteration alone takes 38 s, although it should be executed in C++, right?).

I tried to use “SetBranchAddress” as suggested by wlav in Iteration over a tree in pyroot - performance issue, but it didn’t work (the result is a blank histogram, maybe my syntax is wrong?):

from array import array

Par1 = array('d', [0])
Tree.SetBranchAddress('Par1', Par1)

for event in Tree:
    #Par1 = event.Par1
    if Par1 <= 0.1 or Par1 >= 0.2:
        continue

The branch that is looped over looks like this:

*............................................................................*
*Br   35 :ADCEnergy : ADCEnergy[2]/F                                         *
*Entries :  1774818 : Total  Size=   14240332 bytes  File Size  =    7896807 *
*Baskets :      432 : Basket Size=      52224 bytes  Compression=   1.80     *
*............................................................................*

Thanks a lot for your help!

Hi,

given the branch information you shared (the leaf type is ADCEnergy[2]/F), the array should hold single-precision floating point numbers, i.e.

Par1= array('f',[0])
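
For example, here is a minimal sketch of how the buffers and the loop could look, reusing Tree and hist1D from your script (I am assuming here that Par1 is also a float branch; ADCEnergy needs a buffer of length 2 since it is declared as ADCEnergy[2]/F):

from array import array

# buffers matching the branch types ('f' = single-precision float)
Par1 = array('f', [0.])
ADCEnergy = array('f', [0., 0.])       # ADCEnergy[2]/F holds two floats per entry

Tree.SetBranchAddress('Par1', Par1)
Tree.SetBranchAddress('ADCEnergy', ADCEnergy)

for event in Tree:                     # each iteration loads the next entry into the buffers
    if Par1[0] <= 0.1 or Par1[0] >= 0.2:
        continue
    hist1D.Fill(ADCEnergy[0])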

Cheers,
Danilo

Thank you! How could I miss that… :smiley:

I’ve applied this to all cuts and also to the filling parameter, and the time spent in my .py has gone down to 2 seconds!

ncalls tottime percall cumtime percall filename:lineno(function)
1774819 31.853 0.000 31.853 0.000 ROOT.py:263(_TTree__iter__)
1 2.053 2.053 34.859 34.859 testfile.py:3(<module>)

I guess the time for calling ROOT.py can’t be reduced, right?
Is the difference between 32 s in PyROOT and ~5 s in the TBrowser due to the Python vs. CINT code base?

Hi,

as your profile demonstrates, the iteration over the TTree is the piece of code taking most of the runtime. The difference is due to the way in which PyROOT looks up the object attributes based on the branch names, so if you want there is indeed a “code base difference”. On the other hand, CINT has nothing to do with it here: what you are comparing against is rather the code of TTree::Draw, which is highly optimised, among other things, to deserialise the minimum possible amount of data.
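
If you want to get closer to the TBrowser timing already now, you could also express the cuts as a selection string and let TTree::Draw run the event loop in C++. A rough sketch (the histogram name hDraw is just illustrative, and the expression is adapted from the cuts in your script):

Tree.Draw('ADCEnergy[0] >> hDraw(200, 0, 10000)',
          'Par1 > 0.1 && Par1 < 0.2 && '
          'Par2[0] == 0 && Par2[2] == 0 && Par2[3] == 0 && '
          'Par3[0] > 0 && Par3[0] != 11')
hDraw = ROOT.gDirectory.Get('hDraw')   # retrieve the histogram TTree::Draw created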

Starting with ROOT 6.08 (which will be released at the end of the month), this performance hit in PyROOT will be greatly mitigated by relying on a technique based on Cling’s JITting capabilities.

Cheers,
D

Thank you very much for the information!

Actually I’m still using v5.34, which is the standard version at my computation centre.
ROOT 6 (v6.04) has also been available for a few months.
I guess I’ll switch to v6 with the new update then, because that makes more Python and less C++ code possible :smiley:

Hi,

and I maintain that the best long-term solution for Python performance (and GIL problems) is PyPy. We recently made great progress thanks to GSoC, but if HEP is interested in high-performance Python, more noise needs to be made to the higher-ups to get support for this work.

I first showed that C++ speeds could be attained using PyPy in 2013!

Cheers,
Wim

Hi Wim,

thanks for the reminder. Indeed PyPy is a very promising project and many people are looking forward to hearing more about it!

Cheers,
Danilo