# Decreasing the runtime of pairwise track combinations in pyroot

Hello,

I am trying to optimise the speed to a PyRoot script that combines the invariant masses of muon tracks in a TTree and then combines the resulting invariant mass with pairwise combinations of oppositely charged pion tracks. I have written a function that performs the task at hand, however, it is crucial for me to try to decrease the runtime.

Here is my current attempt:

``````
from ROOT import *
from math import *
import heapq
import numpy as np
import itertools
from array import array

def MM(tree, hist)::
for entry in tree:
#access the momenta and other vars
buf_run = np.array(entry.run)
buf_evt = np.array(entry.evt)
buf_bunchID = np.array(entry.bunchid)
bufpx = np.array(entry.px)
bufpy = np.array(entry.py)
bufpz = np.array(entry.pz)
bufpt = np.array(entry.pt)
bufPIDp = np.array(entry.pidp)
bufPIDk = np.array(entry.pidk)
bufPIDpi = np.array(entry.pidpi)
isMuon = np.array(entry.muon)
longTrack = np.array(entry.nlong)
Charge = np.array(entry.charge)

if sum(isMuon)>=1 and sum(Charge[np.where(isMuon==1)])==0: #at least one muon track and sum of charges of muon tracks = 0

#create an empty 4-vector to track muon mass
nul_mu = TLorentzVector()
nul_mu.SetXYZM(0.,0.,0.,0.)

#for every track in the event
for i in range(len(bufpx)):

if isMuon[i]==1:
mubuf_particle = TLorentzVector()
mubuf_particle.SetXYZM(bufpx[i], bufpy[i], bufpz[i], m_mu)
nul_mu+=mubuf_particle

#reconstruct mass around the J-Psi nominal mass value and at least 2 other non-muon (pion) tracks
if nul_mu.M() > (m_JPsi - 65.)  and nul_mu.M() < (m_JPsi + 65.) and len(np.where(isMuon!=1)[0])>=2:

pi_idxs=np.where(isMuon!=1) #indices of pio tracks
temp=np.array(list(itertools.combinations(pi_idxs[0], 2))) #combinatorics of indices of hadron tracks, no repetitions, pairwise combinations
neutral_pairs_idxs = temp[np.where(np.sum(Charge[temp], axis=1)==0)]

for pair in neutral_pairs_idxs:
idx1 = pair[0]
idx2 = pair[1]

buf_piplus = TLorentzVector()
buf_piminus = TLorentzVector()

buf_piplus.SetXYZM( bufpx[idx1], bufpy[idx1], bufpz[idx1] , m_pi )
buf_piminus.SetXYZM( bufpx[idx2], bufpy[idx2], bufpz[idx2] , m_pi )

#Fill with JPsi and 2 hadron (opposite charge) tracks
#hist.Fill( (nul_mu+buf_piplus+buf_piminus).M())

#Fill mu+mu+pi+pi_MM - mu+mu_MM + nominal mass value of J/Psi (=3097), should improve S/B separation
#mumu_array[0]=(nul_mu+buf_piplus+buf_piminus).M() - nul_mu.M() + 3097
#tdata.Fill()
hist.Fill( (nul_mu+buf_piplus+buf_piminus).M() - nul_mu.M() + 3097)

``````

In my attempt to speed to the code I use only the branches I am interested in

``````#worry only about branches of interest, in the hope to speed the code
c.SetBranchStatus("*", 0)
c.SetBranchStatus("px",1)
c.SetBranchStatus("py",1)
c.SetBranchStatus("pz",1)
c.SetBranchStatus("muon",1)
c.SetBranchStatus("charge",1)
``````

Can anyone suggest a way to speed up the code either in reading the entries or perform the pairwise combinations?

I have been thinking of writing a Root script to perform this task, and I do not seem to manage to perform the pairwise combinations successfully. Can anyone suggest a way to do this?

Thanks,

bldelane

Before trying to reduce the run time you need to understand where the time is spent. Sounds obvious once you think about it but it’s really crucial

There are good tools out there, e.g. valgrind’s profilers callgrind and cachegrind; MacOS has profiling as part of Xcode (Instruments).

You can also get a first estimate from the TTreePerfStats https://root.cern.ch/doc/master/classTTreePerfStats.html which shows CPU and real time.

If you have any of these profiles please let us know and we should be able to help you!

Cheers, Axel.

Hi Axel,

Thank you for your feedback - I will have a look and get back to you. In the meantime, may I ask if the overall approach of the script i.e. looping over the TTree entry by entry is sensible, or whether there is a better/faster way overall strategy?

I could not think of a way to redesign the code to perform the task other than considering every entry in the TTree. However, I am a beginner and I would appreciate any help on the matter.

Thanks,

bldelane

Edit: I am running over half of the EW 16 Stream. That is

``````[in] print 'Total number of events  = %i'%c.GetEntries()
[out] Total number of events  = 1148855315
``````

In principle that’s fine - there are lots of things you could improve (e.g. late reading form the tree, only if you actually need if; skipping to the next tree entry early before reading all the other branches if you have less than 2 muons etc - btw, how do you get one muon with total muon charge of 0? shouldn’t that be `sum(isMuon)>=2`) but there’s no point if we don’t know where the time is spent.

Axel.

Hi Axel,

Regarding the run time profiles, I have tried the TTreePerfStats method, as follows, in PyRoot:

``````from ROOT import *

##TChain for many root files
fc = TFileCollection("fc", "fc", "url_list")
c = TChain("Tree")

nentries = c.GetEntries()
c.SetCacheSize(10000000)
c.SetCacheEntryRange(0,nentries)

ps = TTreePerfStats("ioperf",c)

for i in range(nentries):
c.GetEntry(i)

ps.SaveAs("cmsperf.root")
``````

The script seems to go through a broken pipe-like issues, as I receive a `Killed` output.
I have tried `c.SetCacheSize(-1)` with the same outcome. Could you please advise on how to proceed?

Thank you

Hi,

Maybe @mato or @pcanal can have a look?

Axel.