Decreasing the runtime of pairwise track combinations in pyroot


I am trying to optimise the speed to a PyRoot script that combines the invariant masses of muon tracks in a TTree and then combines the resulting invariant mass with pairwise combinations of oppositely charged pion tracks. I have written a function that performs the task at hand, however, it is crucial for me to try to decrease the runtime.

Here is my current attempt:

from ROOT import *
from math import *
import heapq
import numpy as np
import itertools
from array import array

def MM(tree, hist)::
    for entry in tree:
        #access the momenta and other vars
        buf_run = np.array(
        buf_evt = np.array(entry.evt)
        buf_bunchID = np.array(entry.bunchid)
        bufpx = np.array(entry.px) 
        bufpy = np.array(
        bufpz = np.array(entry.pz)
        bufpt = np.array(
        bufPIDp = np.array(entry.pidp)
        bufPIDk = np.array(entry.pidk)
        bufPIDpi = np.array(entry.pidpi)
        isMuon = np.array(entry.muon)
        longTrack = np.array(entry.nlong)
        Charge = np.array(entry.charge)
        if sum(isMuon)>=1 and sum(Charge[np.where(isMuon==1)])==0: #at least one muon track and sum of charges of muon tracks = 0
            #create an empty 4-vector to track muon mass
            nul_mu = TLorentzVector()
            #for every track in the event 
            for i in range(len(bufpx)):
                #add muon 4-vecs
                if isMuon[i]==1:
                  mubuf_particle = TLorentzVector()
                  mubuf_particle.SetXYZM(bufpx[i], bufpy[i], bufpz[i], m_mu)
            #reconstruct mass around the J-Psi nominal mass value and at least 2 other non-muon (pion) tracks
            if nul_mu.M() > (m_JPsi - 65.)  and nul_mu.M() < (m_JPsi + 65.) and len(np.where(isMuon!=1)[0])>=2: 
                pi_idxs=np.where(isMuon!=1) #indices of pio tracks
                temp=np.array(list(itertools.combinations(pi_idxs[0], 2))) #combinatorics of indices of hadron tracks, no repetitions, pairwise combinations
                neutral_pairs_idxs = temp[np.where(np.sum(Charge[temp], axis=1)==0)] 
                for pair in neutral_pairs_idxs:
                    idx1 = pair[0]
                    idx2 = pair[1]

                    buf_piplus = TLorentzVector()
                    buf_piminus = TLorentzVector()

                    buf_piplus.SetXYZM( bufpx[idx1], bufpy[idx1], bufpz[idx1] , m_pi ) 
                    buf_piminus.SetXYZM( bufpx[idx2], bufpy[idx2], bufpz[idx2] , m_pi ) 
                    #Fill with JPsi and 2 hadron (opposite charge) tracks
                    #hist.Fill( (nul_mu+buf_piplus+buf_piminus).M())
                    #Fill mu+mu+pi+pi_MM - mu+mu_MM + nominal mass value of J/Psi (=3097), should improve S/B separation
                    #mumu_array[0]=(nul_mu+buf_piplus+buf_piminus).M() - nul_mu.M() + 3097
                    hist.Fill( (nul_mu+buf_piplus+buf_piminus).M() - nul_mu.M() + 3097)

In my attempt to speed to the code I use only the branches I am interested in

#worry only about branches of interest, in the hope to speed the code
c.SetBranchStatus("*", 0)

Can anyone suggest a way to speed up the code either in reading the entries or perform the pairwise combinations?

I have been thinking of writing a Root script to perform this task, and I do not seem to manage to perform the pairwise combinations successfully. Can anyone suggest a way to do this?



Before trying to reduce the run time you need to understand where the time is spent. Sounds obvious once you think about it but it’s really crucial :slight_smile:

There are good tools out there, e.g. valgrind’s profilers callgrind and cachegrind; MacOS has profiling as part of Xcode (Instruments).

You can also get a first estimate from the TTreePerfStats which shows CPU and real time.

If you have any of these profiles please let us know and we should be able to help you!

Cheers, Axel.

Hi Axel,

Thank you for your feedback - I will have a look and get back to you. In the meantime, may I ask if the overall approach of the script i.e. looping over the TTree entry by entry is sensible, or whether there is a better/faster way overall strategy?

I could not think of a way to redesign the code to perform the task other than considering every entry in the TTree. However, I am a beginner and I would appreciate any help on the matter.



Edit: I am running over half of the EW 16 Stream. That is

[in] print 'Total number of events  = %i'%c.GetEntries()
[out] Total number of events  = 1148855315

In principle that’s fine - there are lots of things you could improve (e.g. late reading form the tree, only if you actually need if; skipping to the next tree entry early before reading all the other branches if you have less than 2 muons etc - btw, how do you get one muon with total muon charge of 0? shouldn’t that be sum(isMuon)>=2) but there’s no point if we don’t know where the time is spent.


Hi Axel,

Regarding the run time profiles, I have tried the TTreePerfStats method, as follows, in PyRoot:

from ROOT import *

##TChain for many root files
fc = TFileCollection("fc", "fc", "url_list")
c = TChain("Tree")

nentries = c.GetEntries()

ps = TTreePerfStats("ioperf",c)

for i in range(nentries):


The script seems to go through a broken pipe-like issues, as I receive a Killed output.
I have tried c.SetCacheSize(-1) with the same outcome. Could you please advise on how to proceed?

Thank you


Maybe @mato or @pcanal can have a look?