Slowness with std::vector in PyRoot

Hi,

I need to write std.vector variables into a new TTree. But it seems that in PyRoot, calling std.vector methods such as clear and push_back is very slow, in contrast with the speed in C++. Is there any way to speed up calling std.vector methods, or an alternative fast way to do the same job?

–Shuwei

Shuwei,

yes, in C++ the push_back() method is, modulo the memory allocation, basically fully inlined into a straight memory copy of the value into the proper location in the vector. In Python, there’s the unboxing of variables, and then all the calls through (and setup of) the stub functions of the dictionary.

The only thing I can offer today is PyPy/cppyy with the CINT backend. It’ll still use stub functions (those won’t disappear until Cling; we can do w/o stubs with Reflex, but Reflex does not work with ROOT I/O). It’s significantly faster nonetheless, but cppyy isn’t feature complete yet, so you can’t use it for graphics, for example (it will crash at some point during the session with high likelihood due to thread conflicts). Still, it should allow you to run the same tree-filling code that works under CPython.

Have a look under /afs/.cern.ch/sw/lcg/external/pypy :

[code]
$ cd /afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc5
$ source ./setup.sh
$ pypy-cint

import CppyyROOT as ROOT

etc. …
[/code]

Cheers,
Wim

Hi Wim,

Many thanks. I will have a try.

Does PyPy/cppyy also help with std.vector.clear? If not, why is std.vector.clear, which just deletes the objects held in the vector, so slow in PyRoot?

–Shuwei

Shuwei,

what kind of objects are in the vector? If they derive from TObject, then their deletion will go through the memory regulator. Otherwise, I wouldn’t expect any difference.

Cheers,
Wim

Hi Wim,

They are vectors of int and float. I have a simplified example as follows:

import sys,os

import ROOT
from ROOT import (TPySelector,gSystem,gROOT,TStopwatch)

class ttbarClear2( TPySelector ):
#     ===========
# main analysis class using TSelector to loop over TTrees/Events

   #---------------------------
   def __init__(self,tree=None):
       self.Debug = False
       self.timer  = None
       self.InitTreeAndHist()

   def InitTreeAndHist(self):
       std = ROOT.std

       # W->jj
       self.m_Wjj_mass = std.vector(float)()
       self.m_Wjj_pt   = std.vector(float)()
       self.m_Wjj_eta  = std.vector(float)()
       self.m_Wjj_delR = std.vector(float)()

       # jet-1
       self.m_Wjj_j1_i = std.vector(int)()
       self.m_Wjj_j1_pt    = std.vector(float)()
       self.m_Wjj_j1_eta = std.vector(float)()
       self.m_Wjj_j1_wt  = std.vector(float)()

       # jet-2
       self.m_Wjj_j2_i = std.vector(int)()
       self.m_Wjj_j2_pt    = std.vector(float)()
       self.m_Wjj_j2_eta = std.vector(float)()
       self.m_Wjj_j2_wt  = std.vector(float)()

       # W->lv
       self.m_Wlv_mode   = std.vector(int)()
       self.m_Wlv_mass   = std.vector(float)()
       self.m_Wlv_pt     = std.vector(float)()
       self.m_Wlv_eta    = std.vector(float)()
       self.m_Wlv_delR   = std.vector(float)()
       self.m_Wlv_pt_nu  = std.vector(float)()
       self.m_Wlv_eta_nu = std.vector(float)()
       self.m_Wlv_lep_i  = std.vector(int)()

       # top
       self.m_top_mode = std.vector(int)()
       self.m_top_W_i = std.vector(int)()
       self.m_top_b_i = std.vector(int)()
       self.m_top_mass = std.vector(float)()
       self.m_top_pt = std.vector(float)()
       self.m_top_eta = std.vector(float)()
       self.m_top_phi = std.vector(float)()
       self.m_top_delR = std.vector(float)()
       self.m_top_b_pt = std.vector(float)()
       self.m_top_b_eta = std.vector(float)()
       self.m_top_b_WT = std.vector(float)()

       # ttbar
       self.m_ttbar_mass  = std.vector(float)()
       self.m_ttbar_pt    = std.vector(float)()
       self.m_ttbar_eta   = std.vector(float)()
       self.m_ttbar_delR  = std.vector(float)()
       self.m_ttbar_lvb_i = std.vector(int)()
       self.m_ttbar_jjb_i = std.vector(int)()

   #--------------------
   def InitEvent( self ):

       self.m_Wjj_mass.clear()
       self.m_Wjj_pt.clear()
       self.m_Wjj_eta.clear()
       self.m_Wjj_delR.clear()
       self.m_Wjj_j1_i.clear()
       self.m_Wjj_j2_i.clear()
       self.m_Wjj_j1_pt.clear()
       self.m_Wjj_j1_eta.clear()
       self.m_Wjj_j1_wt.clear()
       self.m_Wjj_j2_pt.clear()
       self.m_Wjj_j2_eta.clear()
       self.m_Wjj_j2_wt.clear()

       self.m_Wlv_lep_i.clear()
       self.m_Wlv_mode.clear()
       self.m_Wlv_mass.clear()
       self.m_Wlv_pt.clear()
       self.m_Wlv_eta.clear()
       self.m_Wlv_delR.clear()
       self.m_Wlv_pt_nu.clear()
       self.m_Wlv_eta_nu.clear()

       self.m_top_mode.clear()
       self.m_top_W_i.clear()
       self.m_top_b_i.clear()
       self.m_top_mass.clear()
       self.m_top_pt.clear()
       self.m_top_eta.clear()
       self.m_top_phi.clear()
       self.m_top_delR.clear()
       self.m_top_b_pt.clear()
       self.m_top_b_eta.clear()
       self.m_top_b_WT.clear()

       self.m_ttbar_mass.clear()
       self.m_ttbar_pt.clear()
       self.m_ttbar_eta.clear()
       self.m_ttbar_delR.clear()
       self.m_ttbar_lvb_i.clear()
       self.m_ttbar_jjb_i.clear()

   #----------------
   def Begin( self ):
       print 'py: beginning'

   #---------------------------
   def SlaveBegin( self, tree ):
       print 'py: slave beginning'
       if self.timer is None:
          self.timer = TStopwatch()
          self.timer.Start()

   #---------------------
   def Init( self, tree ):
       print 'py: Init'
       self.TREE = tree
       print "Current file=",tree.GetCurrentFile().GetName()

   #-------------------------
   def Process( self, entry ):
       if entry%1000 == 0:
          print "entry=",entry
       # self.InitEvent()
       return 1

   #-------------------------
   def SlaveTerminate( self ):
       print 'py: slave terminating'

   #--------------------
   def Terminate( self ):
       print 'py: terminating'

Looping over a TChain with 1965152 events, the CPU time spent in “self.InitEvent()” in PyRoot, which clears 37 std vectors for each event, is about 90s. Doing the same clearing in a C++ framework would take about 1s. That is a dramatic difference.
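
As a rough back-of-the-envelope check on these numbers (a sketch only, assuming the quoted 90s and 1s are spent entirely in the clear() calls; the variable names are purely illustrative), the average cost of a single clear() call can be estimated like this:

```python
# Figures quoted in the post above.
n_events = 1965152        # events in the TChain
n_vectors = 37            # std vectors cleared per event
t_pyroot = 90.0           # seconds spent clearing in PyRoot (approximate)
t_cpp = 1.0               # seconds for the same clearing in C++ (approximate)

# Total number of clear() calls over the whole chain.
n_calls = n_events * n_vectors

# Average cost per call, assuming all of the time goes into clear().
per_call_pyroot = t_pyroot / n_calls
per_call_cpp = t_cpp / n_calls

print("clear() calls: %d" % n_calls)
print("PyRoot: %.2f microseconds per call" % (per_call_pyroot * 1e6))
print("C++:    %.1f nanoseconds per call" % (per_call_cpp * 1e9))
```

That comes to a bit over one microsecond per call in PyRoot against roughly 14 nanoseconds in C++, spread over some 72 million calls.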

–Shuwei

Shuwei,

probably nothing to do with clear() itself, but rather that clear() is completely inline in C++, whereas for Python, you incur the overhead of the stub function (on top of the function itself being out-of-line, which probably doesn’t matter all that much). In particular, clear() for ints and longs does basically nothing: there are no destructors to call, so it’s basically no more than a pointer reset. From gcc’s STL stl_vector.h:

   void _M_erase_at_end(pointer __pos) {
       std::_Destroy(__pos, this->_M_impl._M_finish, _M_get_Tp_allocator());
       this->_M_impl._M_finish = __pos;
   }

with _Destroy() being a no-op for ints (in general, it’s a loop over the elements, calling the dtor on each).
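
To make the “pointer reset” point concrete, here is a toy Python model of a vector of ints. It is purely illustrative: the class ToyIntVector and its attributes are made up for this sketch and are not how PyROOT or libstdc++ are actually implemented. It mimics the stl_vector.h excerpt in that clear() only moves the end marker and never touches the stored elements.

```python
from array import array


class ToyIntVector(object):
    """Hypothetical stand-in for std::vector<int>, for illustration only."""

    def __init__(self, capacity=8):
        self._storage = array('i', [0] * capacity)  # raw buffer
        self._finish = 0                            # index one past the last element

    def push_back(self, value):
        if self._finish == len(self._storage):
            # Buffer is full: double it by extending with zeros.
            self._storage.extend([0] * max(1, len(self._storage)))
        self._storage[self._finish] = value
        self._finish += 1

    def _erase_at_end(self, pos):
        # For ints there is no destructor to run, so the only work
        # is moving the end marker.
        self._finish = pos

    def clear(self):
        self._erase_at_end(0)

    def size(self):
        return self._finish

    def capacity(self):
        return len(self._storage)


v = ToyIntVector()
for i in range(5):
    v.push_back(i)
v.clear()  # size drops to 0, the buffer itself is left alone
```

The work done by clear() here does not depend on how many elements were stored, which is consistent with the cost seen from Python coming from the call itself rather than from the clearing.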

It’s possible to recode specific instances of std::vector to not go through the stubs (i.e. basically build in and hardwire the methods: clear() does not take any arguments after all), but that’s probably not worth it, unless done through a code generator (i.e. in the PyCling world).
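
A different way to avoid the per-call stub cost, which was not suggested in this thread and is only a sketch, is to skip std::vector and fill fixed-size Python array buffers with a counter, so that “clearing” becomes a single integer assignment on the Python side. The names MAX_WJJ, n_wjj, wjj_mass and the two helper functions below are hypothetical. Note also that the resulting branch would be a variable-length array rather than a std::vector, so anything reading the tree would see a different branch type. The TTree.Branch calls appear only as comments because they need a ROOT installation.

```python
from array import array

MAX_WJJ = 64  # hypothetical upper bound on candidates per event

n_wjj = array('i', [0])                 # number of filled entries this event
wjj_mass = array('d', [0.0] * MAX_WJJ)  # fixed-size buffer of doubles

# With PyROOT available, the buffers would be attached once, e.g.:
#   tree.Branch("n_Wjj", n_wjj, "n_Wjj/I")
#   tree.Branch("Wjj_mass", wjj_mass, "Wjj_mass[n_Wjj]/D")


def init_event():
    """Reset the counter; stale values are overwritten by later fills."""
    n_wjj[0] = 0


def add_wjj(mass):
    """Append one value, refusing to overflow the fixed buffer."""
    i = n_wjj[0]
    if i >= MAX_WJJ:
        raise IndexError("more than MAX_WJJ candidates in one event")
    wjj_mass[i] = mass
    n_wjj[0] = i + 1


# Per event: reset, fill, and then tree.Fill() would follow.
init_event()
add_wjj(80.4)
add_wjj(79.1)
```

Whether this is an acceptable trade depends on whether the consumers of the tree can live with array branches and a fixed maximum size.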

Cheers,
Wim

Wim,

I tried pypy-cint, but my job crashed at TPySelector::SetupPySelf:

And my scripts are very simple.

The script run-ttbarClear.py reads:

from CppyyROOT import (TChain,gSystem,gROOT,TStopwatch)

files = file("input-p822.txt").read().split()
chain = TChain("physics")
chainAdd = chain.Add

for i in files:
  chainAdd(i)

timer = TStopwatch()
timer.Start()
chain.Process('TPySelector',"ttbarClear")
timer.Stop(); timer.Print("m")

The script ttbarClear.py reads:

import CppyyROOT as ROOT
from CppyyROOT import (TPySelector,gSystem,gROOT,TStopwatch)

class ttbarClear( TPySelector ):
#     ==========
# main analysis class using TSelector to loop over TTrees/Events

   #---------------------------
   def __init__(self,tree=None):
       self.Debug = False
       self.timer  = None


   #----------------
   def Begin( self ):
       print 'py: beginning'


   #---------------------------
   def SlaveBegin( self, tree ):
       print 'py: slave beginning'
       if self.timer is None:
          self.timer = TStopwatch()
          self.timer.Start()


   #---------------------
   def Init( self, tree ):
       print 'py: Init'
       self.TREE = tree
       print "Current file=",tree.GetCurrentFile().GetName()


   #-------------------------
   def Process( self, entry ):
       if entry%1000 == 0:
          print "entry=",entry
       return 1


   #-------------------------
   def SlaveTerminate( self ):
       print 'py: slave terminating'


   #--------------------
   def Terminate( self ):
       print 'py: terminating'

Any idea?

–Shuwei

Shuwei,

yes, TPySelector is not supported yet (for the same reason that graphics are still trouble). When I first talked about CppyyROOT, I thought you were running directly on TTrees.

Soon …

Cheers,
Wim

Hi Wim,

Then I tried std.vector.clear without involving TPySelector, but it still crashed:

My script “run-vectorClear2.py” is:

from CppyyROOT import (TChain,gSystem,gROOT,TStopwatch)
# from ROOT import (TChain,gSystem,gROOT,TStopwatch)

entries = 2001

from vectorClear2 import vectorClear2
myVecClear = vectorClear2()

for ievt in xrange(entries):
   myVecClear.Process(ievt)

And my script “vectorClear2.py” reads:

import CppyyROOT as ROOT
# import ROOT

class vectorClear2:
#     ============

   #---------------------------
   def __init__(self):
       self.Debug = False
       self.timer  = None
       self.SetupVectors()

   #------------------------
   def SetupVectors(self):
       std = ROOT.std

       # W->jj
       self.m_Wjj_mass = std.vector(float)()
       self.m_Wjj_pt   = std.vector(float)()
       self.m_Wjj_eta  = std.vector(float)()
       self.m_Wjj_delR = std.vector(float)()
       [...]

   #--------------------
   def InitEvent( self ):

       self.m_Wjj_mass.clear()
       self.m_Wjj_pt.clear()
       self.m_Wjj_eta.clear()
       self.m_Wjj_delR.clear()
       [...]

   #-------------------------
   def Process( self, entry ):
       if entry%100 == 0:
          print "entry=",entry
       self.InitEvent()

Did I do anything wrong?

–Shuwei

Shuwei,

doesn’t crash for me. It might simply be that the GUI thread is on: I’m running at CERN from LBL, batch only, and that is the setup in which I see no such crash.

I’ve now switched the GUI thread off by default in CppyyROOT.py on afs. Could you try again?

Thanks,
Wim