Problem reading from buffer with numpy

In my TTree I have branch mag[6][172]/D

It works fine with the array module:

ua = array('d', t.mag)
len(ua)

returns 1032. However, reading in this way is very slow, so I decided to give numpy a try.

ua2 = np.frombuffer(t.mag, dtype='d')
len(ua2)

gives 129 elements. Those elements are the same as in the tree, but why only 129? The same problem exists when I try to use numpy.ndarray.

It seems that numpy does not know how to handle this buffer - it treats its number of elements as a number of bytes…

Anyway, I found that converting the buffer to an array with the array module is very slow (a few times slower than reading the buffer from the TTree) and I am afraid numpy won’t be much faster. The question is: what is the most efficient way to convert a buffer to a Python array?

Hi,

didn’t have time to try this out, but the numpy version should be faster, as it shouldn’t copy the buffer (like the array does).

The number 129 comes from 6*172/sizeof(double) = 129. The problem is that, internally, buffers are character-based (i.e. byte-sized) and PyROOT fudges this by changing the stride on access. There have been improvements in the buffer interface for numpy, but that was for Python 3, with some compatibility in Python 2.6 and later, and I’ve never taken the time to flesh out support for this in detail in PyROOT. The array module, OTOH, does a normal iteration over the buffer and does not use the C-level buffer interface.
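That arithmetic can be checked without ROOT at all; the sketch below just reproduces the division numpy performs when it mistakes an element count for a byte count:

```python
import numpy as np

# The PyROOT buffer proxy reports its length in elements (6*172 = 1032),
# but numpy interprets that figure as a byte count and divides by the
# item size of a double (8 bytes), landing on 129 elements.
reported_length = 6 * 172              # what len(t.mag) would return
itemsize = np.dtype('d').itemsize      # 8 for a double
print(reported_length // itemsize)     # -> 129
```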

Note that, when it comes to speed, I’ve written off PyROOT completely in favor of CppyyROOT, although I still have to do builtin arrays from TTrees for it (data structures and objects work).

Anyway, again, I haven’t tried, but adding a "count=len(t.mag)" argument to np.frombuffer() should work.

Cheers,
Wim

Unfortunately "count=len(t.mag)" does not work. It gives: ValueError: buffer is smaller than requested size… So I am still stuck with a very slow buffer conversion :frowning:

An intermediate solution is to use fromiter(), which is ~30% faster than array(). However, a "GetEntry()"-only loop over my TTree still takes ~5 s, while filling an array with fromiter takes ~25 s, which is a very significant time difference.
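A minimal sketch of the fromiter() approach, with a plain Python iterable standing in for t.mag (which would need a live TTree):

```python
import numpy as np

# Stand-in for t.mag; in the real code this would be the branch buffer.
mag = [float(i) for i in range(6 * 172)]

# Passing count lets numpy preallocate the output instead of growing it,
# which is where the speedup over array('d', ...) comes from.
ua = np.fromiter(mag, dtype='d', count=len(mag))
print(len(ua))   # -> 1032
```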

Hi,

I’ll find some time to look into this in more detail, but what if you first do: t.mag.SetSize(len(t.mag)*8)?
Or, even easier, call t.mag.SetSize(sys.maxint)? After that, you can no longer iterate over t.mag (not until you reset the size, anyway), as it won’t terminate properly, but frombuffer() with a properly given count may work.
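The idea can be sketched with a raw byte buffer standing in for the PyROOT proxy: once the buffer reports its size in bytes (1032 doubles = 8256 bytes), frombuffer() can satisfy the requested element count. The stand-in below is an assumption; the real t.mag requires ROOT.

```python
import numpy as np

# Stand-in for t.mag after t.mag.SetSize(len(t.mag)*8):
# 6*172 doubles = 1032 elements = 8256 bytes of raw data.
raw = np.arange(6 * 172, dtype='d').tobytes()

ua2 = np.frombuffer(raw, dtype='d', count=6 * 172)
print(len(ua2))   # -> 1032
```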

Cheers,
Wim

Works :slight_smile: It is about 30% faster than fromiter. However, it is still very slow compared to simply calling GetEntry(). I guess it may not be simple to improve, but it looks like a bottleneck for reading from a TTree with PyROOT.

Hi,

at issue is that any access into an array from Python involves a lot more work than what is done in C++, where an array access is virtually free (memory needs to be loaded into the CPU, but that’s about it). Anything in CPython requires at least getting the buffer, unpacking the index, and packing the value.

For comparison, simple TTree benchmarks currently run in the range of 1.6-2.3x compiled, optimized C++ in CppyyROOT, as it requires no such unwrapping/wrapping. I expect to be able to reach 1x with Cling, as that should allow me to get rid of the stub overhead. CPython will never be able to touch that.

Cheers,
Wim

I understand. So, I’ll wait for the new version with Cling :slight_smile:

Oops, sorry, I made a mistake when checking. It seems that t.mag.SetSize(len(t.mag)*8) does not work - frombuffer still returns 129 elements, and with count=1032 it claims that the buffer is too small…

No, pypy-cint is available today, just not fully compatible with PyROOT yet (and therefore not announced):

[code]% source /afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc5/setup.sh
% pypy-cint
Python 2.7.2 (a10072d752f3+, Jul 22 2012, 23:38:16)
[PyPy 1.9.1-dev0 with GCC 4.3.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import CppyyROOT as ROOT
>>> # etc ...[/code]

Note also that this is the .cern.ch afs disk, not the cern.ch one, and is therefore a tad slow. TTree usage works for objects. I’ll get around to numpy or array arrays soon enough.

Another option would be to create the numpy array first, then use SetBranchAddress(). I’ll give it a try.

Cheers,
Wim

Hi,

so the following works for me for CPython and numpy. This write_tree.py:

[code]from ROOT import TFile, TTree
import numpy

N = 5000
D1, D2 = 6, 172

f = TFile("test.root", "RECREATE")
t = TTree("test", "test tree")

a = numpy.zeros((D1, D2), dtype='d')
t.Branch("mag", a, "mag[%d][%d]/D" % (D1, D2))

for event in range(N):
    for i in range(D1):
        for j in range(D2):
            a[i][j] = N + i*D1 + j
    t.Fill()

f.Write()
f.Close()[/code]
and this read_tree.py:

[code]from ROOT import TFile
import numpy

D1, D2 = 6, 172

f = TFile("test.root")
t = f.test

a = numpy.zeros((D1, D2), dtype='d')
t.SetBranchAddress("mag", a)

N = t.GetEntriesFast()
assert N == 5000
for event in range(N):
    t.GetEntry(event)
    for i in range(D1):
        for j in range(D2):
            assert a[i][j] == N + i*D1 + j

f.Close()[/code]
It does not work for pypy-cint, as somewhat expected. I think that the numpypy array does not implement the raw buffer interface at the interpreter level. Using a normal Python array from the array module does work, but that’s not nice for a 2-dim array.

EDIT: and if I do implement the code with array on pypy-cint, the writing/reading is fully I/O bound, as opposed to almost CPU-bound on CPython. :slight_smile:

Cheers,
Wim

Hi,

so, theoretically, SetSize() works for me (i.e. it works with frombuffer()). However, numpy.frombuffer() causes some memory overwrite. I’m not sure where, but according to valgrind, numpy free()s the original buffer (maybe it’s relocating it?), which of course does not work (ROOT will still write to the old array address).

Cheers,
Wim

Well, anyway, setting the branch address to the numpy array works and is nicely multidimensional. Thanks! :slight_smile: