Extract entries with vector type branches

I started using PyROOT a week ago. I have a ROOT file that contains 1000 entries and 97 branches, and each branch is a vector.

Jet = 8
Jet_E = (vector<float>*)0x430b780
Jet_Px = (vector<float>*)0x432d380
Jet_Py = (vector<float>*)0x4336230
Jet_Pz = (vector<float>*)0x433f0e0
Jet_Eta = (vector<float>*)0x4347f90
Jet_Phi = (vector<float>*)0x4350e40
Jet_PT = (vector<float>*)0x4359cf0
Jet_Btag = (vector<int>*)0x4362ba0
Jet_NTracks = (vector<int>*)0x436ba50
Jet_EHoverEE = (vector<float>*)0x4374900
Jet_size = 8
...
Electron = 0
Electron_E = (vector<float>*)0x438d530
Electron_Px = (vector<float>*)0x4396560
Electron_Py = (vector<float>*)0x439f590
Electron_Pz = (vector<float>*)0x43a85c0
.....

I need to pass the entry number as a parameter to my function and have it return a numpy array containing all branches. I managed to do that by first calling tree.GetEntry(i) and then iterating over each branch:

gp1 = np.append(gp1,np.array(t.TauJet_Px).reshape(k,1),axis=1)
gp1 = np.append(gp1,np.array(t.TauJet_Py).reshape(k,1),axis=1)
gp1 = np.append(gp1,np.array(t.TauJet_Pz).reshape(k,1),axis=1)
gp1 = np.append(gp1,np.array(t.TauJet_Eta).reshape(k,1),axis=1)
...
gp2 = np.append(gp2,np.array(t.Jet_Px).reshape(k,1),axis=1)
gp2 = np.append(gp2,np.array(t.Jet_Py).reshape(k,1),axis=1)
gp2 = np.append(gp2,np.array(t.Jet_Pz).reshape(k,1),axis=1)
gp2 = np.append(gp2,np.array(t.Jet_Eta).reshape(k,1),axis=1)
..

and at the end:

record_list.append([gp1,gp2,gp3,gp4,gp5,gp6,gp7,gp8])

but it’s kind of slow and ugly.

I also tried the suggestions discussed on this page (Locate like command), but none of them helped much.
Also, using:

br=tree.GetBranch('X')
nb=br.GetEntry(i)

I don’t get a vector, just a single value!
Can you give me a hint?
Thanks

Hi,

Not sure how you expect to get a vector from the last snippet of code, but as for where to look for slowness in the first snippet, there are two places. The first is the creation of vectors when the branches are accessed as “data members” of the tree: an object is created on each “data member” lookup. Creating a single std::vector and handing it to SetBranchAddress() is the solution there. The second is that copying to the numpy array, when done through the Python interface, has a lot of overhead on each entry call: to stay generic, the iteration goes through begin()/end() and so makes a call for each lookup, rather than simply accessing the memory directly, as can be done for a vector when the size of its elements is known.

Within the context of CPython, I don’t have further ideas for improvement, other than using a C++ helper and the numpy C-API to pass an address directly instead of individual elements.

With PyPy/cppyy, and in particular Cling, it’s a different matter: with a real compiler in the back end, a helper can be generated for the std::vector at hand.

Cheers,
Wim
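
Separately from the two overheads described above, the repeated np.append calls in the question each allocate and copy a new array. A minimal sketch of collecting the columns first and stacking them once is given here. The function name entry_to_array and the stand-in tree object are hypothetical, introduced only so the example runs without ROOT; with PyROOT one would pass the real tree after calling tree.GetEntry(i). It does not remove the per-element copy cost of converting each vector.

```python
import numpy as np
from types import SimpleNamespace


def entry_to_array(tree, branch_names, dtype=np.float32):
    """Build a (k, n_branches) array from the entry currently loaded in `tree`.

    Assumes tree.GetEntry(i) has already been called and that every listed
    branch holds the same number of elements k for this entry.
    """
    # Look up each branch once and convert it to a 1-D array.
    columns = [np.asarray(list(getattr(tree, name)), dtype=dtype)
               for name in branch_names]
    if not columns:
        return np.empty((0, 0), dtype=dtype)
    # One allocation for the final array instead of one per np.append call.
    return np.column_stack(columns)


# Stand-in for a tree with an entry already loaded (hypothetical data).
fake_tree = SimpleNamespace(
    Jet_Px=[1.0, 2.0, 3.0],
    Jet_Py=[4.0, 5.0, 6.0],
    Jet_Pz=[7.0, 8.0, 9.0],
)

gp2 = entry_to_array(fake_tree, ["Jet_Px", "Jet_Py", "Jet_Pz"])
print(gp2.shape)  # (3, 3): three jets, three branches
```

Keeping the branch names in a list per group (one list for TauJet_*, one for Jet_*, and so on) also replaces the long run of near-identical lines with a single loop.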

Thanks for your reply.

By just using result=tree.mybranch I get the data from my branch, and with np.array(result) I get the numpy array. The other steps just change the format of the array.

I tried to use SetBranchAddress as it’s used in the second post of this thread (Locate like command), but it doesn’t return the vector, just a single value!

The second is that copying to the numpy array, when done through the Python interface, has a lot of overhead on each entry call: to stay generic, the iteration goes through begin()/end() and so makes a call for each lookup, rather than simply accessing the memory directly, as can be done for a vector when the size of its elements is known.

Can you provide me with an example showing how I should do that?

With PyPy/cppyy, and in particular Cling, it’s a different matter: with a real compiler in the back end, a helper can be generated for the std::vector at hand.

Thanks

Hi,

attached are an example .C macro containing functions that do the vector -> numpy conversion in C++ (done for int, float, and double; add any other type similarly as needed), and a Python script that makes use of them.

It was a little more convoluted than I expected. But using the conversion should be simple, even if writing it isn’t so straightforward. Of course, you can also put the code in a separate extension module if you prefer (in that case, look at TPython.h to extract a void* pointer from a bound PyROOT object and cast it to the expected C++ instance).

HTH,
Wim
nump.C (1.18 KB)
nump.py (353 Bytes)

It keeps giving me the following error:

In [27]: ROOT.gSystem.AddIncludePath( '-I/usr/include/python2.7' )
In [28]: ROOT.gSystem.AddIncludePath( ' -I/usr/local/lib/python2.7/dist-packages/numpy/core/include' )

In [29]: ROOT.gROOT.LoadMacro( 'nump.C+' )

Info in <TUnixSystem::ACLiC>: creating shared library /home/python-workespace/p1/src/nump_C.so
/home/python-workespace/p1/src/nump_C_ACLiC_dict.o: In function `global constructors keyed to nump_C_ACLiC_dict.cxx':
nump_C_ACLiC_dict.cxx:(.text+0x29f): undefined reference to `PyImport_ImportModule'
nump_C_ACLiC_dict.cxx:(.text+0x2ba): undefined reference to `PyObject_GetAttrString'
nump_C_ACLiC_dict.cxx:(.text+0x2e2): undefined reference to `PyCObject_Type'
nump_C_ACLiC_dict.cxx:(.text+0x2f4): undefined reference to `PyCObject_AsVoidPtr'
nump_C_ACLiC_dict.cxx:(.text+0x3d3): undefined reference to `PyExc_RuntimeError'
nump_C_ACLiC_dict.cxx:(.text+0x3e2): undefined reference to `PyErr_SetString'
nump_C_ACLiC_dict.cxx:(.text+0x41e): undefined reference to `PyExc_RuntimeError'
nump_C_ACLiC_dict.cxx:(.text+0x434): undefined reference to `PyErr_Format'
nump_C_ACLiC_dict.cxx:(.text+0x452): undefined reference to `PyExc_RuntimeError'
nump_C_ACLiC_dict.cxx:(.text+0x468): undefined reference to `PyErr_Format'
nump_C_ACLiC_dict.cxx:(.text+0x47b): undefined reference to `PyExc_RuntimeError'
nump_C_ACLiC_dict.cxx:(.text+0x48c): undefined reference to `PyErr_Format'
nump_C_ACLiC_dict.cxx:(.text+0x49b): undefined reference to `PyExc_RuntimeError'
nump_C_ACLiC_dict.cxx:(.text+0x4ac): undefined reference to `PyErr_Format'
nump_C_ACLiC_dict.cxx:(.text+0x4b8): undefined reference to `PyExc_ImportError'
nump_C_ACLiC_dict.cxx:(.text+0x4c7): undefined reference to `PyErr_SetString'
nump_C_ACLiC_dict.cxx:(.text+0x4d3): undefined reference to `PyExc_RuntimeError'
nump_C_ACLiC_dict.cxx:(.text+0x4e2): undefined reference to `PyErr_SetString'
nump_C_ACLiC_dict.cxx:(.text+0x4ee): undefined reference to `PyExc_AttributeError'
nump_C_ACLiC_dict.cxx:(.text+0x4fd): undefined reference to `PyErr_SetString'
collect2: ld returned 1 exit status
Error in <ACLiC>: Compilation failed!

why is that?

Hi,

those should all be symbols that are defined in the python interpreter, i.e. available. What system are you running on? Otherwise adding libpythonx.y.so through gSystem.AddLinkedLibs() would be the ticket.

Cheers,
Wim

I am running Linux. I guess pythonx.y is for Windows, right?

Hi,

no, libpythonx.y.so is on Linux. So yes, best I can think of is gSystem.AddLinkedLibs(), but for the life of me I do not understand why that would be necessary (on Mac, with python under /usr instead of as a framework install, I could have imagined a double linker namespace, but not on Linux).

Cheers,
Wim
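
As a rough illustration of the gSystem.AddLinkedLibs() suggestion, the sketch below derives the include and link flags from the running interpreter instead of hard-coding them. The helper name python_build_flags is hypothetical, and the sketch assumes a shared libpython exists in the reported library directory, which is not true of every installation. The ROOT calls are left as comments because they need a ROOT installation to run.

```python
import sysconfig


def python_build_flags():
    """Return (include_flag, link_flags) for the running Python interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    # LIBDIR and LDVERSION may be unset on some platforms or builds.
    lib_dir = sysconfig.get_config_var("LIBDIR")
    version = (sysconfig.get_config_var("LDVERSION")
               or sysconfig.get_python_version())

    include_flag = "-I" + include_dir
    link_flags = "-lpython" + version
    if lib_dir:
        link_flags = "-L" + lib_dir + " " + link_flags
    return include_flag, link_flags


include_flag, link_flags = python_build_flags()
print(include_flag)
print(link_flags)

# With ROOT available, the flags would be passed on before compiling the macro:
# import ROOT
# ROOT.gSystem.AddIncludePath(include_flag)
# ROOT.gSystem.AddLinkedLibs(link_flags)
# ROOT.gROOT.LoadMacro("nump.C+")
```

The numpy include directory used in the session above would still need to be added separately with AddIncludePath().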