Using multiprocessing to access tree elements and do stuff!

Hi,

I have a very large tree (>~1M entries) from which I need to access some elements at random and retrieve data according to some criteria. The problem arises when I try to parallelize this task across my computer's cores. After some debugging I realized that the problem is passing the file object (f = ROOT.TFile("MergedTrees.root")) to the function in every process. This is the code that should run on every core:

def worker(tree,f,func,table,features,indexes):
    print "starting:" , mp.current_process().name
    time.sleep(2)
    
    print tree.GetEntry(100) # returns -1

    x=[]
    y=[]
    appendx=x.append
    appendy=y.append
    for ind in indexes[0]:
        # print type(ind)
        # print tree.GetEntry(100)
        tree.GetEntry(ind)  # access event number "ind"
        appendx(func(tree, features))  # get data samples
        appendy(table[ind])  # store label of current event

    return x,y

Do you have any suggestions for solving this problem? One easy way would be to keep several copies of my ROOT file and have each process open its own copy, but I don’t like this solution; it seems very silly.

For example, if I try to print f inside my function:

print f

I get this error:


 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f25bce4912d in waitpid () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f25bcddbe8e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f25bcddc2a0 in system () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f25b008298e in TUnixSystem::StackTrace() () from /home/jadidi/Downloads/root/lib/libCore.so
#4  0x00007f25b0082223 in TUnixSystem::DispatchSignals(ESignals) () from /home/jadidi/Downloads/root/lib/libCore.so
#5  <signal handler called>
#6  0x00007f25b2b6d9e5 in TDirectoryFile::Get(char const*) () from /home/jadidi/Downloads/root/lib/libRIO.so
#7  0x00007f25b00cc485 in G__G__Base1_8_0_33(G__value*, char const*, G__param*, int) () from /home/jadidi/Downloads/root/lib/libCore.so
#8  0x00007f25af52112f in Cint::G__CallFunc::Execute(void*) () from /home/jadidi/Downloads/root/lib/libCint.so
#9  0x00007f25b2f446cb in PyROOT::TRootObjectExecutor::Execute(Cint::G__CallFunc*, void*) () from /home/jadidi/Downloads/root/lib/libPyROOT.so
#10 0x00007f25b2f4b08e in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::CallSafe(void*) () from /home/jadidi/Downloads/root/lib/libPyROOT.so
#11 0x00007f25b2f4c1eb in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::Execute(void*) () from /home/jadidi/Downloads/root/lib/libPyROOT.so
#12 0x00007f25b2f48b72 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator()(PyROOT::ObjectProxy*, _object*, _object*, long) () from /home/jadidi/Downloads/root/lib/libPyROOT.so
#13 0x00007f25b2f4e616 in PyROOT::(anonymous namespace)::mp_call(PyROOT::MethodProxy*, _object*, _object*) () from /home/jadidi/Downloads/root/lib/libPyROOT.so
#14 0x0000000000460462 in PyObject_CallFunctionObjArgs ()
#15 0x00000000004dd34b in ?? ()
#16 0x000000000048e36a in ?? ()
#17 0x000000000047d159 in PyObject_HasAttr ()
#18 0x00007f25b2f4ff54 in PyROOT::(anonymous namespace)::op_repr(PyROOT::ObjectProxy*) () from /home/jadidi/Downloads/root/lib/libPyROOT.so
#19 0x000000000047cb0c in _PyObject_Str ()
#20 0x000000000047cbea in PyObject_Str ()
#21 0x00000000004d644c in ?? ()
#22 0x00000000004cfe81 in PyFile_WriteObject ()
#23 0x000000000049bb2e in PyEval_EvalFrameEx ()
#24 0x000000000049fd55 in PyEval_EvalCodeEx ()
#25 0x00000000004c7608 in ?? ()
#26 0x000000000045fed4 in PyObject_Call ()
#27 0x000000000049af25 in PyEval_EvalFrameEx ()
#28 0x000000000049fd55 in PyEval_EvalCodeEx ()
#29 0x00000000004c7608 in ?? ()
#30 0x000000000045fed4 in PyObject_Call ()
#31 0x000000000049af25 in PyEval_EvalFrameEx ()
#32 0x000000000049982f in PyEval_EvalFrameEx ()
#33 0x000000000049982f in PyEval_EvalFrameEx ()
#34 0x000000000049fd55 in PyEval_EvalCodeEx ()
#35 0x00000000004c7436 in ?? ()
#36 0x000000000045fed4 in PyObject_Call ()
#37 0x0000000000461b0f in ?? ()
#38 0x000000000045fed4 in PyObject_Call ()
#39 0x000000000048dc2f in ?? ()
#40 0x000000000048bf6d in ?? ()
#41 0x000000000045fed4 in PyObject_Call ()
#42 0x00000000004996be in PyEval_EvalFrameEx ()
#43 0x000000000049982f in PyEval_EvalFrameEx ()
#44 0x000000000049982f in PyEval_EvalFrameEx ()
#45 0x000000000049fd55 in PyEval_EvalCodeEx ()
#46 0x00000000004c7436 in ?? ()
#47 0x000000000045fed4 in PyObject_Call ()
#48 0x0000000000461b0f in ?? ()
#49 0x000000000045fed4 in PyObject_Call ()
#50 0x000000000048dc2f in ?? ()
#51 0x000000000048bf6d in ?? ()
#52 0x000000000045fed4 in PyObject_Call ()
#53 0x00000000004996be in PyEval_EvalFrameEx ()
#54 0x000000000049fd55 in PyEval_EvalCodeEx ()
#55 0x0000000000499502 in PyEval_EvalFrameEx ()
#56 0x000000000049fd55 in PyEval_EvalCodeEx ()
#57 0x0000000000499502 in PyEval_EvalFrameEx ()
#58 0x000000000049fd55 in PyEval_EvalCodeEx ()
#59 0x00000000004eeb92 in PyEval_EvalCode ()
#60 0x000000000049c4d5 in PyEval_EvalFrameEx ()
#61 0x000000000049fd55 in PyEval_EvalCodeEx ()
#62 0x0000000000499502 in PyEval_EvalFrameEx ()
#63 0x000000000049fd55 in PyEval_EvalCodeEx ()
#64 0x0000000000499502 in PyEval_EvalFrameEx ()
#65 0x000000000049fd55 in PyEval_EvalCodeEx ()
#66 0x0000000000499502 in PyEval_EvalFrameEx ()
#67 0x000000000049fd55 in PyEval_EvalCodeEx ()
#68 0x0000000000499502 in PyEval_EvalFrameEx ()
#69 0x000000000049fd55 in PyEval_EvalCodeEx ()
#70 0x0000000000499502 in PyEval_EvalFrameEx ()
#71 0x000000000049fd55 in PyEval_EvalCodeEx ()
#72 0x0000000000499502 in PyEval_EvalFrameEx ()
#73 0x000000000049fd55 in PyEval_EvalCodeEx ()
#74 0x0000000000499502 in PyEval_EvalFrameEx ()
#75 0x000000000049fd55 in PyEval_EvalCodeEx ()
#76 0x00000000004eeb92 in PyEval_EvalCode ()
#77 0x00000000004ff7f4 in ?? ()
#78 0x000000000042cdd0 in PyRun_FileExFlags ()
#79 0x000000000042d798 in PyRun_SimpleFileExFlags ()
#80 0x0000000000418db5 in Py_Main ()
#81 0x00007f25bcdb7eff in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#82 0x00000000004c91b1 in _start ()

Thanks for any help
Mohsen

Mohsen,

Python multiprocessing and C++ don’t mix. Separately, disk access is going to be a concern: if you just spawn some processes, there’s no guarantee that the reads will be done efficiently. I recommend using PROOF.

That said, your idea of opening a file in each process is not “very silly”: a file has a lot of state that cannot be shared (both TFile specific and OS specific). If you open a file in the mother process, then pass the file to the children by means of sharing through fork or pickling, you’ll have to do a lot of reinitialization to force unsharing before you can use it. Now that “seems very silly.” :)

Cheers,
Wim
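
For readers who want to try the "open the file in each process" route, here is a minimal sketch, not a tested recipe. It assumes the file name and tree name are passed as plain strings, that the branches listed in `features` are simple scalars readable as tree attributes, and that `ROOT` is imported only inside the worker so that no ROOT object crosses a process boundary. The helper names `split_indexes`, `read_chunk` and `read_parallel` are made up for illustration.

```python
import multiprocessing as mp


def split_indexes(indexes, n_chunks):
    """Sort the entry numbers and cut them into at most n_chunks contiguous pieces."""
    ordered = sorted(indexes)
    size, extra = divmod(len(ordered), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:  # skip empty pieces when there are few entries
            chunks.append(ordered[start:stop])
        start = stop
    return chunks


def read_chunk(task):
    """Runs in a child process: open the file here, never in the parent."""
    path, tree_name, features, chunk = task
    import ROOT  # imported in the child so it builds its own ROOT state

    f = ROOT.TFile.Open(path)
    if not f or f.IsZombie():
        raise IOError("cannot open %s" % path)
    tree = f.Get(tree_name)
    rows = []
    for ind in chunk:
        if tree.GetEntry(ind) <= 0:  # 0: no such entry, -1: I/O error
            continue
        # Assumption: each feature is a scalar branch exposed as an attribute.
        rows.append((ind, [getattr(tree, name) for name in features]))
    f.Close()
    return rows


def read_parallel(path, tree_name, features, indexes, n_workers=4):
    """Return a list of (entry number, feature values) pairs."""
    tasks = [(path, tree_name, features, c)
             for c in split_indexes(indexes, n_workers)]
    pool = mp.Pool(n_workers)
    try:
        parts = pool.map(read_chunk, tasks)
    finally:
        pool.close()
        pool.join()
    return [row for part in parts for row in part]
```

A call such as `read_parallel("MergedTrees.root", "tree", ["pt", "eta"], indexes)` would go under an `if __name__ == "__main__":` guard; the tree and branch names here are placeholders. Because each row carries its entry number, the labels can be looked up afterwards in the parent with `table[ind]`. Sorting each chunk is only a guess at reducing seeks on disk; it does not address the disk-contention concern raised above.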

Is it possible to use PROOF with Python? If so, can you provide an example or point me to a tutorial?

Thanks

Mohsen,

there’s TPySelector, which should do the trick. There’s a brief example in its documentation, and searching the board should give a few more small examples.

(Not a PROOF expert myself … lack of time still to learn.)

Cheers,
Wim

I have tried in the past to use python’s multiprocessing module with PyROOT. I have had success and failure in this. Unfortunately it was long enough ago that I don’t remember which ones worked. You can see some of my code that attempted to use the module with PyROOT here:

bazaar.launchpad.net/~jfcaron/+j … uration.py

These two go together, IIRC they worked.
bazaar.launchpad.net/~jfcaron/+j … _script.py
bazaar.launchpad.net/~jfcaron/+j … umerics.py

I have also attached an even-older script that tried the combination.

Good luck! In the end I always made some dumber parallel processing implementation, like just running the program X times on X different data files.
May2012KPiSep.py (14.7 KB)
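
The "dumber" approach of running the program once per data file can be sketched with the standard `subprocess` module. This is only an illustration: the function name `run_per_file` is hypothetical, and it assumes the analysis script takes the data file path as its single command-line argument. Each job is a separate interpreter, so no ROOT state is shared between them.

```python
import subprocess
import sys


def run_per_file(script, data_files, python=sys.executable):
    """Start one independent interpreter per data file and wait for all of them.

    Returns the exit codes in the same order as data_files.
    """
    # All jobs are launched at once; throttling is left out of this sketch.
    procs = [subprocess.Popen([python, script, path]) for path in data_files]
    return [p.wait() for p in procs]
```

A non-zero entry in the returned list marks a job that failed, and the corresponding file can simply be rerun.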

Thanks wlan, I’ve already read this webpage but couldn’t get much out of it. It’s a pity there isn’t better documentation for this module.

jfcaron, thanks a lot for the code and advice. I am doing the same thing: each worker opens its own TFile object and reads the tree.

[quote=“llvll0hsen”]It’s a pity there is not a better documentation for this module.[/quote]
Yes, as always, that’s due to a total lack of time. Please write up your experiences as you develop your code! The hope in writing TPySelector was that it’s close enough to the C++ edition that the normal PROOF documentation, with this add-on piece specific to TPySelector, would do the trick.

Cheers,
Wim