ProofLite randomly segfaults on version 5.26

Hi,

I’m seeing “random crashes” with PROOF (similar to the ones I posted about previously). It seems that one of the two workers crashes with a segfault.

This time I can see the stack trace in the log file, and it seems one of the workers is crashing on the “GetEvent” after analysing part of the data, while the second one continues without a problem.

Any idea what the problem could be? Is there any way to debug my Selector directly? It seems that the moment I run Chain->Process("MySelector++g") I cannot debug any more.

Attached is the stack output as well as my TSelector.

Thanks!!
Nati.

07:01:01 19335 Wrk-0.0 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
07:01:01 19335 Wrk-0.0 | Info in <TProofServLite::RestartComputeTime>: compute time restarted after 0.309261 secs (100 entries)
07:03:22 19335 Wrk-0.0 | *** Break ***: segmentation violation
===========================================================
There was a crash (kSigSegmentationViolation).
This is the entire stack trace of all threads:
===========================================================
stat_loc=0x7fffa906055c, options=<value optimized out>)
at ../sysdeps/unix/sysv/linux/waitpid.c:32
#0  0x00007f450a8c2a8e in __libc_waitpid (pid=<value optimized out>, 
stat_loc=0x7fffa906055c, options=<value optimized out>)
at ../sysdeps/unix/sysv/linux/waitpid.c:32
#1  0x00007f450a8601f9 in do_system (line=<value optimized out>)
at ../sysdeps/posix/system.c:149
#2  0x00007f450c91870a in TUnixSystem::Exec (this=0x1d110b0, 
shellcmd=0x286b9e8 "/media/data1/School/Thesis/root-5.26/etc/gdb-backtrace.sh 19335 1>&2") at core/unix/src/TUnixSystem.cxx:1978
#3  0x00007f450c919002 in TUnixSystem::StackTrace (this=0x1d110b0)
at core/unix/src/TUnixSystem.cxx:2188
#4  0x00007f450c916854 in TUnixSystem::DispatchSignals (this=0x1d110b0, 
sig=kSigSegmentationViolation) at core/unix/src/TUnixSystem.cxx:1106
#5  0x00007f450c914529 in SigHandler (sig=kSigSegmentationViolation)
at core/unix/src/TUnixSystem.cxx:350
#6  0x00007f450c91c48e in sighandler (sig=11)
at core/unix/src/TUnixSystem.cxx:3428
#7  <signal handler called>
#8  0x00007f4508a89a2f in G__G__Tree_110_0_32(G__value*, char const*, G__param*, int) () from /media/data1/School/Thesis/root-5.26//lib/libTree.so
#9  0x00007f450bc524e4 in Cint::G__ExceptionWrapper (
funcp=0x7f4508a89908 <G__G__Tree_110_0_32(G__value*, char const*, G__param*, int)>, result7=0x7fffa906e6f0, 
funcname=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, 
libp=0x7fffa9063370, hash=-105) at cint/cint/src/Api.cxx:385
#10 0x00007f450bc6a89c in G__exec_asm (start=0, stack=1, presult=0x27e1180, 
localmem=42481408) at cint/cint/src/bc_exec_asm.h:641
#11 0x00007f450bc67b38 in G__exec_bytecode (result7=0x27e1180, 
funcname=0x29f9630 "", libp=0x27e1220) at cint/cint/src/bc_exec.cxx:565
#12 0x00007f450bd02fa3 in G__interpret_func (result7=0x27e1180, 
funcname=0x274c700 "Process", libp=0x27e1220, hash=735, p_ifunc=0x274c600, 
funcmatch=1, memfunc_flag=0) at cint/cint/src/ifunc.cxx:5558
#13 0x00007f450bc9287f in Cint::G__CallFunc::ExecInterpretedFunc (
this=0x27e1170, presult=0x27e1180) at cint/cint/src/CallFunc.cxx:482
#14 0x00007f450bc925e6 in Cint::G__CallFunc::Execute (this=0x27e1170, 
pobject=0x2780680) at cint/cint/src/CallFunc.cxx:445
#15 0x00007f450c909bd2 in Cint::G__CallFunc::ExecInt (this=0x27e1170, 
pobject=0x2780680) at include/CallFunc.h:96
#16 0x00007f450c906769 in TCint::CallFunc_ExecInt (this=0x1d16fa0, 
func=0x27e1170, address=0x2780680) at core/meta/src/TCint.cxx:2236
#17 0x00007f4508a334cf in TSelectorCint::Process (this=0x2770f00, entry=866306)
at tree/tree/src/TSelectorCint.cxx:283
#18 0x00007f4507710436 in TProofPlayer::Process (this=0x27056b0, 
dset=0x264de10, selector_file=0x2bfbba8 "MySelector.C", 
option=0x7f450d0631b8 "", nentries=-1, first=-1)
at proof/proofplayer/src/TProofPlayer.cxx:887
#19 0x00007f450861350f in TProofServ::HandleProcess (this=0x21f0eb0, 
mess=0x222ae80) at proof/proof/src/TProofServ.cxx:3392
#20 0x00007f4508607d97 in TProofServ::HandleSocketInput (this=0x21f0eb0, 
mess=0x222ae80, all=true) at proof/proof/src/TProofServ.cxx:1371
#21 0x00007f4508606bd2 in TProofServ::HandleSocketInput (this=0x21f0eb0)
at proof/proof/src/TProofServ.cxx:1150
#22 0x00007f4508626007 in TProofServLiteInputHandler::Notify (this=0x21f1600)
at proof/proof/src/TProofServLite.cxx:162
#23 0x00007f4508628e1b in TProofServLiteInputHandler::ReadNotify (
this=0x21f1600) at proof/proof/src/TProofServLite.cxx:154
#24 0x00007f450c916bc8 in TUnixSystem::CheckDescriptors (this=0x1d110b0)
at core/unix/src/TUnixSystem.cxx:1208
#25 0x00007f450c915e4a in TUnixSystem::DispatchOneEvent (this=0x1d110b0, 
pendingOnly=false) at core/unix/src/TUnixSystem.cxx:915
#26 0x00007f450c86e4bf in TSystem::InnerLoop (this=0x1d110b0)
at core/base/src/TSystem.cxx:393
#27 0x00007f450c86e242 in TSystem::Run (this=0x1d110b0)
at core/base/src/TSystem.cxx:343
#28 0x00007f450c7f63a0 in TApplication::Run (this=0x21f0eb0, retrn=false)
at core/base/src/TApplication.cxx:993
#29 0x00007f450860c27b in TProofServ::Run (this=0x21f0eb0, retrn=false)
at proof/proof/src/TProofServ.cxx:2230
#30 0x00000000004025c1 in main (argc=5, argv=0x7fffa9074f88)
at main/src/pmain.cxx:314
===========================================================
The crash is most likely caused by a problem in your script.
Try to compile it (.L myscript.C+g) and fix any errors.
If that does not help then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
07:03:24 19335 Wrk-0.0 | Error in <TProofServLite::HandleException>: exception triggered by signal: 1
// --------- End of element log -------------------

MySelector.h (5.43 KB)
MySelector.C (5.6 KB)

OK, I seem to have partially solved the problem.

For future reference, the problem was that the data structure in some of the ROOT files in the chain was different from the one defined in the TSelector header. Running “->GetEvent()” returned an array larger than the one I had allocated in the TSelector header file, hence the segfault.
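For anyone hitting the same thing, here is a minimal sketch of the kind of guard that would catch this before memory gets overwritten. It is plain C++ with no ROOT calls, and every name in it (kMaxHits, EventBuffer, LoadEvent) is hypothetical and not taken from the attached selector. It only illustrates the idea of comparing the stored array length against the fixed size declared in the header before filling the buffer.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical capacity, standing in for the array size declared in the selector header.
constexpr std::size_t kMaxHits = 4;

// Hypothetical fixed-size event buffer, like the members of a generated selector.
struct EventBuffer {
    int nHits = 0;
    std::array<float, kMaxHits> hits{};
};

// Copies one stored event into the fixed buffer only if it fits.
// Returns false and leaves the buffer untouched when the stored array
// is longer than the capacity the buffer was declared with.
bool LoadEvent(const std::vector<float>& stored, EventBuffer& buf) {
    if (stored.size() > kMaxHits) {
        return false;  // would overflow: skip or report this entry instead of crashing
    }
    std::copy(stored.begin(), stored.end(), buf.hits.begin());
    buf.nHits = static_cast<int>(stored.size());
    return true;
}
```

In a real selector the equivalent check would presumably go on the counter variable before the array is read, but how to do that depends on how the branches are set up, so treat the above as an illustration only.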

I’m still searching for a way to debug the TSelector while it is being processed by the “Process()” command. It would really help in finding errors like these.

Any help would be greatly appreciated,

Thanks again for an amazing piece of software,

Nati.

Dear Nati,

Sorry for the late reaction due to the Xmas break.

We are aware that debugging in PROOF may be tricky. Typically, the first thing to understand is whether the problem is specific to PROOF or not.
In your case you would probably have experienced the same problem working in a standard ROOT session with a TChain, which may be easier to debug.
Another possibility, at least on Linux, is to run inside valgrind:

TProof::Open("","valgrind=workers")

The logs are available in the usual log dialog box and may give you information about the problem; have a look at the ‘Valgrinding the workers sessions’ section under root.cern.ch/drupal/content/runn … y-valgrind.

G. Ganis