Hi,
I'm trying to adapt an analysis skeleton we had, built with MakeClass, into the same thing using TSelector so that I can run it with PROOF. It seems to run normally, but between the Terminate method and the constructor it crashes (a segmentation violation is displayed on the remote machine I'm running on).
In the end, the plots in the output file are filled, but not with the correct number of events. There are always one or several files missing, as if the last file on every worker had not been written.
I've tried running with valgrind, but apart from "indirectly lost" blocks (now fixed) I didn't find anything suspicious in the logs (neither the normal ones nor valgrind's).
Here is an example of the stack trace at the end of the job on the remote machine:
===========================================================
There was a crash (kSigSegmentationViolation).
This is the entire stack trace of all threads:
===========================================================
#0 0x00002b4551bba115 in waitpid () from /lib64/libc.so.6
#1 0x00002b4551b5c481 in do_system () from /lib64/libc.so.6
#2 0x00002b4549123e69 in TUnixSystem::Exec (this=0x1175d3a0,
shellcmd=0x13bb8628 "/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00g/x86_64-slc5-gcc43-dbg/root/etc/gdb-backtrace.sh 31370 1>&2")
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:2031
#3 0x00002b4549123022 in TUnixSystem::StackTrace (this=0x1175d3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:2253
#4 0x00002b454912655e in TUnixSystem::DispatchSignals (this=0x1175d3a0,
sig=kSigSegmentationViolation)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:1157
#5 0x00002b4549126688 in SigHandler (sig=kSigSegmentationViolation)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:357
#6 0x00002b454911b79c in sighandler (sig=11)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:3521
#7 <signal handler called>
#8 0x00002b4548b1908e in AnalysisSkel::Terminate() ()
from /afs/cern.ch/user/j/jblancha/JBExample/lib/libJBB.so.0.0
#9 0x00002b455574c1bb in TProofPlayerLite::Finalize (this=0x13534190,
force=false, sync=true)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proofplayer/src/TProofPlayerLite.cxx:291
#10 0x00002b455574d4ac in TProofPlayerLite::Process (this=0x13534190, dset=
0x7fffb470d660, selector_file=0x135424b8 "MyAnalysis/AnalysisSkel.C++",
option=0x40a2a4 "", nentries=-1, first=0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proofplayer/src/TProofPlayerLite.cxx:230
#11 0x00002b4550df4636 in TProofLite::Process (this=0x134d62f0,
dset=0x7fffb470d660, selector=0x40a32f "MyAnalysis/AnalysisSkel.C++",
option=0x40a2a4 "", nentries=-1, first=0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofLite.cxx:1123
#12 0x0000000000406b51 in main ()
===========================================================
To be more specific, I looked at the log of every worker (4 workers for 20000 events in this test). The same stack trace always appears on the workers, but there the crash comes after the destructor (I've added a lot of cout statements to see where it crashed):
===========================================================
There was a crash (kSigSegmentationViolation).
This is the entire stack trace of all threads:
===========================================================
#0 0x00002ac237f3f115 in waitpid () from /lib64/libc.so.6
#1 0x00002ac237ee1481 in do_system () from /lib64/libc.so.6
#2 0x00002ac235b65e69 in TUnixSystem::Exec (this=0x912a3a0,
shellcmd=0xb4390a8 "/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00g/x86_64-slc5-gcc43-dbg/root/etc/gdb-backtrace.sh 14637 1>&2")
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:2031
#3 0x00002ac235b65022 in TUnixSystem::StackTrace (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:2253
#4 0x00002ac235b6855e in TUnixSystem::DispatchSignals (this=0x912a3a0,
sig=kSigSegmentationViolation)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:1157
#5 0x00002ac235b68688 in SigHandler (sig=kSigSegmentationViolation)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:357
#6 0x00002ac235b5d79c in sighandler (sig=11)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:3521
#7 <signal handler called>
#8 0x0000000000000051 in ?? ()
#9 0x00002ac235af6ec7 in TList::Delete (this=0x9c18500,
option=0x2ac235f54f70 "")
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/cont/src/TList.cxx:414
#10 0x00002ac235af61c1 in TList::Clear (this=0x9c18500,
option=0x2ac235f54f70 "")
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/cont/src/TList.cxx:350
#11 0x00002ac235af7075 in TList::~TList (this=0x9c18500,
__in_chrg=<value optimized out>)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/cont/src/TList.cxx:83
#12 0x00002ac239703bbd in TSelectorList::~TSelectorList (this=0x9c18500,
__in_chrg=<value optimized out>) at include/TSelectorList.h:33
#13 0x00002ac239702f12 in TSelector::~TSelector (this=0x9d28130,
__in_chrg=<value optimized out>)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/tree/tree/src/TSelector.cxx:98
#14 0x00002ac23a432494 in AnalysisSkel::~AnalysisSkel() ()
from /afs/cern.ch/user/j/jblancha/JBExample/lib/libJBB.so
#15 0x00002ac240159140 in TProofPlayer::~TProofPlayer (this=0x9c14c70,
__in_chrg=<value optimized out>)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proofplayer/src/TProofPlayer.cxx:226
#16 0x00002ac240170fdd in TProofPlayerSlave::~TProofPlayerSlave (
this=0x9c14c70, __in_chrg=<value optimized out>)
at include/TProofPlayer.h:337
#17 0x00002ac239b7f33a in TProofServ::DeletePlayer (this=0x96db480)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:6003
#18 0x00002ac239b8b844 in TProofServ::HandleProcess (this=0x96db480, mess=
0x96f48f0, slb=0x0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:3825
#19 0x00002ac239ba1105 in TProofServ::HandleSocketInput (this=0x96db480,
mess=0x96f48f0, all=true)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:1595
#20 0x00002ac239b92b49 in TProofServ::HandleSocketInput (this=0x96db480)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:1328
#21 0x00002ac239baae5b in TProofServLiteInputHandler::Notify (this=0x96dcfc0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServLite.cxx:162
#22 0x00002ac239badfa0 in TProofServLiteInputHandler::ReadNotify (
this=0x96dcfc0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServLite.cxx:154
#23 0x00002ac235b678e5 in TUnixSystem::CheckDescriptors (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:1259
#24 0x00002ac235b68057 in TUnixSystem::DispatchOneEvent (this=0x912a3a0,
pendingOnly=false)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:966
#25 0x00002ac235aaf89a in TSystem::InnerLoop (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TSystem.cxx:406
#26 0x00002ac235abf2b0 in TSystem::Run (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TSystem.cxx:356
#27 0x00002ac235a2e73f in TApplication::Run (this=0x96db480, retrn=false)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TApplication.cxx:1052
#28 0x00002ac239b904ac in TProofServ::Run (this=0x96db480, retrn=false)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:2472
#29 0x0000000000402348 in main (argc=5, argv=0x7fffc6094568)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/main/src/pmain.cxx:314
===========================================================
The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#8 0x0000000000000051 in ?? ()
#9 0x00002ac235af6ec7 in TList::Delete (this=0x9c18500,
option=0x2ac235f54f70 "")
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/cont/src/TList.cxx:414
#10 0x00002ac235af61c1 in TList::Clear (this=0x9c18500,
option=0x2ac235f54f70 "")
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/cont/src/TList.cxx:350
#11 0x00002ac235af7075 in TList::~TList (this=0x9c18500,
__in_chrg=<value optimized out>)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/cont/src/TList.cxx:83
#12 0x00002ac239703bbd in TSelectorList::~TSelectorList (this=0x9c18500,
__in_chrg=<value optimized out>) at include/TSelectorList.h:33
#13 0x00002ac239702f12 in TSelector::~TSelector (this=0x9d28130,
__in_chrg=<value optimized out>)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/tree/tree/src/TSelector.cxx:98
#14 0x00002ac23a432494 in AnalysisSkel::~AnalysisSkel() ()
from /afs/cern.ch/user/j/jblancha/JBExample/lib/libJBB.so
#15 0x00002ac240159140 in TProofPlayer::~TProofPlayer (this=0x9c14c70,
__in_chrg=<value optimized out>)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proofplayer/src/TProofPlayer.cxx:226
#16 0x00002ac240170fdd in TProofPlayerSlave::~TProofPlayerSlave (
this=0x9c14c70, __in_chrg=<value optimized out>)
at include/TProofPlayer.h:337
#17 0x00002ac239b7f33a in TProofServ::DeletePlayer (this=0x96db480)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:6003
#18 0x00002ac239b8b844 in TProofServ::HandleProcess (this=0x96db480, mess=
0x96f48f0, slb=0x0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:3825
#19 0x00002ac239ba1105 in TProofServ::HandleSocketInput (this=0x96db480,
mess=0x96f48f0, all=true)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:1595
#20 0x00002ac239b92b49 in TProofServ::HandleSocketInput (this=0x96db480)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:1328
#21 0x00002ac239baae5b in TProofServLiteInputHandler::Notify (this=0x96dcfc0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServLite.cxx:162
#22 0x00002ac239badfa0 in TProofServLiteInputHandler::ReadNotify (
this=0x96dcfc0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServLite.cxx:154
#23 0x00002ac235b678e5 in TUnixSystem::CheckDescriptors (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:1259
#24 0x00002ac235b68057 in TUnixSystem::DispatchOneEvent (this=0x912a3a0,
pendingOnly=false)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:966
#25 0x00002ac235aaf89a in TSystem::InnerLoop (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TSystem.cxx:406
#26 0x00002ac235abf2b0 in TSystem::Run (this=0x912a3a0)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TSystem.cxx:356
#27 0x00002ac235a2e73f in TApplication::Run (this=0x96db480, retrn=false)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TApplication.cxx:1052
#28 0x00002ac239b904ac in TProofServ::Run (this=0x96db480, retrn=false)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/proof/proof/src/TProofServ.cxx:2472
#29 0x0000000000402348 in main (argc=5, argv=0x7fffc6094568)
at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/main/src/pmain.cxx:314
===========================================================
This is followed by an error I don't understand.
On nodes 0 and 2:
18:27:48 14614 Wrk-0.0 | Error in <TProofServLite::HandleException>: caugth exception triggered by signal '1' <undef>
On node 1:
18:27:42 14628 Wrk-0.1 | Error in <TProofServLite::HandleException>: caugth exception triggered by signal '1' while processing dset:'TDSet:physics', file:'/tmp/jblancha/Test/NTUP_SMWZ.591363._000162.root', event:8213 - check logs for possible stacktrace
On node 3:
18:27:42 14637 Wrk-0.3 | Error in <TProofServLite::HandleException>: caugth exception triggered by signal '1' while processing dset:'TDSet:physics', file:'/tmp/jblancha/Test/NTUP_SMWZ.591363._000162.root', event:9999 - check logs for possible stacktrace
Any ideas are more than welcome.
Many thanks,
jb