Dear all,
I am getting a crash on the worker nodes after SlaveBegin() has (apparently) completed successfully and before Notify() is called. Here is the crash dump from the workers:
This working path /afs/cern.ch/user/f/fullana/.proof/packages/D3PD_CALJET_ANALYSIS/
run = 179771
bin = 1 >> trig = EF_mbMbts_1_eff
bin = 2 >> trig = EF_j10_a4tc_EFFS
bin = 3 >> trig = EF_j15_a4tc_EFFS
bin = 4 >> trig = EF_j20_a4tc_EFFS
bin = 5 >> trig = EF_j30_a4tc_EFFS
bin = 6 >> trig = EF_j40_a4tc_EFFS
bin = 7 >> trig = EF_j55_a4tc_EFFS
bin = 8 >> trig = EF_j75_a4tc_EFFS
bin = 9 >> trig = EF_j100_a4tc_EFFS
bin = 10 >> trig = EF_j135_a4tc_EFFS
bin = 11 >> trig = EF_j180_a4tc_EFFS
virtual void xsAnalysis::SlaveBegin(TTree*) done
15:38:34 22900 Wrk-0.0 | *** Break ***: segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
#0 0x0000003ecd49a115 in waitpid () from /lib64/libc.so.6
#1 0x0000003ecd43c481 in do_system () from /lib64/libc.so.6
#2 0x00002ab6dea55eea in TUnixSystem::StackTrace() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#3 0x00002ab6dea558bc in TUnixSystem::DispatchSignals(ESignals) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#4 <signal handler called>
#5 0x00002ab6e319d72c in TEventIterTree::GetTrees(TDSetElement*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#6 0x00002ab6e319de93 in TEventIterTree::GetNextEvent() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#7 0x00002ab6e31ceb05 in TProofPlayer::Process(TDSet*, char const*, char const*, long long, long long) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#8 0x00002ab6e1334298 in TProofServ::HandleProcess(TMessage*, TString*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#9 0x00002ab6e1339b24 in TProofServ::HandleSocketInput(TMessage*, bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#10 0x00002ab6e132cb31 in TProofServ::HandleSocketInput() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#11 0x00002ab6e1343e61 in TProofServLiteInputHandler::Notify() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#12 0x00002ab6dea53cb9 in TUnixSystem::CheckDescriptors() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#13 0x00002ab6dea542e0 in TUnixSystem::DispatchOneEvent(bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#14 0x00002ab6de9cff66 in TSystem::InnerLoop() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#15 0x00002ab6de9d203c in TSystem::Run() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#16 0x00002ab6de96da0f in TApplication::Run(bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#17 0x0000000000401b30 in main ()
The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
root.cern.ch/bugs . Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
#5 0x00002ab6e319d72c in TEventIterTree::GetTrees(TDSetElement*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#6 0x00002ab6e319de93 in TEventIterTree::GetNextEvent() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#7 0x00002ab6e31ceb05 in TProofPlayer::Process(TDSet*, char const*, char const*, long long, long long) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#8 0x00002ab6e1334298 in TProofServ::HandleProcess(TMessage*, TString*) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#9 0x00002ab6e1339b24 in TProofServ::HandleSocketInput(TMessage*, bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#10 0x00002ab6e132cb31 in TProofServ::HandleSocketInput() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#11 0x00002ab6e1343e61 in TProofServLiteInputHandler::Notify() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#12 0x00002ab6dea53cb9 in TUnixSystem::CheckDescriptors() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#13 0x00002ab6dea542e0 in TUnixSystem::DispatchOneEvent(bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#14 0x00002ab6de9cff66 in TSystem::InnerLoop() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#15 0x00002ab6de9d203c in TSystem::Run() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#16 0x00002ab6de96da0f in TApplication::Run(bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#17 0x0000000000401b30 in main ()
15:38:35 22900 Wrk-0.0 | Error in TProofServLite::HandleException : caugth exception triggered by signal '1'
I am attaching all the logs from .proof in case they are needed. The program works fine in non-PROOF mode, and the crash seems to happen between PROOF calls, so I am a bit lost as to how to debug it. Any hint on how to debug this, or on what could be going on, would be highly appreciated.
Best regards
Esteban
dotprooflogs.tar.gz (2.68 KB)
ganis
July 28, 2011, 1:55pm
Hi,
Is this with ATLAS files?
It may be the same problem discussed in
/viewtopic.php?f=13&t=12835
G. Ganis
Thanks a lot Ganis,
It was exactly that, and I confirm that both options work: adding the line suggested in that thread or moving to ROOT 5.30. Sorry to bother you with something already solved.
Best regards
Esteban