ROOT Version: 6.22
Platform: lxplus
Compiler: Not Provided
Dear experts,
I am experiencing a crash using TProof (Lite) with a compiled TSelector on lxplus (ROOT version 6.22).
It is a segmentation violation that appears random, in the sense that it may or may not occur from one run to the next.
When it occurs, it is always the same error: it happens after all events have been processed successfully, at the point where the slave nodes should terminate and merging should start. It occurs once the destructor of my TSelector class (called Analyser) has been called; the destructor does nothing in my case.
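To be precise about "does nothing": the body of my destructor is empty, but as far as I understand C++, the data members are still destroyed after the body returns. That would be consistent with frame #5 of the trace, where TTreeReader::~TTreeReader() is called from Analyser::~Analyser(). Here is a minimal sketch of that behaviour. FakeReader and FakeSelector are hypothetical stand-ins, not my actual classes and not part of ROOT, and I am only assuming that the reader is held as a data member:

```cpp
#include <string>
#include <vector>

// Records the order in which destructors run.
static std::vector<std::string> g_log;

// Hypothetical stand-in for a reader held as a data member.
struct FakeReader {
    ~FakeReader() { g_log.push_back("FakeReader destructor"); }
};

// Hypothetical stand-in for the selector class.
struct FakeSelector {
    FakeReader fReader;  // destroyed automatically after the body below
    ~FakeSelector() { g_log.push_back("FakeSelector destructor body"); }
};

// Creates and destroys one selector, then returns the recorded order.
std::vector<std::string> destruction_order() {
    g_log.clear();
    {
        FakeSelector s;  // goes out of scope at the closing brace
    }
    return g_log;
}
```

Calling destruction_order() from a small main() shows that the member destructor still runs even though the selector's own destructor does no cleanup, so a crash reported inside Analyser::~Analyser() may in fact come from a member.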
I would say it occurs around 1 or 2 times out of 10 tries on average, running on exactly the same files.
I use a compiled TSelector; the code is attached to this post: tproof_bug_reproducer.zip (377.2 KB)
The rest of the time, the TSelector/TChain process runs fine and there is no such error.
I checked the logs of the slaves: it only occurs on one of the nodes.
But I have no idea what could cause this. I don't think it comes from a memory leak, so I guess there is a TProof setting that I am missing.
If you could have a look, and if you have a hint about where this problem could come from, it would be great.
Many thanks in advance
Below is the log of the problematic slave node; the log files of the other nodes are fine.
23:30:19 30333 Wrk-0.0 | Info in <TProofServLite::Setup>: fWorkDir: /afs/cern.ch/user/b/bouquet/work/private/VHbb_branch/VHbb_test_tprocess/output/test_dir_20210321-233016/logs
23:30:19 30333 Wrk-0.0 | Info in <TProofServLite::SetupCommon>: 0 global package directories registered
23:30:20 30333 Wrk-0.0 | Info in <TProofServLite::HandleProcess>: selector obj for 'Analyser' found
23:30:20 30333 Wrk-0.0 | Info in <TProofServLite::HandleProcess>: calling fPlayer->Process() with selector object: Analyser
23:30:20 30333 Wrk-0.0 | Info in <TProofPlayerSlave::AssertSelector>: Processing via TSelector object
23:30:20 30333 Wrk-0.0 | Info in <TEventIter::TEventIter>: fPackets list 'ProcessedPackets_0.0' created
23:30:20 30333 Wrk-0.0 | Info in <TProofPlayerSlave::Process>: save partial results? 0 per-packet? 0
Calling SlaveBegin
m_outputdir_fullpath = ../output
m_analysisLepChannel = 1
23:30:20 30333 Wrk-0.0 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 472048 virtual 142288 resident event 0
23:30:20 30333 Wrk-0.0 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
Initializing branches
23:30:20 30333 Wrk-0.0 | Info in <TProofServLite::RestartComputeTime>: compute time restarted after 0.025289 secs (100 entries)
23:30:24 30333 Wrk-0.0 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 569676 virtual 206232 resident event 547416
Calling SlaveTerminate
Writing histograms
23:30:24 30333 Wrk-0.0 | *** Break ***: segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007fe0eccb14fc in waitpid () from /lib64/libc.so.6
#1 0x00007fe0ecc2efb2 in do_system () from /lib64/libc.so.6
#2 0x00007fe0edcac404 in TUnixSystem::StackTrace() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#3 0x00007fe0edcae09a in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#4 <signal handler called>
#5 0x00007fe0d8a4c0b2 in TTreeReader::~TTreeReader() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libTreePlayer.so.6.22
#6 0x00007fe0d4c497b3 in Analyser::~Analyser() () from /afs/cern.ch/work/b/bouquet/private/VHbb_branch/VHbb_test_tprocess/build/libAnalyser.so
#7 0x00007fe0d7807cf3 in TProofPlayer::~TProofPlayer() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProofPlayer.so.6.22
#8 0x00007fe0d7818011 in TProofPlayerSlave::~TProofPlayerSlave() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProofPlayer.so.6.22
#9 0x00007fe0dcae15b2 in TProofServ::DeletePlayer() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#10 0x00007fe0dcafa068 in TProofServ::HandleSocketInput(TMessage*, bool) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#11 0x00007fe0dcae8c8f in TProofServ::HandleSocketInput() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#12 0x00007fe0dcafdae1 in TProofServLiteInputHandler::Notify() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#13 0x00007fe0edcad5b5 in TUnixSystem::CheckDescriptors() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#14 0x00007fe0edcae6fa in TUnixSystem::DispatchOneEvent(bool) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#15 0x00007fe0edbdb3b6 in TSystem::InnerLoop() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#16 0x00007fe0edbdc2a0 in TSystem::Run() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#17 0x00007fe0edb7dc5f in TApplication::Run(bool) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#18 0x000000000040147e in main ()
===========================================================
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 0x00007fe0d8a4c0b2 in TTreeReader::~TTreeReader() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libTreePlayer.so.6.22
#6 0x00007fe0d4c497b3 in Analyser::~Analyser() () from /afs/cern.ch/work/b/bouquet/private/VHbb_branch/VHbb_test_tprocess/build/libAnalyser.so
#7 0x00007fe0d7807cf3 in TProofPlayer::~TProofPlayer() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProofPlayer.so.6.22
#8 0x00007fe0d7818011 in TProofPlayerSlave::~TProofPlayerSlave() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProofPlayer.so.6.22
#9 0x00007fe0dcae15b2 in TProofServ::DeletePlayer() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#10 0x00007fe0dcafa068 in TProofServ::HandleSocketInput(TMessage*, bool) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#11 0x00007fe0dcae8c8f in TProofServ::HandleSocketInput() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#12 0x00007fe0dcafdae1 in TProofServLiteInputHandler::Notify() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libProof.so.6.22.08
#13 0x00007fe0edcad5b5 in TUnixSystem::CheckDescriptors() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#14 0x00007fe0edcae6fa in TUnixSystem::DispatchOneEvent(bool) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#15 0x00007fe0edbdb3b6 in TSystem::InnerLoop() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#16 0x00007fe0edbdc2a0 in TSystem::Run() () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#17 0x00007fe0edb7dc5f in TApplication::Run(bool) () from /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.08/x86_64-centos7-gcc48-opt/lib/libCore.so.6.22
#18 0x000000000040147e in main ()
===========================================================
23:30:25 30333 Wrk-0.0 | Error in <TProofServLite::HandleException>: caugth exception triggered by signal '1' while processing dset:'TDSet:Nominal', file:'/eos/home-b/bouquet/VHbbcc_results/VHbb_1L_overlap_v1//Reader_1L_33-05_e_InclusiveMerge_D_D/fetch/data-MVATree//qqWlvHbbJ_PwPy8MINLO-6.root' - check logs for possible stacktrace - last event: 195859