I’m trying to use TProofBench::Run with ROOT 5.30/04, but at a certain point it crashes with the usual message:
Info in <TProofBenchRunCPU::Run>: Running CPU-bound tests with 49 active worker(s); trial 4/4
Worker 'proof-01.mi.infn.it-0.96' has been removed from the active list
The corresponding log follows:
17:11:38 9341 Mst-0 | Info in <TXProofServ::SetQueryRunning>: starting query: 196
17:11:38 9341 Mst-0 | Info in <TProofQueryResult::SetRunning>: nwrks: 49
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:38 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:40 9341 Mst-0 | Info in <TProof::HandleInputMessage>: finalization on Mst-0 started ...
17:11:40 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:40 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:40 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:41 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:41 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:41 9341 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
17:11:42 9341 Mst-0 | Error in <TXSocket::ProcessUnsolicitedMsg>: 0x2aaab0001db0: async semaphore taken by Close()! Should not be here!
17:11:42 9341 Mst-0 | Error in <TXSocket::ProcessUnsolicitedMsg>: 0x2aaab00091e0: async semaphore taken by Close()! Should not be here!
| session: turra.default.11715.status terminated by peer
17:12:17 9341 Mst-0 | Info in <TXSlave::HandleError>: 0x2aaab0033b30:proof-01.mi.infn.it:0.96 got called ... fProof: 0x1780dd90, fSocket: 0x2aaab0033cb0 (valid: 1)
17:12:17 9341 Mst-0 | Info in <TXSlave::HandleError>: 0x2aaab0033b30: proof: 0x1780dd90
17:12:17 9341 Mst-0 | Info in <TProof::MarkBad>:
+++ Message from top master at proof-06.mi.infn.it:1093 : marking proof-01.mi.infn.it:1093 (0.96) as bad
+++ Reason: received kPROOF_FATAL
TXSlave::HandleError: 0x2aaab0033b30: DONE ...
120321 17:12:17 9341 Proofx-E: Conn::LowWrite: sending header to server [proof-01.mi.infn.it:1093] (rc=-3)
120321 17:12:17 9341 Proofx-E: Conn::SendRecv: problems sending request to server [proof-01.mi.infn.it:1093]
// --------- End of element log -------------------
Retrieving logs: 1 ok, 0 not ok (100 % processed)
// --------- Start of element log -----------------
// Ordinal: 0.96 (role: worker)
// Path: turra@proof-01.mi.infn.it:1093//proof/workingdirs/turra/session-proof-06-1332345582-9341/worker-0.96-proof-01-1332345585-12593.log
// # of retrieved lines: 23
// ------------------------------------------------
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: log file: /proof/workingdirs/turra/session-proof-06-1332345582-9341/worker-0.96-proof-01-1332345585-12593.log
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: child process 12593
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: admin path: /proof/proofadmin/.xproofd.1093/activesessions/turra.default.12593
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: UNIX sock path: /proof/proofadmin/.xproofd.1093/socks/xpd.1093.12593
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: srvtype = 0
120321 16:59:45 10786 xpd-I: ProofServMgr::SetUserOwnerships: enter
120321 16:59:45 10786 xpd-I: ProofServMgr::SetUserOwnerships: done
120321 16:59:45 10786 xpd-I: ProofServMgr::SetUserEnvironment: enter
120321 16:59:45 10786 xpd-I: ProofServMgr::SetUserEnvironment: done
120321 16:59:45 10786 xpd-I: ProofServMgr::SetProofServEnv: psid: 12, log: 0
120321 16:59:45 10786 xpd-I: ProofServMgr::SetProofServEnv: ROOT dir: /gpfs/storage_4/users/home/proof/root
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateProofServRootRc: session rootrc file: /proof/workingdirs/turra/session-proof-06-1332345582-9341/worker-0.96-proof-01-1332345585-12593.rootrc
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateProofServEnvFile: environment file: /proof/workingdirs/turra/session-proof-06-1332345582-9341/worker-0.96-proof-01-1332345585-12593.env
120321 16:59:45 10786 xpd-I: ProofServMgr::SetProofServEnv: creating symlink
120321 16:59:45 10786 xpd-I: ProofServMgr::SetProofServEnv: done
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: 12593: proofserv env set up
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: 12593: log file path communicated
120321 16:59:45 10786 xpd-I: ProofServMgr::CreateFork: 12593: user: turra, uid: 11547, euid:11547, psrv: /gpfs/storage_4/users/home/proof/root/bin/proofserv
Received SIGTERM: terminating
17:14:53 12593 Wrk-0.96 | Info in <TXProofServ::Terminate>: starting session termination operations ...
17:14:53 12593 Wrk-0.96 | Info in <TXProofServ::Terminate>: process memory footprint: 138008/-1 kB virtual, 28016/-1 kB resident
17:14:55 12593 Wrk-0.96 | Info in <TXProofServ::Terminate>: data directory '/proof/workingdirs/turra/data/0.96/proof-01-1332345585-12593' has been removed
Terminate: termination operations ended: quitting!
// --------- End of element log -------------------
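
In logs like the ones above, most lines are routine Info messages and only a handful point at the failure. As a minimal sketch, a small Python filter can pull out the lines worth reading first. The function name `suspicious_lines` and the pattern list are assumptions made for illustration, drawn only from the strings visible in this excerpt; they are not part of ROOT or PROOF and are not an exhaustive list of error markers.

```python
import re

# Hypothetical patterns, taken from strings that appear in the log excerpt above.
# They are not a ROOT-defined format and may need extending for other failures.
PATTERNS = [
    re.compile(r"\bError in <"),         # ROOT error messages
    re.compile(r"Proofx-E:"),            # xproofd client-side errors
    re.compile(r"marking .* as bad"),    # worker removed by the master
    re.compile(r"Reason:"),              # reason given for the removal
    re.compile(r"Received SIGTERM"),     # worker was told to terminate
    re.compile(r"terminated by peer"),   # session closed from the other side
]


def suspicious_lines(log_text):
    """Return (line_number, stripped_line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # A few lines copied from the log above, used only as sample input.
    sample = "\n".join([
        "17:11:40 9341 Mst-0 | Info in <TProof::HandleInputMessage>: finalization on Mst-0 started ...",
        "17:11:42 9341 Mst-0 | Error in <TXSocket::ProcessUnsolicitedMsg>: 0x2aaab0001db0: async semaphore taken by Close()! Should not be here!",
        "+++ Reason: received kPROOF_FATAL",
        "Received SIGTERM: terminating",
    ])
    for number, line in suspicious_lines(sample):
        print(f"{number}: {line}")
```

Running the same function over the full master and worker logs should make it easier to line up the timestamps of the semaphore errors, the kPROOF_FATAL message, and the worker's SIGTERM.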