Proof <TXSocket::PickUpReady>: error waiting at semaphore

will_cern · August 19, 2013, 3:48pm

Hello,

I have been trying to run a TSelector analysis, which runs fine on PROOF-Lite, on a PoD PROOF cluster with a Condor backend. Execution begins successfully and about 30-40% of my entries are processed, but then each of the workers starts throwing the following errors:

16:30:57 17606 Wrk-0.22 | Error in TXSocket::PickUpReady: error waiting at semaphore
16:30:57 17606 Wrk-0.22 | Error in TXProofServ::GetNextPacket: Recv() failed, returned -1
16:37:50 17606 Wrk-0.22 | Error in TXProofServ::HandleSocketInput: unknown command 1011

What could be causing this? I am a little suspicious that it occurs approximately 30 minutes into the execution, which is the default time after which PoD is supposed to shut down idle workers (although pod-info -n appears to show plenty of workers still up).

Thanks,

Will

ganis · August 23, 2013, 11:15pm

Hi,

This typically happens when a message collection times out; it may be related to a connection going down.
Do you find any errors in the logs?
Can you specify the ROOT version?
Can you try raising the PoD idle timeout to see if anything changes?

G. Ganis
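
For anyone checking their own worker logs for these errors, here is a minimal Python sketch that pulls out the error lines and reports when each worker first failed. It assumes the log lines have the same shape as the three quoted above (time, PID, worker name, then "| Error in <origin>: <message>"); real PROOF logs may contain other formats. The function names parse_errors and first_error_per_worker are hypothetical helpers, not part of ROOT or PoD.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt quoted in this thread:
# "16:30:57 17606 Wrk-0.22 | Error in TXSocket::PickUpReady: error waiting at semaphore"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<pid>\d+)\s+(?P<worker>\S+)\s+\|\s+"
    r"Error in (?P<origin>\S+?): (?P<message>.*)$"
)


def parse_errors(lines):
    """Return one dict per line that matches the assumed error format."""
    errors = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            errors.append(match.groupdict())
    return errors


def first_error_per_worker(errors):
    """Map each worker name to the first error it reported (input order)."""
    first = {}
    for err in errors:
        first.setdefault(err["worker"], err)
    return first


# The three lines quoted in the original post, plus a blank line that is skipped.
sample = [
    "16:30:57 17606 Wrk-0.22 | Error in TXSocket::PickUpReady: error waiting at semaphore",
    "16:30:57 17606 Wrk-0.22 | Error in TXProofServ::GetNextPacket: Recv() failed, returned -1",
    "",
    "16:37:50 17606 Wrk-0.22 | Error in TXProofServ::HandleSocketInput: unknown command 1011",
]

errors = parse_errors(sample)
origin_counts = Counter(err["origin"] for err in errors)
first_seen = first_error_per_worker(errors)

for worker, err in first_seen.items():
    print(f"{worker}: first error at {err['time']} in {err['origin']}")
```

Comparing the first error time per worker against the session start time would show whether the failures cluster around the suspected 30-minute idle timeout or are spread out, which would point more toward a connection problem.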