Proof on Demand + ROOT - cannot start TProof (XrdProofConn connection failure)

Hi,

I have been trying to set up a PROOF cluster using two desktop computers. Both have ROOT (5.34/36) and xrootd (4.7.1) installed. I have also installed PoD (3.16) on both machines and (apparently successfully) set it up according to the instructions at http://pod.gsi.de/.

I am able to start the PoD server and, after submitting workers with pod-ssh, verify that the workers are running:

rafal@120-D11:~$ pod-server start
Starting PoD server...
updating xproofd configuration file...
starting xproofd...
starting PoD agent...
preparing PoD worker package...
selecting pre-compiled bins to be added to worker package...
PoD worker package: /home/rafal/.PoD/wrk/PoDWorker.sh
------------------------
XPROOFD [19462] port: 21002
PoD agent [19492] port: 22002
PROOF connection string: rafal@120-D11:21002
------------------------
rafal@120-D11:~$ pod-ssh -c /usr/pod_ssh.cfg submit --debug
**      [Thu, 09 Nov 2017 23:59:14 +0100]       preparing PoD worker package...
**      [Thu, 09 Nov 2017 23:59:14 +0100]       selecting pre-compiled bins to be added to worker package...
**      [Thu, 09 Nov 2017 23:59:14 +0100]       PoD worker package: /home/rafal/.PoD/wrk/PoDWorker.sh
**      [Thu, 09 Nov 2017 23:59:14 +0100]       pod-ssh config contains an inline shell script. It will be injected it into wrk. package
**      [Thu, 09 Nov 2017 23:59:14 +0100]       preparing PoD worker package...
**      [Thu, 09 Nov 2017 23:59:14 +0100]       inline shell script is found and will be added to the package...
**      [Thu, 09 Nov 2017 23:59:14 +0100]       selecting pre-compiled bins to be added to worker package...
**      [Thu, 09 Nov 2017 23:59:14 +0100]       PoD worker package: /home/rafal/.PoD/wrk/PoDWorker.sh
**      [Thu, 09 Nov 2017 23:59:14 +0100]       There are 5 threads in the tread-pool.
**      [Thu, 09 Nov 2017 23:59:14 +0100]       Number of PoD workers: 2
**      [Thu, 09 Nov 2017 23:59:14 +0100]       Number of PROOF workers: 10
**      [Thu, 09 Nov 2017 23:59:14 +0100]       Workers list:
**      [Thu, 09 Nov 2017 23:59:14 +0100]       [kompRafal] with 6 workers at rafal@120-D11:/tmp/kompRafal
**      [Thu, 09 Nov 2017 23:59:14 +0100]       [kompStar] with 4 workers at rafal@star:/tmp/kompStar
kompRafal       [czw, 09 lis 2017 23:59:14 +0100]       pod-ssh-submit-worker is started for rafal@120-D11 (dir: /tmp/kompRafal, nworkers: 6, sshopt: -p 22)
kompStar        [czw, 09 lis 2017 23:59:14 +0100]       pod-ssh-submit-worker is started for rafal@star (dir: /tmp/kompStar, nworkers: 4, sshopt: -p 22)
**      [Thu, 09 Nov 2017 23:59:15 +0100]
*******************
Successfully processed tasks: 2
Failed tasks: 0                                                                                                                                                   
*******************                                                                                                                                               
rafal@120-D11:~$ pod-ssh status
PoD worker "kompRafal": RUN
PoD worker "kompStar": RUN

However, when I open ROOT and create a TProof object:

TProof *p = TProof::Open("pod://")

I get the following:

Starting master: opening connection ...
Starting master: OK                                                 
Opening connections to workers: OK (2 workers)                 
Note: File "iostream" already loaded
171109 23:59:43 20079 Proofx-E: Conn::Connect: failed to connect to proof://rafal:default@localhost:20000//
171109 23:59:43 20079 Proofx-E: XrdProofConn: XrdProofConn: severe error occurred while opening a connection to server [localhost:20000]
23:59:43 20079 Mst-0 | Warning in <TProof::AddWorkers>: worker '0.0' is invalid
171109 23:59:51 20079 Proofx-E: Conn::Connect: failed to connect to proof://rafal:default@localhost:20001//
171109 23:59:51 20079 Proofx-E: XrdProofConn: XrdProofConn: severe error occurred while opening a connection to server [localhost:20001]
23:59:51 20079 Mst-0 | Warning in <TProof::AddWorkers>: worker '0.1' is invalid
PROOF set to sequential mode
(class TProof*)0x2714960
root [1]  *** No workers left: cannot continue! Terminating ... *** 
 
| session: rafal.default.20079.status terminated by peer
Info in <TXSlave::HandleError>: 0x27f5f40:120-D11:0 got called ... fProof: 0x2714960, fSocket: 0x27f6170 (valid: 1)
Info in <TXSlave::HandleError>: 0x27f5f40: proof: 0x2714960
TXSlave::HandleError: 0x27f5f40: DONE ... 
Info in <TProof::MarkBad>: 
 +++ Message from local session : marking 120-D11:21002 (0) as bad
 +++ Reason: received kPROOF_FATAL

 +++ Message from local session : marking 120-D11:21002 (0) as bad
 +++ Reason: received kPROOF_FATAL

 +++ Most likely your code crashed
 +++ Please check the session logs for error messages either using
 +++ the 'Show logs' button or executing
 +++
 +++ root [] TProof::Mgr("rafal@120-D11:21002")->GetSessionLogs()->Display("*")


Info in <TXSocket::Reconnect>: 0x270e310: reconnection attempts explicitly disabled!
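
The two Proofx-E errors above both report a failed connection to localhost, on ports 20000 and 20001. As a quick diagnostic, here is a minimal sketch for checking whether anything is accepting TCP connections on those ports on the master machine. The helper name `port_is_open` is hypothetical and is not part of ROOT, xrootd or PoD; the host and port values are simply copied from the error messages.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name-resolution failures
        return False


if __name__ == "__main__":
    # Ports taken from the Proofx-E messages in the log above
    for port in (20000, 20001):
        state = "open" if port_is_open("localhost", port) else "closed or unreachable"
        print(f"localhost:{port} is {state}")
```

This only tells whether a listener is reachable on the given port; it does not say which process owns it or why a PROOF worker would be expected there.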

I would be extremely grateful for any help in solving this problem. I can provide the relevant logs or file contents if needed.

Best regards,
Rafal
