Hi,
In fact, I tried starting from scratch and reinstalled PoD 3.12. I again tried to start the pod server but I no longer see any workers running (“pod-info -n” returns 0). I do “pod-server start” and “pod-submit -r lsf -q 1nh -n 20”. Then “pod-info -ls” gives me:
XPROOFD [17995] port: 21002
PoD agent [18141] port: 22002
All my 20 jobs are running but I am still getting 0 with “pod-info -n”. I have no idea what is the difference since I reinstalled the PoD package.
Here is my PoD.cfg file:
[code]# PoD user defaults
version: 0.8
Please use PoD User’s Manual to find out more details on
keys and values of this configuration file.
PoD User’s Manual can be found in $POD_LOCATION/doc folder or
[server]
#work_dir=$HOME/.PoD
#sandbox_dir=$HOME/.PoD
#logfile_dir=$HOME/.PoD/log
work_dir=/tmp/$USER/.PoD
sandbox_dir=/tmp/$USER/.PoD
logfile_dir=/tmp/$USER/.PoD/log
logfile_overwrite=yes
log_level=1
agent_shutdown_if_idle_for_sec=86400
agent_local_client_port_min=20000
agent_local_client_port_max=25000
xproof_ports_range_min=21001
xproof_ports_range_max=22000
agent_ports_range_min=22001
agent_ports_range_max=23000
agent_threads=5
agent_node_readbuffer=5000
packet_forwarding=auto
proof_cfg_entry_pattern=worker %user%@%host% port=%port% pref=100
[worker]
work_dir=$POD_LOCATION/
logfile_dir=$POD_LOCATION/
logfile_overwrite=yes
log_level=1
set_my_rootsys=yes
my_rootsys=$ROOTSYS
agent_shutdown_if_idle_for_sec=86400
xproof_ports_range_min=21001
xproof_ports_range_max=22000
agent_node_readbuffer=5000
[lsf_plugin]
email_job_output=no
upload_job_log=no
[pbs_plugin]
upload_job_log=no
[ge_plugin]
upload_job_log=no
options_file=$POD_LOCATION/etc/Job.oge.option
[condor_plugin]
upload_job_log=no
options_file=$POD_LOCATION/etc/Job.condor.option
[/code]
Here is my user_worker_env.sh file:
! /usr/bin/env bash
echo "Setting user environment for workers ..."
source /afs/cern.ch/sw/lcg/contrib/gcc/4.3.5/x86_64-slc5-gcc34-opt/setup.sh
export
LD_LIBRARY_PATH=/afs/cern.ch/sw/lcg/external/qt/4.4.2/x86_64-slc5-gcc43-opt/lib:\
/afs/cern.ch/sw/lcg/external/Boost/1.47.0_python2.6/x86_64-slc5-gcc43-opt//lib:\
/afs/cern.ch/sw/lcg/external/xrootd/3.2.4/x86_64-slc5-gcc43-opt/lib64:\
/afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.05/x86_64-slc5-gcc43-opt/root/lib:\
/afs/cern.ch/sw/lcg/external/Python/2.6.2/x86_64-slc5-gcc43-opt/lib:\
/afs/cern.ch/sw/lcg/contrib/gcc/4.3.5/x86_64-slc5-gcc43-opt/lib64:\
/afs/cern.ch/sw/lcg/contrib/mpfr/2.3.1/x86_64-slc5-gcc43-opt/lib:\
/afs/cern.ch/sw/lcg/contrib/gmp/4.2.2/x86_64-slc5-gcc43-opt/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
Here is my user_xpd.cf file:
xpd.rootd allow
xpd.intwait 600
All these three files are under my $HOME/.PoD directory. On each start of the session, I do “source $HOME/PoD/3.12/PoD_env.sh”.
Finally, here are my log files:
tmp/salekd/.PoD/log/pod-agent.server.log
2013-03-18 16:15:01.770 INF 0 [LOG singleton:thread-16820] LOG singleton has been initialized.
2013-03-18 16:15:01.770 INF 0 [PROOFAgent:thread-16820] pod-agent v.3.12
2013-03-18 16:15:01.771 INF 0 [CORE:thread-16820] Bringing >>> AgentServer <<< to life...
2013-03-18 16:15:01.771 INF 0 [CORE:thread-16820] Bringing >>> ThreadPool <<< to life...
2013-03-18 16:15:01.771 INF 0 [ThreadPool:thread-16822] starting a thread worker.
2013-03-18 16:15:01.771 INF 0 [ThreadPool:thread-16823] starting a thread worker.
2013-03-18 16:15:01.771 INF 0 [ThreadPool:thread-16824] starting a thread worker.
2013-03-18 16:15:01.771 INF 0 [ThreadPool:thread-16825] starting a thread worker.
2013-03-18 16:15:01.771 INF 0 [ThreadPool:thread-16826] starting a thread worker.
2013-03-18 16:15:01.771 INF 0 [AgentServer:thread-16820] Detected xpd [16285] on port 21001
2013-03-18 16:15:01.771 INF 0 [AgentServer:thread-16820] starting a monitor
2013-03-18 16:15:01.773 INF 0 [AgentServer:thread-16820] Entering into the main 'select' loop...
2013-03-18 16:15:28.210 INF 0 [AgentServer:thread-16820] Accepting the connetion from PoD UI: lxplus415.cern.ch:50266
2013-03-18 16:15:28.210 INF 0 [AgentServer:thread-16820] Client requests a list of available workers.
2013-03-18 16:15:28.210 INF 0 [AgentServer:thread-16820] Client lxplus415.cern.ch:50266 has just dropped the connection
2013-03-18 16:15:28.220 INF 0 [AgentServer:thread-16820] Accepting the connetion from PoD UI: lxplus415.cern.ch:50267
2013-03-18 16:15:28.220 INF 0 [AgentServer:thread-16820] Client requests a list of available workers.
2013-03-18 16:15:28.220 INF 0 [AgentServer:thread-16820] Client lxplus415.cern.ch:50267 has just dropped the connection
2013-03-18 16:15:38.437 INF 0 [AgentServer:thread-16820] Accepting the connetion from PoD UI: lxplus415.cern.ch:50291
2013-03-18 16:15:38.437 INF 0 [AgentServer:thread-16820] Client requests a list of available workers.
2013-03-18 16:15:38.437 INF 0 [AgentServer:thread-16820] Client lxplus415.cern.ch:50291 has just dropped the connection
2013-03-18 16:15:38.478 INF 0 [AgentServer:thread-16820] Accepting the connetion from PoD UI: lxplus415.cern.ch:50292
2013-03-18 16:15:38.478 INF 0 [AgentServer:thread-16820] Client requests a list of available workers.
2013-03-18 16:15:38.478 INF 0 [AgentServer:thread-16820] Client lxplus415.cern.ch:50292 has just dropped the connection
/tmp/salekd/.PoD/log/PoDServer/xpd.log
130318 16:14:56 16285 Scalla is starting. . .
Copr. 2010 Stanford University, xrd version v3.2.4
++++++ xproofd PoDServer@lxplus415.cern.ch initialization started.
Config using configuration file /tmp/salekd/.PoD/etc/xpd.cf
=====> xrd.adminpath /tmp/PoD_ZRLhP16134
Config maximum number of connections restricted to 1024
130318 16:14:56 16285 xpd.port 21001
130318 16:14:56 16285 xpd.port 21001
130318 16:14:56 16285 xpd.tmp /tmp/PoD_ZRLhP16134
130318 16:14:56 16285 xpd.role master
130318 16:14:56 16285 xpd-I: Manager::DoDirectiveRootd: val: allow
130318 16:14:56 16285 xpd.rootd allow
130318 16:14:56 16285 xpd-I: Manager::Config: configuring
130318 16:14:56 16285 xpd-I: Manager::Config: listening on port 21001
130318 16:14:56 16285 xpd-I: Manager::Config: using temp dir: /tmp/PoD_ZRLhP16134
130318 16:14:56 16285 xpd-I: Manager::Config: role set to: master
130318 16:14:56 16285 xpd-I: Manager::Config: admin path set to: /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001
130318 16:14:56 16285 xpd-I: Manager::Config: unix sockets under: /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001/socks
130318 16:14:56 16285 xpd-I: Manager::Config: PROOF pool: root://lxplus415.cern.ch
130318 16:14:56 16285 xpd-I: Manager::Config: PROOF pool namespace: /proofpool
130318 16:14:56 16285 xpd-I: Manager::LoadScheduler: scheduler loaded: type: default
130318 16:14:56 16285 xpd-I: Manager::Config: user config files are disabled
130318 16:14:56 16285 xpd-I: Manager::Config: no dataset sources defined
130318 16:14:56 16285 xpd-I: Manager::Config: list of superusers: salekd
130318 16:14:56 16285 xpd-I: Manager::Config: bare lib path for proofserv (full LD_LIBRARY_PATH): /afs/cern.ch/user/s/salekd/PoD/3.12/lib:/afs/cern.ch/user/s/salekd/work/SFrame/RootCore/lib:/afs/cern.ch/user/s/salekd/work/SFrame/SFrame/lib:/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/xrootd/3.2.4-x86_64-slc5/v3.2.4/lib:/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.05-x86_64-slc5-gcc4.3/lib:/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/2.6.5-x86_64-slc5-gcc43/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43-opt/lib:/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/Gcc/gcc435_x86_64_slc5/gcc-alt-435/x86_64-slc5-gcc43-opt/lib64:/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/Gcc/gcc435_x86_64_slc5/gcc-alt-435/x86_64-slc5-gcc43-opt/lib:
130318 16:14:56 16285 xpd-I: Group::Print: +++ Group: default
130318 16:14:56 16285 xpd-I: Group::Print: +++ Priority: -1, fraction: -1
130318 16:14:56 16285 xpd-I: Group::Print: +++ End of Group: default
130318 16:14:56 16285 xpd-I: Admin::Config: configuring
130318 16:14:56 16285 xpd-I: Admin::Config: allowed/supported copy commands: root:xrdcp,https:wget,file:cp,http:wget,xrd:xrdcp
130318 16:14:56 16285 xpd.resource static /tmp/salekd/.PoD/proof.conf
130318 16:14:56 16285 xpd-I: NetMgr::Config: configuring
130318 16:14:56 16285 xpd-I: NetMgr::Config: PROOF config file: /tmp/salekd/.PoD/proof.conf
130318 16:14:56 16285 xpd-I: NetMgr::Config: PROOF config file will be reloaded upon change
130318 16:14:56 16285 xpd-I: NetMgr::Config: 0 worker nodes defined at start-up
130318 16:14:56 16285 xpd-I: PriorityMgr::Config: configuring
130318 16:14:56 16285 xpd-I: PriorityMgr::Config: no priority changes requested
130318 16:14:56 16285 xpd-I: PriorityMgr::Config: poller thread started
130318 16:14:56 16285 xpd-I: ROOTMgr::SetLogDir: rootsys log validation path: /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001/rootsysvalidation
130318 16:14:56 16285 xpd-I: ROOTMgr::Config: configuring
130318 16:15:01 16285 xpd-I: ROOTMgr::Config: ROOT dist: '5.34/05 5.34/05 /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.05-x86_64-slc5-gcc4.3 35' validated
130318 16:15:01 16285 xpd-I: ROOTMgr::Config: ROOT version details: svn: 48568, code: 336389, {mnp} = {5,34,5}
130318 16:15:01 16285 xpd-I: ClientMgr::Config: configuring
130318 16:15:01 16285 xpd-I: ClientMgr::Config: clients admin path set to: /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001/clients
130318 16:15:01 16285 xpd-I: ClientMgr::Config: XRD seclib not specified; strong authentication disabled
130318 16:15:01 16285 xpd-I: ClientMgr::Config: cron thread started
130318 16:15:01 16285 xpd.intwait 600
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: configuring
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: setting internal timeout to 600 secs
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: client sessions kept idle for 0 secs after disconnection
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: active sessions admin path set to: /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001/activesessions
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: terminated sessions admin path set to /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001/terminatedsessions
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: RC settings: 0
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: ENV settings: 0
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: using fork() to start proofserv sessions
130318 16:15:01 16285 xpd-I: ProofServMgr::Config: cron thread started
130318 16:15:01 16285 xpd.resource static /tmp/salekd/.PoD/proof.conf
130318 16:15:01 16285 xpd-I: Sched::Config: cron thread started
130318 16:15:01 16285 xpd-I: Manager::Config: unix socket path for rootd call backs: /tmp/PoD_ZRLhP16134/PoDServer//.xproofd.21001/socks/xpd.21001.16285.rootd
130318 16:15:01 16285 xpd-I: Manager::Config: serving files with: '/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.05-x86_64-slc5-gcc4.3/bin/rootd' (protocol: 'rootd://') to ALL hosts
130318 16:15:01 16285 xpd-I: Manager::Config: manager cron thread started
130318 16:15:01 16285 xpd-I: Protocol::Configure: global manager created
130318 16:15:01 16285 xpd-I: Protocol::Configure: xproofd protocol version 0.7 build v3.2.4 successfully loaded
130318 16:15:01 16775 xpd-I: ProofServCron: next full sessions check in 30 secs
------ xproofd PoDServer@lxplus415.cern.ch:21001 initialization completed.
130318 16:15:31 16776 xpd-I: SchedCron: running regular checks
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServMgr::CleanupLostProofServ: pidpath: /tmp/.xproofd.1093/xrootd.pid
130318 16:15:31 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:15:31 16775 xpd-I: ProofServCron: next sessions check in 30 secs
130318 16:16:01 16776 xpd-I: SchedCron: running regular checks
130318 16:16:01 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:16:01 16775 xpd-I: ProofServCron: next sessions check in 30 secs
130318 16:16:31 16776 xpd-I: SchedCron: running regular checks
130318 16:16:31 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:16:31 16775 xpd-I: ProofServCron: next sessions check in 30 secs
130318 16:17:01 16776 xpd-I: SchedCron: running regular checks
130318 16:17:01 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:17:01 16775 xpd-I: ProofServCron: next sessions check in 30 secs
130318 16:17:31 16776 xpd-I: SchedCron: running regular checks
130318 16:17:31 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:17:31 16775 xpd-I: ProofServCron: next sessions check in 30 secs
130318 16:18:01 16776 xpd-I: SchedCron: running regular checks
130318 16:18:01 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:18:01 16775 xpd-I: ProofServCron: next sessions check in 30 secs
130318 16:18:31 16776 xpd-I: SchedCron: running regular checks
130318 16:18:31 16775 xpd-I: ProofServCron: 0 sessions are currently active
130318 16:18:31 16775 xpd-I: ProofServCron: next sessions check in 30 secs