OK with PROOF-lite, no good with PROOF(-full)

Hello,

I have recently started to use PROOF-lite on my multiprocessor laptop and
it's great!
Now I would like to take the next step towards building a PROOF-enabled cluster in my lab:
running an analysis with PROOF on another machine.
The example I am trying to get working (it already works with PROOF-lite) is
simple: I have a list of ROOT files, Z20_A40_M*.root, with M=1,2,…,40,
which I analyse as a TChain.
With PROOF-lite, everything is fine:

root [0] .x runAnalysis.C("Z20_A40")
+++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)
Wrk-0.0: building PartitionCalculator ...
Wrk-0.0: make: `libPartitionCalculator.so' is up to date.
Wrk-0.0: rc=0

Info in TProofLite::SetQueryRunning: starting query: 1
Info in TProofQueryResult::SetRunning: nwrks: 4
Info in TUnixSystem::ACLiC: creating shared library /home/john/work/partitions/sources/decomp/analysis/nuclear_partitions/./Analysis_C.so
Warning in TClassTable::Add: class Nucleus already in TClassTable
Looking up for exact location of files: OK (40 files)
Looking up for exact location of files: OK (40 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 4
Validating files: OK (40 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 0.000000
Info in Analysis::Terminate: Total number of partitions = 694168.000000
Info in Analysis::Terminate: Total number of partitions used = 339617.000000
Info in Analysis::Terminate: Ratio = 48.924324 %
Lite-0: all output objects have been merged

The script runAnalysis.C is attached.

Now I am trying to use PROOF on another machine, ganp329. I started the 'xproofd' daemon from the command line
on the remote machine without arguments, options, or any special configuration. The remote machine
has 4 CPUs:

[frankland@ganp329 ~]$ xproofd
120313 16:11:55 001 Scalla is starting. . .
Copr. 2010 Stanford University, xrd version v3.1.0
++++++ xproofd anon@localhost.localdomain initialization started.
Config maximum number of connections restricted to 1024
120313 16:11:55 001 xpd-I: Manager::Config: configuring
120313 16:11:55 001 xpd-I: Manager::Config: listening on port 1093

120313 16:11:55 001 xpd-I: NetMgr::Config: configuring
120313 16:11:55 001 xpd-I: NetMgr::Config: PROOF config file: none
120313 16:11:55 001 xpd-I: NetMgr::Config: 4 worker nodes defined at start-up

120313 16:12:00 001 xpd-I: Manager::Config: manager cron thread started
120313 16:12:00 001 xpd-I: Protocol::Configure: global manager created
120313 16:12:00 001 xpd-I: Protocol::Configure: xproofd protocol version 0.6 build v3.1.0 successfully loaded
------ xproofd anon@localhost.localdomain:1093 initialization completed.
120313 16:12:00 4308 xpd-I: frankland.19126:29@ganp115.ganil.local: ClientMgr::MapClient: user frankland logged-in (privileged); type: ClientMaster
120313 16:12:00 4308 xpd-E: frankland.19126:29@ganp115.ganil.local: ProofServMgr::Attach: session ID not found: 0
120313 16:12:00 4308 xpd-E: frankland.19126:29@ganp115.ganil.local: ProofServMgr::Attach: session ID not found: 0
120313 16:12:00 4308 xpd-E: frankland.19126:29@ganp115.ganil.local: ProofServMgr::Attach: session ID not found: 0
120313 16:12:00 4308 xpd-E: frankland.19126:29@ganp115.ganil.local: ProofServMgr::Attach: session ID not found: 0

ganp115 is my laptop. I don't know if the error messages at the end of the daemon start-up are significant.

Then I run my analysis from my laptop in exactly the same
way, except that in the script runAnalysis.C I replace TProof::Open("") with TProof::Open("frankland@ganp329")
(my login is not the same on my laptop as on the cluster):

root [0] .x runAnalysis.C("Z20_A40")
Starting master: opening connection ...
Starting master: OK
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)
Mst-0: building PartitionCalculator ...
Mst-0: make: `libPartitionCalculator.so' is up to date.
Mst-0: rc=0
Warning in TClassTable::Add: class Nucleus already in TClassTable
Looking up for exact location of files: OK (40 files)
Looking up for exact location of files: OK (40 files)
Cannot get entries for file: /home/john/work/partitions/sources/decomp/analysis/nuclear_partitions/Z20_A40_M12.root - skipping
Cannot get entries for file: /home/john/work/partitions/sources/decomp/analysis/nuclear_partitions/Z20_A40_M13.root - skipping
etc. etc.

Why can't it read my files any more?

Thanks for your help
runAnalysis.C (492 Bytes)

Hi,

Sorry for the late reply.
If I understand correctly, the files are physically in your $HOME on ganp115, so they are not visible from ganp329, where the PROOF workers run.
You need to make the files available to the worker machines, either via a shared file system or via a file server.
To try this out, you can start an xrootd server daemon on your laptop:

$ xrootd /home/john/work/partitions

and then add 'root://ganp115.ganil.local/' in front of your paths when building the chain, i.e.

   chain->Add("root://ganp115.ganil.local//home/john/work/partitions/sources/decomp/analysis/nuclear_partitions/Z20_A40_M1.root")
   ...

Or you can upload the files somewhere on ganp329 …
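
As a rough sketch of the prefixing step, assuming the files are named Z20_A40_M1.root through Z20_A40_M40.root as in the question, the URLs could be generated in a loop rather than typed by hand. Note that `makeRemoteUrls` is a hypothetical helper written for illustration, not part of ROOT; the host name and directory passed to it would be the ones quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a ROOT function): builds one xrootd URL per file
// <prefix>_M<firstM>.root ... <prefix>_M<lastM>.root located in 'dir' on 'host'.
// 'dir' is assumed to be an absolute path without a trailing slash.
std::vector<std::string> makeRemoteUrls(const std::string& host,
                                        const std::string& dir,
                                        const std::string& prefix,
                                        int firstM, int lastM)
{
    assert(firstM <= lastM);           // an empty or reversed range is a caller error
    assert(!dir.empty() && dir[0] == '/');

    std::vector<std::string> urls;
    urls.reserve(static_cast<std::size_t>(lastM - firstM + 1));
    for (int m = firstM; m <= lastM; ++m) {
        // "root://host/" followed by the absolute path yields the double
        // slash seen in the example URL above.
        urls.push_back("root://" + host + "/" + dir + "/" +
                       prefix + "_M" + std::to_string(m) + ".root");
    }
    return urls;
}
```

Inside the macro, each returned string would then be passed to the chain with `chain->Add(url.c_str())`. Whether the workers can actually open these URLs still depends on the xrootd server on the laptop exporting that directory and being reachable from ganp329.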

G. Ganis