karim
July 8, 2008, 11:33am
1
Hi
Using gliteProof/Proof, I have a problem with UploadPackage
{
TDSet *t = new TDSet("TTree", "vtuple");
t->Add("root://marsedpm.in2p3.fr//dpm/mrs.grid.cnrs.fr/home/atlas/cppm/vacavant/vt-mytt-pre14210_rel5Jun27-csc010200-NT.root");
TProof *p = TProof::Open("localhost:1097");
p->SetParameter("PROOF_MaxSlavesPerNode", (Long_t) 20);
p->ClearPackages();
p->UploadPackage("CManager2.par", TProof::kRemoveOld);
p->EnablePackage("CManager2");
p->TProof::ShowPackages(kTRUE);
t->Process("VTSelector.C+");
}
The CManager2 package is put in
/tmp/proof/atl189/packages/
on the worker nodes,
but on some of them it is not there, and TProof::ShowPackages says:
*** Package cache client:/atlas/vacavant/proof/packages ***
total 4
drwxr-xr-x 3 vacavant atlas 4096 Jul 8 13:27 CManager2
lrwxrwxrwx 1 vacavant atlas 37 Jul 8 13:27 CManager2.par -> /data/vacavant/laurent2/CManager2.par
*** Package cache marson.in2p3.fr :/data/proof/vacavant/packages ***
total 20
drwxr-xr-x 3 vacavant atlas 4096 Jul 8 13:27 CManager2
-rw-r--r-- 1 vacavant atlas 5482 Jul 8 13:27 CManager2.par
*** Package cache marwn44.in2p3.fr :/tmp/proof/atl189/packages ***
total 20
drwxr-xr-x 3 atl189 atlas 4096 Jul 8 13:27 CManager2
-rw-r--r-- 1 atl189 atlas 5482 Jul 8 13:27 CManager2.par
It is only found on one worker node, and I do not understand why.
Thanks for any hint
Karim
ganis
July 9, 2008, 12:03pm
2
Dear Karim,
I am not sure I understand the problem. How many workers do you expect to have?
From the output of ShowPackages(kTRUE) that you posted you seem to have only one worker on marwn44.in2p3.fr and the package is correctly uploaded there.
Can you give more details about your configuration?
Also, can you post the output of p->Print("a")?
Gerri Ganis
karim
July 9, 2008, 3:31pm
3
Hi Gerri
I am using gLitePROOF with ROOT 5.18.
I was using 8 workers.
Below is the output of
p->TProof::ShowPackages(kTRUE);
p->TProof::ShowEnabledPackages(kTRUE);
p->Print("a");
…
PROOF set to parallel mode (8 workers)
(int)0
*** Package cache client:/atlas/vacavant/proof/packages ***
total 4
drwxr-xr-x 3 vacavant atlas 4096 Jul 9 17:24 CManager2
lrwxrwxrwx 1 vacavant atlas 37 Jul 9 17:24 CManager2.par -> /data/vacavant/laurent2/CManager2.par
*** Package cache marson.in2p3.fr :/data/proof/vacavant/packages ***
total 20
drwxr-xr-x 3 vacavant atlas 4096 Jul 9 17:24 CManager2
-rw-r--r-- 1 vacavant atlas 5482 Jul 9 17:24 CManager2.par
*** Package cache marwn43.in2p3.fr :/tmp/proof/atl189/packages ***
total 20
drwxr-xr-x 3 atl189 atlas 4096 Jul 9 17:24 CManager2
-rw-r--r-- 1 atl189 atlas 5482 Jul 9 17:24 CManager2.par
*** Enabled packages on slave 0.8 on marwn45.in2p3.fr
CManager2
*** Enabled packages on slave 0.9 on marwn51.in2p3.fr
CManager2
*** Enabled packages on slave 0.7 on marwn45.in2p3.fr
CManager2
*** Enabled packages on slave 0.6 on marwn45.in2p3.fr
CManager2
*** Enabled packages on slave 0.5 on marwn49.in2p3.fr
CManager2
*** Enabled packages on slave 0.4 on marwn43.in2p3.fr
CManager2
*** Enabled packages on slave 0.1 on marwn43.in2p3.fr
CManager2
*** Enabled packages on slave 0.2 on marwn43.in2p3.fr
CManager2
*** Enabled packages on master 0 on marson.in2p3.fr
CManager2
**CManager2
Connected to: localhost.localdomain (valid)
Port number: 1097
User: vacavant
ROOT version|rev: 5.18/00|r21744
Architecture-Compiler: linux-gcc346
Proofd protocol version: 16
Client protocol version: 16
Remote protocol version: 16
Log level: 0
Session unique tag: marson-1215616952-5929
Default data pool: root://marson.in2p3.fr//proofpool
*** Master server 0 (parallel mode, 8 workers):
Master host name: marson.in2p3.fr
Port number: 1097
User/Group: vacavant/default
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
Protocol version: 16
Image name: marson.in2p3.fr :/data/proof/vacavant
Working directory: /data/proof/vacavant/session-marson-1215616952-5929/master-0-marson-1215616952-5929
Config directory:
Config file: proof.conf
Log level: 0
Number of workers: 10
Number of active workers: 8
Number of unique workers: 1
Number of inactive workers: 0
Number of bad workers: 2
Total MB’s processed: 0.00
Total real time used (s): 21.877
Total CPU time used (s): 0.930
List of workers:
*** Worker 0.1 (valid)
Host name: localhost.localdomain
Port number: 20001
Enabled packages on client on marson.in2p3.fr
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.1-marwn43-1215616953-20370
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.10
Real time used (s): 5.338
CPU time used (s): 0.140
*** Worker 0.2 (valid)
Host name: localhost.localdomain
Port number: 20002
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.2-marwn43-1215616953-20374
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.00
Real time used (s): 2.315
CPU time used (s): 0.140
*** Worker 0.4 (valid)
Host name: localhost.localdomain
Port number: 20004
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.4-marwn43-1215616953-20379
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.00
Real time used (s): 2.250
CPU time used (s): 0.130
*** Worker 0.5 (valid)
Host name: localhost.localdomain
Port number: 20005
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.5-marwn49-1215616953-29544
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.09
Real time used (s): 2.415
CPU time used (s): 0.100
*** Worker 0.6 (valid)
Host name: localhost.localdomain
Port number: 20006
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.6-marwn45-1215616953-30473
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.09
Real time used (s): 2.556
CPU time used (s): 0.110
*** Worker 0.7 (valid)
Host name: localhost.localdomain
Port number: 20007
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.7-marwn45-1215616954-30477
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.00
Real time used (s): 2.290
CPU time used (s): 0.110
*** Worker 0.8 (valid)
Host name: localhost.localdomain
Port number: 20008
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.8-marwn45-1215616954-30482
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.00
Real time used (s): 2.291
CPU time used (s): 0.110
*** Worker 0.9 (valid)
Host name: localhost.localdomain
Port number: 20009
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.9-marwn51-1215616954-22420
Performance index: 100
MB’s processed: 0.00
MB’s sent: 0.03
MB’s received: 0.09
Real time used (s): 2.422
CPU time used (s): 0.090
Cheers
Karim
anar
July 21, 2008, 9:28am
4
FYI, Karim and I are tracking this issue in gLitePROOF Trac as well, to find out whether it is a gLitePROOF or PROOF issue.
So far I didn’t find any side effects of gLitePROOF here.
subversion.gsi.de/trac/dgrid/ticket/75
karim
July 21, 2008, 11:31am
5
I had set in the xpdf.cf file
xpd.workdir /tmp/proof
Following Anar's advice, I tried without this line and it works fine now. The packages are now put in each job directory on the worker …
Karim
anar
July 21, 2008, 2:14pm
6
Just a few details from my side.
Actually, this is somewhat related to gLitePROOF.
When gLitePROOF is used on the Grid (gLite) and the PROOF workers' working directory is set to a "tmp" directory, for example, the image names of the workers all look alike, as in the Print("a") output above, where every worker reports localhost.localdomain:/tmp/proof/atl189.
The image name is the criterion that PROOF uses when it looks for unique workers to upload packages to.
gLitePROOF emulates an environment that lets PROOF run across the Grid, and in doing so it forces the PROOF master to think that the workers are located on the same machine.
In this case PROOF thinks that these workers share the same image and uploads a package only once.
A workaround is to use the default settings of gLitePROOF.
By default gLitePROOF sets the PROOF working directory to the user's home.
On the gLite Grid it looks like:
Home Dir. of the user + gLite JobID
and it is always a different directory path, even if the home is shared between workers, since the JobIDs are different.
The worker's image name then contains this job-specific path, so in this case every gLitePROOF worker is a unique worker for PROOF.
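To make the effect concrete, here is a small stand-alone C++ sketch of "one upload per distinct image name". It is only a simplified model of the behaviour described above, not PROOF's actual code: the Worker struct, uniqueByImage() and the two sample lists are made up for illustration, and the per-job paths (/home/atl189/job-A and so on) are hypothetical. Only the shared image name localhost.localdomain:/tmp/proof/atl189 is taken from the Print("a") output in this thread.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified model of a worker: only the image name matters here.
struct Worker {
    std::string ordinal;  // e.g. "0.1"
    std::string image;    // "host:workdir", as printed by Print("a")
};

// Keep the first worker seen for each distinct image name.
// In this model, only these workers would receive the package upload.
std::vector<Worker> uniqueByImage(const std::vector<Worker>& workers) {
    std::set<std::string> seen;
    std::vector<Worker> unique;
    for (const Worker& w : workers) {
        if (seen.insert(w.image).second) {
            unique.push_back(w);
        }
    }
    return unique;
}

// Case from this thread: xpd.workdir /tmp/proof, so all images coincide.
std::vector<Worker> sharedTmpWorkers() {
    const std::string image = "localhost.localdomain:/tmp/proof/atl189";
    return {{"0.1", image}, {"0.2", image}, {"0.4", image}};
}

// Default gLitePROOF layout: home directory + job ID (paths are hypothetical).
std::vector<Worker> perJobWorkers() {
    return {{"0.1", "localhost.localdomain:/home/atl189/job-A"},
            {"0.2", "localhost.localdomain:/home/atl189/job-B"},
            {"0.4", "localhost.localdomain:/home/atl189/job-C"}};
}
```

With the shared /tmp/proof setting, uniqueByImage(sharedTmpWorkers()) keeps a single worker, which matches the "Number of unique workers: 1" line in the master's Print("a") output. With per-job directories, uniqueByImage(perJobWorkers()) keeps all three, so each worker would get its own copy of the package.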