Problem with UploadPackage

Hi

Using gLitePROOF/PROOF, I have a problem with UploadPackage:
{
// Dataset made of TTree objects named "vtuple"
TDSet *t = new TDSet("TTree", "vtuple");
t->Add("root://marsedpm.in2p3.fr//dpm/mrs.grid.cnrs.fr/home/atlas/cppm/vacavant/vt-mytt-pre14210_rel5Jun27-csc010200-NT.root");
// Connect to the PROOF master
TProof *p = TProof::Open("localhost:1097");
p->SetParameter("PROOF_MaxSlavesPerNode", (Long_t) 20);
// Clear the package caches, then upload and enable the package
p->ClearPackages();
p->UploadPackage("CManager2.par", TProof::kRemoveOld);
p->EnablePackage("CManager2");
p->TProof::ShowPackages(kTRUE);
t->Process("VTSelector.C+");
}

The CManager2 package is put in
/tmp/proof/atl189/packages/
on the worker nodes,

but on some of them it is not there, and TProof::ShowPackages says:

*** Package cache client:/atlas/vacavant/proof/packages ***
total 4
drwxr-xr-x 3 vacavant atlas 4096 Jul 8 13:27 CManager2
lrwxrwxrwx 1 vacavant atlas 37 Jul 8 13:27 CManager2.par -> /data/vacavant/laurent2/CManager2.par
*** Package cache marson.in2p3.fr:/data/proof/vacavant/packages ***
total 20
drwxr-xr-x 3 vacavant atlas 4096 Jul 8 13:27 CManager2
-rw-r--r-- 1 vacavant atlas 5482 Jul 8 13:27 CManager2.par
*** Package cache marwn44.in2p3.fr:/tmp/proof/atl189/packages ***
total 20
drwxr-xr-x 3 atl189 atlas 4096 Jul 8 13:27 CManager2
-rw-r--r-- 1 atl189 atlas 5482 Jul 8 13:27 CManager2.par

It is found on only one worker node, and I do not understand why.

Thanks for any hint

Karim

Dear Karim,

I am not sure I understand the problem. How many workers do you expect to have?
From the output of ShowPackages(kTRUE) that you posted you seem to have only one worker on marwn44.in2p3.fr and the package is correctly uploaded there.

Can you give more details about your configuration?
Also, can you post the output of p->Print("a")?

Gerri Ganis

Hi Gerri

I am using gLitePROOF with ROOT 5.18.
I was using 8 workers.
Below is the output of

p->TProof::ShowPackages (kTRUE);
p->TProof::ShowEnabledPackages (kTRUE);
p->Print("a");


PROOF set to parallel mode (8 workers)
(int)0
*** Package cache client:/atlas/vacavant/proof/packages ***
total 4
drwxr-xr-x 3 vacavant atlas 4096 Jul 9 17:24 CManager2
lrwxrwxrwx 1 vacavant atlas 37 Jul 9 17:24 CManager2.par -> /data/vacavant/laurent2/CManager2.par
*** Package cache marson.in2p3.fr:/data/proof/vacavant/packages ***
total 20
drwxr-xr-x 3 vacavant atlas 4096 Jul 9 17:24 CManager2
-rw-r--r-- 1 vacavant atlas 5482 Jul 9 17:24 CManager2.par
*** Package cache marwn43.in2p3.fr:/tmp/proof/atl189/packages ***
total 20
drwxr-xr-x 3 atl189 atlas 4096 Jul 9 17:24 CManager2
-rw-r--r-- 1 atl189 atlas 5482 Jul 9 17:24 CManager2.par
*** Enabled packages on slave 0.8 on marwn45.in2p3.fr
CManager2
*** Enabled packages on slave 0.9 on marwn51.in2p3.fr
CManager2
*** Enabled packages on slave 0.7 on marwn45.in2p3.fr
CManager2
*** Enabled packages on slave 0.6 on marwn45.in2p3.fr
CManager2
*** Enabled packages on slave 0.5 on marwn49.in2p3.fr
CManager2
*** Enabled packages on slave 0.4 on marwn43.in2p3.fr
CManager2
*** Enabled packages on slave 0.1 on marwn43.in2p3.fr
CManager2
*** Enabled packages on slave 0.2 on marwn43.in2p3.fr
CManager2
*** Enabled packages on master 0 on marson.in2p3.fr
CManager2
*** Enabled packages on client on marson.in2p3.fr
CManager2
Connected to: localhost.localdomain (valid)
Port number: 1097
User: vacavant
ROOT version|rev: 5.18/00|r21744
Architecture-Compiler: linux-gcc346
Proofd protocol version: 16
Client protocol version: 16
Remote protocol version: 16
Log level: 0
Session unique tag: marson-1215616952-5929
Default data pool: root://marson.in2p3.fr//proofpool
*** Master server 0 (parallel mode, 8 workers):
Master host name: marson.in2p3.fr
Port number: 1097
User/Group: vacavant/default
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
Protocol version: 16
Image name: marson.in2p3.fr:/data/proof/vacavant
Working directory: /data/proof/vacavant/session-marson-1215616952-5929/master-0-marson-1215616952-5929
Config directory:
Config file: proof.conf
Log level: 0
Number of workers: 10
Number of active workers: 8
Number of unique workers: 1
Number of inactive workers: 0
Number of bad workers: 2
Total MB's processed: 0.00
Total real time used (s): 21.877
Total CPU time used (s): 0.930
List of workers:
*** Worker 0.1 (valid)
Host name: localhost.localdomain
Port number: 20001
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.1-marwn43-1215616953-20370
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.10
Real time used (s): 5.338
CPU time used (s): 0.140
*** Worker 0.2 (valid)
Host name: localhost.localdomain
Port number: 20002
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.2-marwn43-1215616953-20374
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.00
Real time used (s): 2.315
CPU time used (s): 0.140
*** Worker 0.4 (valid)
Host name: localhost.localdomain
Port number: 20004
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.4-marwn43-1215616953-20379
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.00
Real time used (s): 2.250
CPU time used (s): 0.130
*** Worker 0.5 (valid)
Host name: localhost.localdomain
Port number: 20005
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.5-marwn49-1215616953-29544
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.09
Real time used (s): 2.415
CPU time used (s): 0.100
*** Worker 0.6 (valid)
Host name: localhost.localdomain
Port number: 20006
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.6-marwn45-1215616953-30473
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.09
Real time used (s): 2.556
CPU time used (s): 0.110
*** Worker 0.7 (valid)
Host name: localhost.localdomain
Port number: 20007
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.7-marwn45-1215616954-30477
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.00
Real time used (s): 2.290
CPU time used (s): 0.110
*** Worker 0.8 (valid)
Host name: localhost.localdomain
Port number: 20008
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.8-marwn45-1215616954-30482
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.00
Real time used (s): 2.291
CPU time used (s): 0.110
*** Worker 0.9 (valid)
Host name: localhost.localdomain
Port number: 20009
ROOT version|rev|tag: 5.18/00|r21744|5.18/00
Architecture-Compiler: linux-gcc346
User/Group: atl189/default
Proofd protocol version: 16
Image name: localhost.localdomain:/tmp/proof/atl189
Working directory: /tmp/proof/atl189/session-marson-1215616952-5929/worker-0.9-marwn51-1215616954-22420
Performance index: 100
MB's processed: 0.00
MB's sent: 0.03
MB's received: 0.09
Real time used (s): 2.422
CPU time used (s): 0.090

Cheers

Karim

FYI, Karim and I are tracking this issue in gLitePROOF Trac as well, to find out whether it is a gLitePROOF or PROOF issue.
So far I didn’t find any side effects of gLitePROOF here.

subversion.gsi.de/trac/dgrid/ticket/75

In the xpdf.cf file I had set

xpd.workdir /tmp/proof

Following Anar’s advice, I tried without this line and it works fine now. The packages are now put in each job directory on the workers …

Karim

Just a few more details from my side.
This is actually related to gLitePROOF.
When using gLitePROOF on the Grid (gLite) and setting the PROOF workers’ working directory to a “tmp” directory, for example, the image names of the workers all look alike. In the Print("a") output above, every worker reports the image name localhost.localdomain:/tmp/proof/atl189.

The image name is the criterion PROOF uses to identify unique workers when it uploads packages.
gLitePROOF emulates an environment that lets PROOF run across the Grid, and in doing so it makes the PROOF master think that all workers are located on the same machine.
PROOF therefore concludes that these workers share the same image and uploads a package only once.

A workaround is to use the default settings of gLitePROOF.
By default gLitePROOF sets the PROOF working directory to the user’s home.
On the gLite Grid it looks like:
home directory of the user + gLite JobID
and it is always a different directory path, even if the home is shared between workers, since the JobIDs are different.
The worker’s image name then contains this job-specific path.

In this case every gLitePROOF worker is a unique worker for PROOF.