I have a cluster of machines running PROOF. All of them seem to be running fine and can execute simple commands like gProof->Exec(".!uname -a"). But whenever I try to run my analysis, only 2 workers are used, even though I have 10 available. The memory plots show activity on only 2 of them.
I tried to force the use of all workers by setting the option
<max_workers> Maximum number of workers to be assigned to user session [-1, i.e. all]
to wmx:-1, but it does not work.
Is there any way to force all the workers to be used? Or any place where I can debug why only 2 workers are used?
Thanks,
Ana Rodríguez.
The only reason I can think of is that all the workers are reading from the same server. In that case, there is a limit on the number of workers that can access the server. The limit can be lifted by setting:
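To make the reply concrete, here is a toy C++ model of such a per-server limit. It is a sketch under stated assumptions, not ROOT's packetizer code: it assumes each file server accepts at most `maxPerServer` concurrent workers and that each new worker goes to the least-loaded server still below that cap. `ActiveWorkers` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of a per-file-server worker cap (hypothetical; not ROOT's
// TPacketizer logic). Each entry of fileServers names the server that holds
// one file; several files on the same server share a single cap.
int ActiveWorkers(int nWorkers, const std::vector<std::string>& fileServers,
                  int maxPerServer) {
    std::map<std::string, int> load;  // workers assigned to each distinct server
    for (const auto& s : fileServers) load[s] = 0;

    int active = 0;
    for (int w = 0; w < nWorkers; ++w) {
        // Pick the least-loaded server that is still below the cap.
        auto best = load.end();
        for (auto it = load.begin(); it != load.end(); ++it) {
            if (it->second < maxPerServer &&
                (best == load.end() || it->second < best->second)) {
                best = it;
            }
        }
        if (best == load.end()) break;  // every server is full: the rest stay idle
        ++best->second;
        ++active;
    }
    return active;
}
```

With a cap of 2, `ActiveWorkers(10, {"gpfs", "gpfs", "gpfs"}, 2)` returns 2 however many files sit on that one server, while `ActiveWorkers(10, {"srv1", "srv2", "srv3"}, 2)` returns 6. In this model, either raising the cap or spreading the files over more servers brings more workers into play.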
I tried the gProof statement you suggested, but it did not solve anything.
I am using ROOT version 5.25/02 (29 September 2009).
I start the xrootd workers through an SGE batch system; the master runs at a fixed location on another machine. All of them run 64-bit SL5.
They are 8-core machines with 16 GB of RAM; usually I get 1 core per machine as a worker.
I have tried reading from 1 to 5 files, each ~11 GB with ~2e6 events.
The workers have access to the files through GPFS.
Only 3 processes are ever used (1 master, 2 workers).
The “Show logs” output reports “// # of retrieved lines: 0” for all the other available workers, and a number greater than 0 for the two active ones.
root [0] TProof *p = TProof::Open("arodrig@proof.ifca.es:1093")
Starting master: opening connection …
Starting master: OK
Opening connections to workers: OK (10 workers)
Setting up worker servers: OK (10 workers)
PROOF set to parallel mode (10 workers)
root [1] gProof->SetParameter("Packetizer.MaxWorkersPerNode", 9999)
root [2] TDSet *set = new TDSet("TTree", "Tree")
root [3] set->Add("/gpfs/csic_projects/cms/PROOF_data/data/minitree_Wjets_IC.root")
(Bool_t)1
root [4] p->Process(set, "myselector.C")
Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Validating files: OK (1 files)
Mst-0: merging output objects … / (3 workers still sending)
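A possible reason why reading more files did not change anything: if the files are grouped by the server named in their URL, every file opened through a plain path on the shared GPFS mount falls into the same group, so all of them compete for a single per-server quota. The C++ sketch below assumes a simple grouping rule (the `host[:port]` part of a `proto://host[:port]/...` URL, and one shared placeholder group for plain paths); the rule PROOF actually applies may differ, and `ServerOf` and `DistinctServers` are hypothetical helpers.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical grouping rule, for illustration only: a file given as
// "proto://host[:port]/path" is attributed to "host[:port]"; a plain path
// (e.g. a file on a shared GPFS mount) names no host, so every such file is
// attributed to one common placeholder group.
std::string ServerOf(const std::string& file) {
    const std::string sep = "://";
    std::size_t p = file.find(sep);
    if (p == std::string::npos) return "<shared-filesystem>";
    std::size_t start = p + sep.size();
    std::size_t end = file.find('/', start);
    return file.substr(start, end == std::string::npos ? std::string::npos
                                                       : end - start);
}

// Number of distinct groups that a list of files falls into.
std::size_t DistinctServers(const std::vector<std::string>& files) {
    std::set<std::string> servers;
    for (const auto& f : files) servers.insert(ServerOf(f));
    return servers.size();
}
```

For example, `DistinctServers({"/gpfs/a.root", "/gpfs/b.root", "/gpfs/c.root"})` returns 1, whereas three `root://` URLs pointing at different hosts would give 3.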
OK, I still believe the problem comes from the limitation I mentioned, but by mistake I swapped the name of the related rootrc variable with the name of the parameter to be passed to SetParameter. Sorry.