Proof + proofd under 5.16

Hi,

Our group just upgraded to ROOT 5.16/00 as part of a general OS upgrade at RCF (at BNL). Previously we used PROOF (proofd, not xrootd) under 5.11… (SL4).
PROOF is now up and running again, but for each of the 'proof masters'
a peculiar pattern appears when a session executes.

i) The specified number of slaves (e.g. 4, 10, 14) do start up; the proofserv processes appear on the slaves.
ii) The selector class is compiled (rootcint, …) on all slaves.
iii) When the TDSet->Process() call runs, only TWO slaves actually process events; the master seems to distribute events to only two specific slaves. All proof slave processes are present on the nodes, but use no CPU time.
iv) The session terminates with the proper result.

A Print() on the session shows that the correct number of slaves is available; none are bad or inactive.
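
For reference, a session looks roughly like the sketch below; the master name, file paths, tree name and selector are placeholders rather than our actual ones:

    TProof *p = TProof::Open("master");        // connect to our proof master
    p->Print();                                // all slaves reported as available
    TDSet *d = new TDSet("TTree", "T");        // "T" stands in for our tree name
    d->Add("/nfs/data/run0001.root");          // trees sit on nfs-mounted disks
    d->Add("/nfs/data/run0002.root");
    d->Process("MySelector.C+");               // compiles on all slaves, but only
                                               // two of them ever use cpu time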

Can you give us some hints on what to look for? Is there any additional information that would be useful?

At this point we are reluctant to invest the effort to get xrootd up and running with our framework if it can be avoided, so as not to disrupt the current analysis.

Hello,

Are these workers all on the same machine?

  1. Yes
     A possible explanation is that the new default packetizer (TAdaptivePacketizer) has an upper limit on the number of workers per machine, set by default to 2 (the default will be changed to the number of processors).
     You have two options:
     i. use TPacketizer

        root [0] TProof *p = TProof::Open("master")
        root [1] p->SetParameter("PROOF_Packetizer", "TPacketizer");

     ii. remove the limit

        root [0] TProof *p = TProof::Open("master")
        root [1] p->SetParameter("PROOF_MaxSlavesPerNode", 9999);

     being aware that you will not get much improvement by having more workers than the number of processors (see also the sketch after this list).
  2. No
     This would be strange: you can try solution i. above and see if it has any effect, but we would have to find a way to understand what is going on.
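
In both cases the parameter has to be set on the TProof session before Process() is called; schematically (the dataset and selector names below are only placeholders):

    TProof *p = TProof::Open("master");
    p->SetParameter("PROOF_Packetizer", "TPacketizer");      // option i
    // or: p->SetParameter("PROOF_MaxSlavesPerNode", 9999);  // option ii
    TDSet *d = new TDSet("TTree", "T");
    d->Add("/path/to/file.root");
    d->Process("MySelector.C+");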

G. Ganis

Hi

Thanks for your response.

In fact I had tried multiple configurations:
i) 7 nodes, 2 workers each (dual processors)
ii) 3 nodes, 4-4-2 workers (quad processors)
iii) 1 node, 4 workers
and in all cases only 2 processes were getting CPU time, though all were alive and had compiled the selector as described above.

The suggested

    p->SetParameter("PROOF_MaxSlavesPerNode", 9999);

did fix the issue as far as I can tell. So far I have tested cases iii) and ii), and the appropriate number of slaves processed data.

I will check some of the other configurations and parameter settings for completeness and keep you informed.

Hi,

Can you try with TPacketizer?

Also, in case ii), are the workers chosen randomly among the 3 nodes, or is there a pattern (e.g. always the first 2)?

Thanks,

G. Ganis

Turning on PROOF_Packetizer by itself only results in the workers being idle, i.e. it looks the same as not setting any of these parameters.

Adding PROOF_MaxSlavesPerNode = 9999 together with it makes all workers process data.
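
For the record, the combination that makes all workers active here is (the rest of the session is as before):

    TProof *p = TProof::Open("master");
    p->SetParameter("PROOF_Packetizer", "TPacketizer");  // this alone: workers stay idle
    p->SetParameter("PROOF_MaxSlavesPerNode", 9999);     // adding this: all workers process data
    // ... then build the TDSet and call Process() as usual ...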

  • I investigated more systematically whether there is a pattern in which workers are active. I did this using our 4-processor nodes in the configurations
    4-4-4 (4 active workers, all on machine 1)
    4-4-3 (3 active on the last node, 1 on one other)
    4-4-2 (2 active on the last, 2 on the first)
    4-4-1 (1 active on the last, 3 on the second)

Tentatively I would say it looks like the number of active workers equals the number of CPUs in a node, while the machines picked could be determined by which ones finish compiling the selector code first?!

Hi

Thanks for sending us the test results. I understand that changing MaxSlavesPerNode fixes the problem, and thanks to your feedback we are motivated to change the default settings. By default, the limit on the number of workers reading from one file server is optimized for machines with a single hard disk and for I/O bound queries. In general, MaxSlavesPerNode should be equal to the following (there is a short sketch after the list):

  • for I/O bound queries: 2 * number of hard disks (or equivalent) on 1 node
  • for CPU bound queries: number of CPUs.
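
As an illustration only (the numbers here are examples, not a recommendation for your site): for workers with a single hard disk and an I/O bound query one would set 2, for quad-CPU workers and a CPU bound query one would set 4:

    TProof *p = TProof::Open("master");
    // I/O bound query, workers with 1 hard disk each: 2 * 1
    p->SetParameter("PROOF_MaxSlavesPerNode", 2);
    // CPU bound query, quad-CPU workers: equal to the number of CPUs
    // p->SetParameter("PROOF_MaxSlavesPerNode", 4);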

What would be the optimal rule for assigning workers in your case? I understand that in your configuration you have a mix of dual- and quad-core machines.

What do you mean by

[quote]Turning on PROOF_Packetizer by itself only results in the workers being idle, i.e. it looks the same as not setting any of these parameters.[/quote]
?

Packetizer parameters are described here:
root.cern.ch/twiki/bin/view/ROOT/ProofParameters

[quote]Tentatively I would say it looks like the number of active workers equals the number of CPUs in a node, while the machines picked could be determined by which ones finish compiling the selector code first?![/quote]

  • Yes

Cheers,
Jan

Thanks for following up on this.

  • What I meant by the packetizer was that I followed suggestion i. from Ganis, setting

        root [1] p->SetParameter("PROOF_Packetizer", "TPacketizer");

    which did not make it work.

  • On the configuration: the RCF facility has a mixture of dual and quad machines. At the experiment level we configure the master and slaves ourselves, so it will be fairly simple to keep all the quads together and all the duals together.
    Since our jobs are in fact mostly CPU limited, having the default equal to the number of CPUs seems reasonable.
    As additional information: in our configuration the trees being analyzed are in general on NFS-mounted systems (was Panasas, moving to BlueArc).

Hi,

As discussed before, the default limit has been changed to the number of CPU cores in the SVN trunk. Other improvements for multicore machines have also been made in the default packetizer. Is it performing well in your case?

Should you have more comments on PROOF performance, please let us know.

Cheers,
Jan