Only a maximum of 4 workers seem to work

Dear rooters,
we try to set the maximum number of workers as follows:

xpd.resource static /home/cluster/uh351ac/proof_o/config/proof.conf wmx:-1 selopt:roundrobin

or:
xpd.resource static /home/cluster/uh351ac/proof_o/config/proof.conf wmx:8 selopt:random

It seems only 4 workers really do something (according to top), though the proofserv processes are running where expected (proof.conf has 8 workers).

Cheers
Otto

Hi Otto,

Maybe you have 4 CPU cores on the machine?
Is the number of proofserv processes alive (ps) 8?

There is a limitation on the maximum number of workers reading remotely from one file node (worker machine). By default it is set to the number of CPU cores of the master node.

You can change the limit after starting a PROOF session:

root [0] TProof *p = TProof::Open("master")
root [1] p->SetParameter("PROOF_MaxSlavesPerNode", 8);

The point of this limit is to avoid overloading the I/O on the workers.

See also:
root.cern.ch/phpBB2/viewtopic.php?t=5359

Cheers,
Jan

Hi,

I'm having the same problem: I added an 8-core machine eight times to the proof.conf file, but only two of the workers are actually working at the same time (and always the same two) according to "top". Also, all eight processes are alive, but most of them are sleeping.

All the jobs are getting files from a dCache dcap door, which then redirects the connection to the pools, so no limit should be imposed on the number of connections to the same host.

The p->SetParameter("PROOF_MaxSlavesPerNode", 8); method does not work for me.
When opening the connection I get:
Opening connections to workers: OK (8 workers)
Setting up worker servers: OK (8 workers)
PROOF set to parallel mode (8 workers)
PROOF set to parallel mode (8 workers)
Looking up for exact location of files: OK (42 files)

There's no special statement changing the number of jobs per machine; it defaults to the number of cores. When PROOF starts on the worker it says:
— Proofd: : GetNumCPUs: # of cores found: 8

Maybe I’m having a different problem?

Is there any other way to adjust the maximum number of jobs against the same remote server other than changing the analysis source code? I don't think users should need to be aware, when writing code, of how many cores per machine the cluster has. Also, what happens if some workers have 4 cores, others 8, and so on?

Thanks,
Pablo

Hi Pablo,

Strange that it does not work. So after calling p->SetParameter("PROOF_MaxSlavesPerNode", 8), you see only 2 workers active?!
Are you sure that, when you change the PROOF_MaxSlavesPerNode parameter, 'p' is the right PROOF session? Please double check and let us know.
Anyway, the parameter is not meant to change the number of processes nor the way the session is started, but only to limit the number of remote processes reading from one file node at the same time. By default it is set to the number of CPU cores on the master node.
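
By the way, one quick way to double-check which session got the parameter (a minimal sketch, assuming TProof::GetParameter() is available in your ROOT version; gProof always points to the currently active session):

TProof *p = gProof;                                   // the session your chain is attached to
p->SetParameter("PROOF_MaxSlavesPerNode", 8);         // register the limit in this session
p->GetParameter("PROOF_MaxSlavesPerNode")->Print();   // should echo back the value just set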

Good point. I will add a new directive in the config file to change PROOF_MaxSlavesPerNode and let you know. Managing non-uniform clusters is on our TODO list.

[quote]
Thanks,
Pablo[/quote]

Cheers
Jan

Ok, now I've seen the other topic about this and realized that setting MaxSlavesPerNode to 8 is not enough; you also have to change the packetizer with:
p->SetParameter("PROOF_Packetizer", "TPacketizer");

With this I manage to increase the number of working processes to 4, but it should be 8.
This is the code (freshly created ROOT session, no other PROOF sessions):

gROOT->Macro("makechain.C");
TProof::Open("user@hepdc1");
EV0->SetProof();
gProof->SetParameter("PROOF_Packetizer", "TPacketizer");
gProof->SetParameter("PROOF_MaxSlavesPerNode", 8);
EV0->Process("EV0.C");

Any ideas?
Does changing the packetizer have any drawback?

Thanks,
Pablo

I’ve found a reason for having just 4 jobs (and before having just two).

Looking at root.cern.ch/twiki/bin/view/ROOT/ProofParameters, we can see that each packetizer has a default value for the "PROOF_MaxSlavesPerNode" option: exactly 2 for the default one and 4 for TPacketizer. So this matches my case exactly.

But then, why does PROOF ignore my SetParameter("PROOF_MaxSlavesPerNode", 8)?

Also, I've tried setting "xpd.localwrks 8" on the admin node and it does not change anything.

Thanks,
Pablo

Hi Pablo,

Sorry for the delay.
Probably the reason is that the parameter must be of type Long_t. Try:

gProof->SetParameter("PROOF_MaxSlavesPerNode", (Long_t)8);
We will either change it to Int_t or fix the documentation.
TPacketizer is the old packetizer, and you don't need to use it in order to utilise all your CPU cores. The "PROOF_MaxSlavesPerNode" parameter works with both packetizers.
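
So, adapted to your macro above, the sequence would look roughly like this (just a sketch reusing your makechain.C / EV0 names; the only change is the explicit cast, and the packetizer override can be dropped):

gROOT->Macro("makechain.C");                                // build the EV0 chain
TProof::Open("user@hepdc1");                                // open the PROOF session
EV0->SetProof();                                            // attach the chain to the session
gProof->SetParameter("PROOF_MaxSlavesPerNode", (Long_t)8);  // note the explicit Long_t cast
EV0->Process("EV0.C");                                      // the default (adaptive) packetizer is fine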

Cheers,
Jan

Yes, it works now!
Thanks a lot for the help.

Will this option be included in the xpd.cf file in the next ROOT release?

One more comment, I don’t think it’s worth another thread:
I've tried on an 8-core machine (with remote dcap files), and with 8 jobs I get 14300 ev/s (scaling almost linearly compared to runs with fewer CPUs). I saw that the jobs were not 100% CPU busy, and increasing to 12 jobs gave me 15400 ev/s. Interesting, right? :slight_smile:

Hi Pablo,

The option is now in the SVN trunk and the dev/proof branch, so it will be included in the next ROOT release. The maximum number of workers per node is configurable in .rootrc (Packetizer.MaxWorkersPerNode: <desired number>) or in the xrootd config file by

xpd.putrc Packetizer.MaxWorkersPerNode: <desired number>

Also, the PROOF_MaxSlavesPerNode parameter type has been changed (Long_t to Int_t).

For the question on the processing rates:

  • Is it systematic?
  • The analysis is probably I/O-bound, so the CPU is not fully utilized.
  • When you repeat the same query, part of the data is cached in the Linux buffer cache, which makes the following queries shorter. It is in fact very hard to avoid the "cache" effect in performance tests. You have to run another query that uses the entire memory of the file-serving nodes before repeating the original query with different parameters.
  • Also, it may happen that the query you run is simply faster with 12 workers than with 8; for instance, the packet sizes change. If you continue to see problems with performance, please let us know. There is an internal stats package in PROOF, TPerfStats, to analyze this (see the sketch below).
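
You can switch it on with something like the following (just a rough sketch: PROOF_StatsHist / PROOF_StatsTrace are the parameters TPerfStats uses, and the session/chain names are reused from your posts above):

TProof *p = TProof::Open("user@hepdc1");     // or reuse the already open gProof session
p->SetParameter("PROOF_StatsHist", "");      // fill the per-worker performance histograms
p->SetParameter("PROOF_StatsTrace", "");     // also keep the detailed trace tree
EV0->SetProof();
EV0->Process("EV0.C");
// afterwards, inspect e.g. the per-worker event distribution:
p->GetOutputList()->FindObject("PROOF_EventsHist")->Draw();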

Cheers,
Jan

Hi,

Great, thanks, let's then wait for the next release :slight_smile:
For the performance topic: yes, it's because the I/O is not fast enough to keep the CPU busy.
As far as I know, there's no Linux cache on remote transfers.
Testing with files stored locally, the limit is the number of cores. It's horrible to test without cache; I had to reboot the Linux box after every test!!

BR/Pablo

[quote]For the performance topic: yes, it's because the I/O is not fast enough to keep the CPU busy.
As far as I know, there's no Linux cache on remote transfers.
[/quote]
I meant the buffer cache on the node that serves the file. I guess that in your case it would be a dCache node, but I haven't used dCache.

:open_mouth: Can you explain? "The limit" of what? Do you mean that locally the processing is a CPU-bound task? Without which "cache"?

Cheers,
Jan

Well, in our case dCache consists of pools with 13 ultra-fast RAID6 disks with XFS, and the latency is really low. I've made the same test from one day to the next (so the files should be out of the cache) and the performance is almost the same.

Yes, sorry. What I meant is that testing with files stored locally, and cached by the OS, IS a CPU-bound process (and "the limit" is the optimal number of processes = the number of cores). If the files are not cached, the performance is really horrible, at least with a single-disk setup. I guess it will be better with any kind of RAID, but for me the performance started to drop very badly after the 4th parallel job.

Hello,

I've got a similar problem. I have started xrootd (5.19) on three 8-core machines and want to run PROOF with more than 8 workers (5.18), maybe 10, 16, 24 or even more later. When logging on to PROOF, as many proofserv.exe's as given in proof.conf are started - but if I do a simple chain.SetProof(); chain.Draw(...), it seems to interact with 8 workers only.

In the PROOF_EventsHist histogram, I can see that the names of all workers start with "0." (0.0 to 0.15, for example), and 8 random workers actually do the work. Usually, the 8 workers are not all taken from the same machine, so in principle PROOF works.
Shouldn't the workers be named 0.0 to 0.x, 1.0 to 1.y and 2.0 to 2.z if I have three machines?

Anyway, I tried to set p->SetParameter("PROOF_MaxSlavesPerNode", 100); and have added xpd.putrc Packetizer.MaxWorkersPerNode, but somehow this shows absolutely no effect. I can set it to 1 or to 100, it doesn't matter - I get 8 workers. I guess there must be something wrong/missing in my configuration.

Any idea what I am doing wrong? Or any idea how I can find out what is wrong?

Thanks in advance
Wolf

[quote="Wolf"]
Anyway, I tried to set p->SetParameter("PROOF_MaxSlavesPerNode", 100); and have added xpd.putrc Packetizer.MaxWorkersPerNode, but somehow this shows absolutely no effect. I can set it to 1 or to 100, it doesn't matter - I get 8 workers. I guess there must be something wrong/missing in my configuration.

Any idea what I am doing wrong? Or any idea how I can find out what is wrong?
Wolf[/quote]
please try the following:

[quote="anar"]
please try the following:
[/quote]

Hm, strange. I had this in my PROOF start script (I had tried with and without Long_t because I thought this had been changed to int in 5.19). It does not seem to have any effect! Can I set this parameter at any time after "TProof *gProof = TProof::Open(master);"?

Now I think it has to do with version incompatibilities:
server = 5.19.02
PROOF started via ROOT 5.18.00a -> I get 8 working workers
PROOF started via ROOT 5.19.02 -> I get 20 working workers (Packetizer.MaxWorkersPerNode)

Strange... And why doesn't MaxSlavesPerNode have any effect?

Now I've tried 5.18 only - still, p->SetParameter("PROOF_MaxSlavesPerNode", (Long_t)100); does not work. Am I using this command in the right way? (1. Log on to PROOF; 2. proof->SetParameter; 3. chain.Process)
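
To be explicit, the sequence I use looks roughly like this ("MySelector.C" just stands in for my real selector):

TProof *p = TProof::Open("master");                       // 1. log on to PROOF
p->SetParameter("PROOF_MaxSlavesPerNode", (Long_t)100);   // 2. set the parameter
chain.SetProof();                                         //    attach the chain to the session
chain.Process("MySelector.C");                            // 3. run the query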

Edit: very, very strange: today I tried again and it works! Must have been something strange.