Controlling the number of concurrently running sessions

Hi all!

I have a PROOF cluster: 1 master and 2 worker nodes with 3 CPUs each.

When I run one job (a simple infinite loop), everything is OK.
But while that job is running, all subsequent jobs are switched to asynchronous mode.

I get the following message:

Info in <TProof::Process>: session is in waiting or processing status: switch to asynchronous mode
p->ShowQueries()
+++
+++ Queries processed during this session: selector: 4, draw: 0
+++ #:1 ref:"session-vobox0004-1268744842-23687:q1" sel:ProofSimple running   evts:0-9
+++ #:2 ref:"session-vobox0004-1268744842-23687:q2" sel:EV0 submitted evts:0-999999999
+++ #:3 ref:"session-vobox0004-1268744842-23687:q3" sel:ProofSimple submitted evts:0-9
+++ #:4 ref:"session-vobox0004-1268744842-23687:q4" sel:ProofSimple submitted evts:0-9

So I cannot run several jobs concurrently.

In the xpd.cf configuration file on the worker machines I have the following lines:

xpd.putrc Proof.DynamicStartup 1
xpd.schedparam queue:fifo mxrun:5

But it does not help.
What am I doing wrong?

Best regards!

Dear Kobla,

The queries submitted within a given user session are always run sequentially.
If you want to run them concurrently you have to start more sessions for the same user:

root [] TProof *p1 = TProof::Open("<master>")
root [] TProof *p2 = TProof::Open("<master>/?N")

or use different user names.
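As a rough illustration of the rule above, here is a standalone C++ toy model (the class `ToySession` is hypothetical and not part of ROOT or PROOF). It keeps all queries of one session in a single FIFO: only the oldest query runs and the rest stay "submitted", much like the ShowQueries listing in the question. Two `ToySession` objects have independent queues, which is why separate sessions can run at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model of one PROOF user session (hypothetical, not the ROOT API):
// queries are kept in a single FIFO and only the oldest one runs.
class ToySession {
public:
    // Queue a query and return its sequence number (1, 2, 3, ...),
    // similar to the (Long64_t)1, (Long64_t)2 values printed in this thread.
    int Submit(const std::string& selector) {
        fQueue.push_back(Query{++fCounter, selector});
        return fCounter;
    }

    // Status of each queued query: the head is "running", the rest "submitted".
    std::vector<std::string> Status() const {
        std::vector<std::string> out;
        for (std::size_t i = 0; i < fQueue.size(); ++i) {
            out.push_back("#" + std::to_string(fQueue[i].id) + " " +
                          fQueue[i].selector +
                          (i == 0 ? " running" : " submitted"));
        }
        return out;
    }

    // The running query finishes and the next one in line starts.
    void FinishRunning() {
        if (!fQueue.empty()) fQueue.pop_front();
    }

private:
    struct Query {
        int id;
        std::string selector;
    };
    std::deque<Query> fQueue;
    int fCounter = 0;
};
```

For example, submitting "ProofSimple" and then "EV0" to one ToySession gives the statuses "#1 ProofSimple running" and "#2 EV0 submitted", while a second ToySession starts its own first query immediately.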

How do you submit your queries to have them queued?

G. Ganis

This is an example of how I submit my jobs:

root [0] TProof *p1 = TProof::Open("xrootd@vobox0004xxx")
Starting master: opening connection ...
Starting master: OK
Opening connections to workers: OK (6 workers)
Setting up worker servers: OK (6 workers)
PROOF set to parallel mode (6 workers)
root [1] TProof *p2 = TProof::Open("xrootd@vobox0004xxx")
root [2] p1.Process("ProofSimple.C+", 10)

ProofSimple.C is an infinite loop.
Then I switch the job to background mode.

Info in <TProofPlayerRemote::Process>: switching to the asynchronous mode ...
(Long64_t)1

Then I submitted the second job.

root [3] p2.Process("ProofSimple.C+", 10)
Info in <TProof::Process>: session is in waiting or processing status: switch to asynchronous mode
(Long64_t)2
root [4] p1->ShowQueries()

11:14:12  2009 Mst-0 | Info in <TXProofServ::SetQueryRunning>: starting query: 1
11:14:12  2009 Mst-0 | Info in <TXProofServ::HandleInput>: kXPD_clusterinfo: tot: 1, act: 1, eff: 1.000000
11:14:12  2009 Mst-0 | SvcMsg in <TProofPlayerRemote::Process>: Start merging Memory information
11:14:28  2009 Mst-0 | Info in <TXProofServ::HandleProcess>: query "session-vobox0004-1268813400-2009:q2" submitted
+++
+++ Queries processed during this session: selector: 2, draw: 0
+++ #:1 ref:"session-vobox0004-1268813400-2009:q1" sel:ProofSimple running   evts:0-9
+++ #:2 ref:"session-vobox0004-1268813400-2009:q2" sel:ProofSimple submitted evts:0-9
+++
root [5]

The second job was switched to asynchronous mode and did not run concurrently.

Hi,

In your case ‘p1’ and ‘p2’ are exactly the same session (you did not use the option ‘N’ in the opening URL, so the second TProof::Open does an attach to the existing session). So, when you submit via ‘p2’ you are just queuing the queries.
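A minimal sketch of the attach-versus-new-session decision, assuming a simplified model in which a plain URL reattaches to the user's existing session and the "?N" option always starts a new one. `ToyMaster` is hypothetical and only recognises the exact "?N" suffix used in this thread; when a user already has several sessions it simply attaches to the first one, which is a simplification rather than documented PROOF behaviour.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (hypothetical, not the xproofd/TProof code) of how the opening
// URL decides between attaching to a user's session and starting a new one.
class ToyMaster {
public:
    // Returns the id of the session the caller ends up connected to.
    int Open(const std::string& user, const std::string& url) {
        std::vector<int>& mine = fSessions[user];
        if (!WantsNewSession(url) && !mine.empty())
            return mine.front();   // plain URL: attach (simplified: first session)
        mine.push_back(++fNextId); // "?N", or no session yet: start a new one
        return fNextId;
    }

    // Number of sessions currently owned by a user.
    int NumSessions(const std::string& user) const {
        auto it = fSessions.find(user);
        return it == fSessions.end() ? 0 : static_cast<int>(it->second.size());
    }

private:
    // Only the "?N" option shown in this thread is recognised.
    static bool WantsNewSession(const std::string& url) {
        auto q = url.find('?');
        return q != std::string::npos && url.substr(q + 1) == "N";
    }

    std::map<std::string, std::vector<int>> fSessions;
    int fNextId = 0;
};
```

In this model, the two plain TProof::Open calls from the question map to the same session id, so the second Process call only adds a query to that session's queue; adding "/?N" to the second URL gives a separate session.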

But why is ProofSimple going into an infinite loop? Did you modify it on purpose?

G. Ganis

Yes, I modified ProofSimple.C for my own purposes.
Thanks a lot for the comments.

I have one more question.
I’d like to allow only one session per node.
In the xpd.cf configuration file on every node I have these lines:

xpd.putrc Proof.DynamicStartup 1
xpd.schedparam queue:fifo mxrun:1

But several sessions were still started on every node.

Hi,

I'm not completely sure I understand: would you like to start a session that has one worker per node?
The directives that you quote control the number of sessions and not the number and location of workers in the session (btw: they are active only on the master; worker nodes ignore them). In particular, they set the number of concurrently running sessions to 1.
The workers started for each session are those defined via proof.conf or via the xpd.worker directives. There you should specify one worker per node, if that is what you want.
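To check that a proof.conf-like file really declares one worker per node, a small parser can count the "worker <host>" lines per host. This is a hedged sketch: `WorkersPerHost` and `OneWorkerPerNode` are hypothetical helpers, only the `master`/`worker` keywords seen in this thread are handled, and treating lines that start with '#' as comments is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Count "worker <host>" entries per host in a proof.conf-like text.
// Simplified: only the "master"/"worker" keywords shown in this thread are
// recognised; blank lines and lines starting with '#' are skipped, and any
// extra tokens after the host name are ignored.
std::map<std::string, int> WorkersPerHost(const std::string& conf) {
    std::map<std::string, int> count;
    std::istringstream in(conf);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string keyword, host;
        if (!(fields >> keyword) || keyword[0] == '#') continue;
        if (keyword == "worker" && (fields >> host)) ++count[host];
    }
    return count;
}

// True if no host appears more than once, i.e. one worker per node.
bool OneWorkerPerNode(const std::map<std::string, int>& count) {
    for (const auto& entry : count)
        if (entry.second > 1) return false;
    return true;
}
```

Applied to a master file with two distinct hosts (hypothetical names wn001 and wn002), every count is 1; listing the same host twice gives a count of 2, which in proof.conf means two workers on that node.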

What do your proof.conf-like file or your xpd.worker directives look like?

G. Ganis

I’ll try to explain what I need to do.

For example, I have a PROOF cluster with 1 master and 2 nodes.
Every node has 1 worker.

proof.conf on every node:

master voboxXXX

worker localhost

proof.conf on master:

master voboxXXX

worker wnXXX
worker wnXXX

xpd.cf on master:

xpd.putrc Proof.DynamicStartup 1
xpd.schedparam queue:fifo mxrun:1

I start a session from another machine:

root [0] TProof *p1 = TProof::Open("xrootd@voboxXXX")
Starting master: opening connection ...
Starting master: OK
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)

Then I start the infinite loop in background mode (using my modified ProofSimple.C):

root [1] p1.Process("ProofSimple.C+", 10)
Info in <TProofPlayerRemote::Process>: switching to the asynchronous mode ...
(Long64_t)1

The top command on my nodes shows only one process:

11148 xrootd    25   0  194m  27m  15m R 99.9  0.3   0:27.06 proofserv.exe

Then I start a second session:

root [2] TProof *p2 = TProof::Open("xrootd@voboxXXX/?N")
Starting master: opening connection ...
Starting master: OK
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)

Then I start the infinite loop in background mode again:

root [3] p2.Process("ProofSimple.C+", 10)
Info in <ACLiC>: unmodified script has already been compiled and loaded
Info in <TProofPlayerRemote::Process>: switching to the asynchronous mode ...
(Long64_t)1

The top command on my nodes now shows two processes:

11966 xrootd    25   0  193m  27m  15m R 94.0  0.3   2:20.62 proofserv.exe
11148 xrootd    25   0  194m  28m  15m R 93.8  0.4  12:48.19 proofserv.exe

And so on: each time I start another infinite loop, I see an additional process.
But this way the nodes can end up with too many processes if many users start many processes.
How can I limit the number of processes per node so that the other sessions wait in the queue?

I restarted the service on the master so that it re-read my configuration file, and PROOF then worked correctly.
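Ganis describes `xpd.schedparam queue:fifo mxrun:1` as limiting the number of concurrently running sessions to 1, with the directive read on the master only. The C++ toy model below sketches that admission rule; `ToyFifoScheduler` is hypothetical and is not the XrdProofd scheduler. Up to `mxrun` sessions run, later ones wait in FIFO order, and the oldest waiting session starts when a running one finishes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>

// Toy model of "queue:fifo mxrun:N" admission (hypothetical, not XrdProofd):
// at most mxrun sessions run at once; the others wait in arrival order.
class ToyFifoScheduler {
public:
    explicit ToyFifoScheduler(int mxrun) : fMaxRun(mxrun) { assert(mxrun > 0); }

    // A session asks to run; returns true if it starts immediately,
    // false if it was appended to the waiting queue.
    bool Submit(int session) {
        if (static_cast<int>(fRunning.size()) < fMaxRun) {
            fRunning.insert(session);
            return true;
        }
        fWaiting.push_back(session);
        return false;
    }

    // A running session ends; the oldest waiting session (if any) starts.
    void Finish(int session) {
        if (fRunning.erase(session) == 0) return; // not running: nothing to do
        if (!fWaiting.empty()) {
            fRunning.insert(fWaiting.front());
            fWaiting.pop_front();
        }
    }

    bool IsRunning(int session) const { return fRunning.count(session) > 0; }
    std::size_t NumWaiting() const { return fWaiting.size(); }

private:
    int fMaxRun;
    std::set<int> fRunning;
    std::deque<int> fWaiting;
};
```

With mxrun set to 1, a second session submitted while the first is busy stays in the waiting list until the first one finishes, and only then starts.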

Thanks, Ganis, for your help.