** EVENTS SKIPPED ** when running with few workers

Hello all,

I have run several successful analyses using N workers (N > 3) over 8 data files of 11 GB each.
I would like to run the same analysis using just 1 or 2 workers, but after a few minutes the PROOF query process reports "EVENTS SKIPPED" and stops.

Any ideas? Is there a limit on the number of files or on the size of the data when running with few workers?

Thanks for the help,
Ana Rodríguez.

The log shows:

| Info in TMonitor::GetActive: socket: 0x368f1e0: worker01:22500 did not show any activity during the last 600000 millisecs: deactivating

| Info in TMonitor::GetActive: socket: 0x36951a0: worker02:22500 did not show any activity during the last 600000 millisecs: deactivating

I tried to avoid this possible timeout with
gProof->Exec("gEnv->GetValue(\"Proof.SocketActivityTimeout\",-1)", kTRUE); but it did not work. Is there any other variable I could modify?

Thanks,
Ana.

[quote="arodrig"]
I have run several successful analyses using N workers (N > 3) over 8 data files of 11 GB each.
I would like to run the same analysis using just 1 or 2 workers, but after a few minutes the PROOF query process reports "EVENTS SKIPPED" and stops.

Any ideas? Is there a limit on the number of files or on the size of the data when running with few workers?[/quote]

I think one (or more) of your workers simply crashed. That is probably why you got skipped events.
The cause could be a memory limit: with fewer workers, each one has to process more events, so in some cases the memory requirements per worker will be much higher. It could also be a bug in your analysis script that triggers a segfault, an abort (unhandled exception), or something similar.

What do the PROOF logs from the workers and the server say?

This message only indicates that your workers are dead. What you really want to know is why they died, not just how to get around the timeout, right?
So, check the workers' logs.
Try running the same analysis in PROOF-Lite and check the memory usage.

ok, thanks.

When running with PROOF-Lite, this is the memory usage I got:

Worker01:
[…]
16:49:40 28544 Wrk-0.0 | SvcMsg in TProofPlayerSlave::Process: Memory 187216 virtual 158084 resident event 9133344
// --------- End of element log -------------------

Worker02:
[…]
16:49:29 28546 Wrk-0.1 | SvcMsg in TProofPlayerSlave::Process: Memory 181660 virtual 152444 resident event 9225600
// --------- End of element log -------------------

All the events were considered and the analysis finished properly.
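
For comparing memory across workers or runs, the `SvcMsg` lines above can be read programmatically. The following is only a minimal sketch: `MemSample` and `parseMemoryLine` are hypothetical names, not part of ROOT or PROOF, and the assumption that the two memory figures are in kB is mine and should be checked against your ROOT version.

```cpp
#include <sstream>
#include <string>

// Figures taken from one "... Memory <v> virtual <r> resident event <n>" line.
// Hypothetical helper type; units are ASSUMED to be kB.
struct MemSample {
    long virtualKB;
    long residentKB;
    long long event;
};

// Returns true and fills `out` if `line` contains a memory report in the
// format shown in the worker logs above; returns false otherwise and
// leaves `out` untouched.
bool parseMemoryLine(const std::string& line, MemSample& out) {
    const std::string key = "Memory ";
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) return false;

    std::istringstream in(line.substr(pos + key.size()));
    MemSample s{};
    std::string w1, w2, w3;
    if (!(in >> s.virtualKB >> w1 >> s.residentKB >> w2 >> w3 >> s.event)) {
        return false;
    }
    // Make sure the words between the numbers are the expected labels.
    if (w1 != "virtual" || w2 != "resident" || w3 != "event") return false;

    out = s;
    return true;
}
```

Feeding each line of a worker log to `parseMemoryLine` and tracking the largest `residentKB` seen gives a rough per-worker peak, which can then be compared with the memory available on the cluster nodes.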

Ana.

Please now try running again with 1 or 2 workers (as in your first message), and if you get skipped events, check what the worker's log says.

It was indeed the memory usage. I managed to fix it.

Thanks,
ana.

[quote="arodrig"]It was indeed the memory usage. I managed to fix it.

Thanks,
ana.[/quote]
:wink: