PROOF crashed while processing event 10274 out of 10421

Hi,

I ran a ROOT 5.22.00 client to start a PROOF-Lite session with 24 workers. At first everything is fine, but processing stalls at event 10274 (out of 10421 in total) for a long time, and then I get the error messages below. Can you help me find out what causes the problem?

In my proof.C I have the following settings:

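TProof *gProof = TProof::Open(" ");
gSystem->SetAclicMode(TSystem::kDebug);
gSystem->SetAclicMode(TSystem::kOpt);  // overrides the kDebug setting above; only the last call takes effect
gProof->SetParameter("PROOF_MaxSlavesPerNode", 24);  // max number of workers per node used by the packetizer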

thanks,

Haiying Xu
/bin/cp: cannot stat `/home/ba01/u103/xu2/.proof/cache//scratch/lustreA/x/xu2/CMSSW_3_8_2/src/Selectors/ExampleSelector/./rootlogon.*': No such file or directory
/scratch/lustreA/x/xu2/CMSSW_3_8_2/lib/slc5_ia32_gcc434
/scratch/lustreA/x/xu2/CMSSW_3_8_2/src
/scratch/lustreA/x/xu2/CMSSW_3_8_2/tmp
/apps/osg/cmssoft/cms/slc5_ia32_gcc434/cms/cmssw/CMSSW_3_8_2/src

Info in TProofLite::SetQueryRunning: starting query: 1
Looking up for exact location of files: OK (2 files)
Looking up for exact location of files: OK (2 files)
Validating files: OK (2 files)
Info in TMonitor::GetActive: socket: 0x9b5ef58: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
Info in TMonitor::GetActive: socket: 0x9b5cbe8: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
Info in TMonitor::GetActive: socket: 0x9b60268: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
Info in TMonitor::GetActive: socket: 0x9b5ff28: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
Info in TMonitor::GetActive: socket: 0x9b5cff0: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
Info in TMonitor::GetActive: socket: 0x9b5f338: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
Info in TMonitor::GetActive: socket: 0x9b616e8: UnknownHost:-1 did not show any activity during the last 600000 millisecs: deactivating
terminate
TCanvas::MakeDefCanvas: created default TCanvas with name c1

Dear Haiying,

Did you check the logs on the workers?
Also, did you try with a smaller number of workers?
The packetizer in ROOT versions up to 5.26 does not work well when the number of files is smaller than the number of workers.
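As a rough sketch of acting on this, one conservative choice is to keep the number of workers no larger than the number of input files. The helpers `ChooseWorkers` and `ProofLiteUrl` below are hypothetical, and the `workers=N` option string for `TProof::Open` is assumed to be understood by your ROOT release's PROOF-Lite; please check the documentation for your version before relying on it.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: choose a worker count that never exceeds the number
// of input files (avoiding the files < workers case for the <= 5.26
// packetizer) nor the number of cores. Always returns at least 1.
int ChooseWorkers(int nCores, int nFiles)
{
   int n = std::min(nCores, nFiles);
   return n < 1 ? 1 : n;
}

// Hypothetical helper: build the option string for TProof::Open().
// Assumes PROOF-Lite accepts "workers=N" to set the number of workers.
std::string ProofLiteUrl(int nWorkers)
{
   return "workers=" + std::to_string(nWorkers);
}

// In a ROOT macro this could be used as, e.g.:
//   TProof *p = TProof::Open(ProofLiteUrl(ChooseWorkers(24, 2)).c_str());
```

With the 2 files reported in the log above and 24 cores, this would start 2 workers; any value below 24 is still worth trying if reading the files is the bottleneck.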

G. Ganis

Thank you for your advice. Which smaller number of workers should I use? My machine has 24 cores, so PROOF-Lite runs on all 24 of them. Is there a way to make PROOF-Lite use fewer cores?

Thanks,

Haiying

I have now switched from PROOF-Lite to my PROOF farm, with the number of workers set to 6, but I get different errors, as follows.
Do you know what causes the error "unknown action code: 5115 received from 'client' - disabling"?
09:38:28 26726 Mst-0 | Info in TXProofServ::SetQueryRunning: starting query: 1
09:38:28 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:38:28 26726 Mst-0 | Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 6
09:38:33 26726 Mst-0 | Info in TXProofServ::HandleUrgentData: problems touching path: /tmp/.xproofd.1093/activesessions/xu2.default.26726.status.26726
09:38:45 26726 Mst-0 | Info in TPacketizerAdaptive::TPacketizerAdaptive: fraction of remote files 1.000000
09:38:46 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:38:46 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:38:46 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:38:46 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:38:46 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:38:46 26726 Mst-0 | Error in TXSocket::ProcessUnsolicitedMsg: 0xa067be0: unknown action code: 5115 received from 'client' - disabling
09:39:01 26726 Mst-0 | Info in TXProofServ::HandleUrgentData: problems touching path: /tmp/.xproofd.1093/activesessions/xu2.default.26726.status.26726
09:39:31 26726 Mst-0 | Info in TXProofServ::HandleUrgentData: problems touching path: /tmp/.xproofd.1093/activesessions/xu2.default.26726.status.26726
09:40:01 26726 Mst-0 | Info in TXProofServ::HandleUrgentData: problems touching path: /tmp/.xproofd.1093/activesessions/xu2.default.26726.status.26726
09:40:31 26726 Mst-0 | Info in TXProofServ::HandleUrgentData: problems touching path: /tmp/.xproofd.1093/activesessions/xu2.default.26726.status.26726
09:41:01 26726 Mst-0 | Info in TXProofServ::HandleUrgentData: problems touching path: /tmp/.xproofd.1093/activesessions/xu2.default.26726.status.26726

thanks,

Haiying