Telling if a query failed!?

Hi,
I posted earlier about getting the logs from a failed query, but it turns out I have a more immediate problem: telling whether a query failed at all! Note the log below. It happens when I call TProof::Process(dataset-names, object-of-tselector-name, 10000), which returns zero. The documentation for this Process method suggests calling TSelector::GetStatus (root.cern.ch/root/html/TProof.ht … of:Process%2)... OK, but at that point I have no pointer to a TSelector, and GetStatus is not a static function. So how do I find out what the heck happened? :slight_smile:

Cheers, Gordon.

P.S. Ignore the crash at the end. It looks like PROOF leaves something around that has already been cleaned up, or similar. I'm not sure what is going on, and it may well be my own problem.

[code]Number of files in dataset 1.
Starting master: opening connection …
Starting master: OK
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)
tev11.phys.washington.edu: stat: cannot stat `/phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/master-0-tev11-1334813046-1160/ntuple_CollectionTree.h': No such file or directory
[PutFile] Total 0.02 MB |====================| 100.00 % [0.6 MB/s]
tev11.phys.washington.edu: stat: cannot stat `/phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/master-0-tev11-1334813046-1160/junk_macro_parsettree_CollectionTree.C': No such file or directory
[PutFile] Total 0.00 MB |====================| 100.00 % [0.0 MB/s]
Info in TWinNTSystem::ACLiC: creating shared library C:\Users\gwatts\AppData\Local\Temp\LINQToTTree\DumpingBasicInfo\d42lgq2u.1j5\query0_cxx.dll
2827249_cint.cxx
query0_cxx_ACLiC_dict.cxx
Creating library C:\Users\gwatts\AppData\Local\Temp\LINQToTTree\DumpingBasicInfo\d42lgq2u.1j5\query0_cxx.lib and object C:\Users\gwatts\AppData\Local\Temp\LINQToTTree\DumpingBasicInfo\d42lgq2u.1j5\query0_cxx.exp
22:24:10 1160 Mst-0 | Info in TXProofServ::HandleCache: loading macro query0.cxx+ …
22:24:10 1160 Mst-0 | Info in TUnixSystem::ACLiC: creating shared library /phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/master-0-tev11-1334813046-1160/./query0_cxx.so
22:24:10 17435 Wrk-0.2 | Info in TXProofServ::HandleCache: loading macro query0.cxx+ …
22:24:10 17435 Wrk-0.2 | Info in TUnixSystem::ACLiC: creating shared library /phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/worker-0.2-tev03-1334813047-17435/./query0_cxx.so
22:24:10 2288 Wrk-0.1 | Info in TXProofServ::HandleCache: loading macro query0.cxx+ …
22:24:10 2288 Wrk-0.1 | Info in TUnixSystem::ACLiC: creating shared library /phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/worker-0.1-tev02-1334813047-2288/./query0_cxx.so
22:24:10 17296 Wrk-0.0 | Info in TXProofServ::HandleCache: loading macro query0.cxx+ …
22:24:10 17296 Wrk-0.0 | Info in TUnixSystem::ACLiC: creating shared library /phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/worker-0.0-tev01-1334813047-17296/./query0_cxx.so
22:24:10 3396 Wrk-0.3 | Info in TXProofServ::HandleCache: loading macro query0.cxx+ …
22:24:10 3396 Wrk-0.3 | Info in TUnixSystem::ACLiC: creating shared library /phys/groups/tev/scratch4/users/proofbox/gwatts/session-tev11-1334813046-1160/worker-0.3-tev04-1334813047-3396/./query0_cxx.so
Looking up for exact location of files: OK (60 files)
Looking up for exact location of files: OK (60 files)
Validating files: OK (60 files)
0.0: caught exception triggered by signal '1' while processing dset:'/CollectionTree', file:'file:///phys/groups/tev/scratch3/users/gwatts/mc/mc11_7TeV.105377.Pythia_HV_ggH_mH120_mVPI20.evgen.EVNT.e887_tid500482_00/EVNT.500482._000060.pool.root.1', event:4141 - check logs for possible stacktrace
Worker ‘tev01.phys.washington.edu-0.0’ has been removed from the active list

+++ Message from top master at tev11.phys.washington.edu:1093 : marking tev01.phys.washington.edu:2093 (0.0) as bad
+++ Reason: received kPROOF_FATAL

+++ Most likely your code crashed on worker 0.0 at tev01.phys.washington.edu:2093.
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("tev11.phys.washington.edu:1093")->GetSessionLogs()->Display("0.0",0)

Worker ‘tev02.phys.washington.edu-0.1’ has been removed from the active list
0.1: caught exception triggered by signal '1' while processing dset:'/CollectionTree', file:'file:///phys/groups/tev/scratch3/users/gwatts/mc/mc11_7TeV.105377.Pythia_HV_ggH_mH120_mVPI20.evgen.EVNT.e887_tid500482_00/EVNT.500482._000060.pool.root.1', event:3612 - check logs for possible stacktrace

+++ Message from top master at tev11.phys.washington.edu:1093 : marking tev02.phys.washington.edu:2093 (0.1) as bad
+++ Reason: received kPROOF_FATAL

+++ Most likely your code crashed on worker 0.1 at tev02.phys.washington.edu:2093.
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("tev11.phys.washington.edu:1093")->GetSessionLogs()->Display("0.1",0)

0.2: caught exception triggered by signal '1' while processing dset:'/CollectionTree', file:'file:///phys/groups/tev/scratch3/users/gwatts/mc/mc11_7TeV.105377.Pythia_HV_ggH_mH120_mVPI20.evgen.EVNT.e887_tid500482_00/EVNT.500482._000060.pool.root.1', event:4515 - check logs for possible stacktrace
Worker ‘tev03.phys.washington.edu-0.2’ has been removed from the active list

+++ Message from top master at tev11.phys.washington.edu:1093 : marking tev03.phys.washington.edu:2093 (0.2) as bad
+++ Reason: received kPROOF_FATAL

+++ Most likely your code crashed on worker 0.2 at tev03.phys.washington.edu:2093.
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("tev11.phys.washington.edu:1093")->GetSessionLogs()->Display("0.2",0)

0.3: caught exception triggered by signal '1' while processing dset:'/CollectionTree', file:'file:///phys/groups/tev/scratch3/users/gwatts/mc/mc11_7TeV.105377.Pythia_HV_ggH_mH120_mVPI20.evgen.EVNT.e887_tid500482_00/EVNT.500482._000060.pool.root.1', event:4999 - check logs for possible stacktrace
Worker ‘tev04.phys.washington.edu-0.3’ has been removed from the active list

+++ Message from top master at tev11.phys.washington.edu:1093 : marking tev04.phys.washington.edu:2093 (0.3) as bad
+++ Reason: received kPROOF_FATAL

+++ Most likely your code crashed on worker 0.3 at tev04.phys.washington.edu:2093.
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("tev11.phys.washington.edu:1093")->GetSessionLogs()->Display("0.3",0)

Mst-0: merging output objects … done
Mst-0: objects merged; sending obj 19/19 (93057 bytes)
| session: gwatts.default.1160.status terminated by peer
Info in TXSlave::HandleError: 06A6CB40:tev11.phys.washington.edu:0 got called … fProof: 0767B828, fSocket: 06A6CC50 (valid: 1)
Info in TXSlave::HandleError: 06A6CB40: proof: 0767B828
TXSlave::HandleError: 06A6CB40: DONE …
Mst-0: grand total: sent 19 objects, size: 93057 bytes
Warning in TClass::TClass: no dictionary for class junk_macro_parsettree_CollectionTree_Interface is available

Unhandled Exception: System.IO.IOException: The directory is not empty.

at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.Directory.DeleteHelper(String fullPath, String userPath, Boolean recursive)
at System.IO.Directory.Delete(String fullPath, String userPath, Boolean recursive)
at System.IO.DirectoryInfo.Delete()
at LINQToTTreeLib.ExecutionCommon.ProofExecutor.Execute(FileInfo templateFile, DirectoryInfo queryDirectory, IEnumerable`1 varsToTransfer)
at LINQToTTreeLib.TTreeQueryExecutor.ExecuteQueuedQueries()
at LINQToTTreeLib.FutureValue`1.get_Value()
at LINQToTreeHelpers.FutureUtils.FutureTDirectory.FVHolder`1.get_AsTObject()
at LINQToTreeHelpers.FutureUtils.FutureTDirectory.Write()
at LINQToTreeHelpers.FutureUtils.FutureTDirectory.Write()
at LINQToTreeHelpers.FutureUtils.FutureTFile.Close()
at DumpingBasicInfo.Program.Main(String[] args) in C:\Users\gwatts\Documents\Visual Studio 2010\Projects\HVAssociatedTests\DumpingBasicInfo\Program.cs:line 50

==========================================
=============== STACKTRACE ===============

================ Thread 0 ================
clr!GetCLRFunction()
clr!GetCLRFunction()
mscorlib.ni!??
mscorlib.ni!??
mscorlib.ni!??
mscorlib.ni!??
0x73986fa ??
0x7398467 ??
0x7398439 ??
0x73981c5 ??
0x7398272 ??
clr!??
clr!LogHelp_TerminateOnAssert()
clr!LogHelp_TerminateOnAssert()
clr!LogHelp_TerminateOnAssert()
clr!LogHelp_TerminateOnAssert()
clr!SetRuntimeInfo()
clr!SetRuntimeInfo()
clr!SetRuntimeInfo()
clr!SetRuntimeInfo()
clr!SetRuntimeInfo()
clr!CorExeMain()
mscoreei!CorExeMain()
MSCOREE!CreateConfigStream()
MSCOREE!CorExeMain()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 1 ================
ntdll!ZwWaitForMultipleObjects()
KERNEL32!WaitForMultipleObjectsEx()
KERNEL32!WaitForMultipleObjects()
clr!CreateApplicationContext()
clr!CreateApplicationContext()
clr!CreateApplicationContext()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 2 ================
ntdll!ZwWaitForSingleObject()
KERNEL32!WaitForSingleObjectEx()
clr!LogHelp_TerminateOnAssert()
clr!LogHelp_TerminateOnAssert()
clr!LogHelp_TerminateOnAssert()
clr!LogHelp_TerminateOnAssert()
clr!GetPrivateContextsPerfCounters()
clr!GetPrivateContextsPerfCounters()
clr!GetPrivateContextsPerfCounters()
clr!GetPrivateContextsPerfCounters()
clr!GetPrivateContextsPerfCounters()
clr!GetPrivateContextsPerfCounters()
clr!GetPrivateContextsPerfCounters()
clr!SetRuntimeInfo()
clr!GetCLRFunction()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 3 ================
ntdll!ZwWaitForMultipleObjects()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 4 ================
ntdll!ZwDelayExecution()
KERNELBASE!Sleep()
libCore!TWinNTSystem::TimerThread()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 5 ================
USER32!DispatchMessageW()
libCore!`anonymous namespace'::GetProgramCounter()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 6 ================
ntdll!ZwDelayExecution()
KERNELBASE!Sleep()
libProofx!XrdOucHash::Expand()
msvcrt!itow_s()
msvcrt!endthreadex()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

================ Thread 7 ================
ntdll!ZwWaitForSingleObject()
mswsock!??
WS2_32!select()
libProofx!XrdOucHash_Item::XrdOucHash_Item()
ntdll!RtlImageNtHeader()
ntdll!RtlFreeHeap()
KERNEL32!HeapFree()
MSVCR100!free()
libCore!TString::~TString()
libCore!TEnv::Getvalue()
libCore!TEnv::GetValue()
libCore!TEnv::GetValue()
KERNEL32!HeapFree()
pthreadVSE2!pthread_setcanceltype()
MSVCR100!_pctype_func()
MSVCR100!strncmp()
MSVCR100!fileno()
MSVCR100!unlock()
MSVCR100!unlock_file()
MSVCR100!strnlen()
libCore!Printf()
libProofx!TXSlave::HandleError()
MSVCR100!write()
MSVCR100!fileno()
pthreadVSE2!pthread_setcanceltype()
pthreadVSE2!pthread_setcanceltype()
libThread!TWin32Condition::Signal()
libThread!TSemaphore::Post()
libProofx!TXSocket::ProcessUnsolicitedMsg()
libProofx!TXSocket::ProcessUnsolicitedMsg()

================ Thread 8 ================
ntdll!ZwWaitForWorkViaWorkerFactory()
KERNEL32!BaseThreadInitThunk()
ntdll!RtlInitializeExceptionChain()
ntdll!RtlInitializeExceptionChain()

==========================================
============= END STACKTRACE =============

Press any key to continue . . .

[/code]

So, since I can't get at the status, what I've taken to doing is fetching the logs for the query (GetLastLogs) and scanning them for the string kPROOF_FATAL: if it is there, I know things went bad. Is there a better way?
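For reference, the log-scanning workaround described above can be sketched in plain C++. This is a minimal sketch that assumes the log text has already been fetched into a std::string (fetching it is left out). Besides flagging kPROOF_FATAL, it collects the worker ordinals from the "marking ... (0.0) as bad" lines seen in the log. Both marker strings are copied from the log in this post and may be worded differently in other PROOF versions; ScanProofLog and ProofLogScan are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: did the log contain a fatal PROOF message,
// and which worker ordinals (e.g. "0.0") were marked bad?
struct ProofLogScan {
    bool fatal = false;
    std::vector<std::string> badWorkers;
};

// Scan PROOF log text line by line. The marker strings are taken from the
// log in this thread; they are an assumption about PROOF's wording.
ProofLogScan ScanProofLog(const std::string &log)
{
    ProofLogScan result;
    std::istringstream in(log);
    std::string line;
    const std::string badSuffix = ") as bad";
    while (std::getline(in, line)) {
        if (line.find("kPROOF_FATAL") != std::string::npos)
            result.fatal = true;
        // e.g. "... marking tev01.phys.washington.edu:2093 (0.0) as bad"
        std::size_t end = line.find(badSuffix);
        if (end != std::string::npos && line.find("marking ") != std::string::npos) {
            std::size_t open = line.rfind('(', end);
            if (open != std::string::npos)
                result.badWorkers.push_back(line.substr(open + 1, end - open - 1));
        }
    }
    return result;
}
```

This is a text heuristic rather than an API: it only catches failures that PROOF happens to report in these words.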

Hi,

Sorry for the late reply.
The TStatus object 'PROOF_Status' in the output list (TProof::GetOutputList()) contains the exit status of the processing (0 = finished, 1 = stopped, 2 = aborted).
You should also check that the 'MissingFiles' list is empty, using

void TProof::ShowMissingFiles()

and

TFileCollection *TProof::GetMissingFiles()
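Combining the two checks, the decision logic might look like the sketch below. It is written as plain C++ so it does not need a PROOF session: the exit status and the number of missing files are passed in as numbers. The commented lines at the end show, untested, where those numbers would presumably come from in a ROOT macro, assuming your ROOT version's TStatus provides GetExitStatus(). QueryExit, QuerySucceeded and DescribeExit are hypothetical names; only the 0/1/2 meaning is taken from the reply above.

```cpp
#include <string>

// Exit codes as described in the reply: 0 = finished, 1 = stopped,
// 2 = aborted. The enum name is hypothetical.
enum class QueryExit : int { kFinished = 0, kStopped = 1, kAborted = 2 };

// Hypothetical helper: treat a query as successful only if it finished
// normally AND no input files were reported missing. Any other status,
// including -1 for "PROOF_Status not found", counts as a failure.
bool QuerySucceeded(int exitStatus, long long nMissingFiles)
{
    return exitStatus == static_cast<int>(QueryExit::kFinished) && nMissingFiles == 0;
}

// Readable name for an exit status, e.g. for an error message.
std::string DescribeExit(int exitStatus)
{
    switch (static_cast<QueryExit>(exitStatus)) {
    case QueryExit::kFinished: return "finished";
    case QueryExit::kStopped:  return "stopped";
    case QueryExit::kAborted:  return "aborted";
    }
    return "unknown"; // e.g. -1 when no PROOF_Status object was found
}

// Where the inputs might come from in a ROOT macro (untested; assumes
// `proof` is the TProof* on which Process() has just returned):
//   TList *out = proof->GetOutputList();
//   TStatus *st = out ? dynamic_cast<TStatus *>(out->FindObject("PROOF_Status")) : 0;
//   int exitStatus = st ? st->GetExitStatus() : -1;
//   TFileCollection *missing = proof->GetMissingFiles();
//   long long nMissing = missing ? missing->GetNFiles() : 0;
//   if (nMissing > 0) proof->ShowMissingFiles();
```

Treating a missing PROOF_Status object as a failure is a deliberately conservative choice: if the status cannot be found, the query is not assumed to have succeeded.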

Gerri

Thanks! I’ll implement these!