Hello,
I am using a TSelector with PROOF-Lite (ROOT 5.22). I seem to have finally solved most of the issues that have been fouling my program, save one that I encountered today.
I am using TProofOutputFile. When I run a test job (output tree size ~1 GB), everything runs smoothly. But when I run over the full dataset (output size ~10 GB), the program appears to run correctly until the files are due to be merged. The merge never happens and no output file is produced. In the log files I find the following errors:
From worker-0.6.log:
SlaveTerminate
File pointer exists
Writing Tree
Closing fFile
fProofFile Print
leave SlaveTerminate
15:23:07 7261 Wrk-0.6 | *** Break ***: write on a pipe with no one to read it
15:23:07 7261 Wrk-0.6 | SysError in TUnixSystem::UnixSend: send (Broken pipe)
15:23:07 7261 Wrk-0.6 | SysError in TProofServLite::SendLogFile: error sending log file (Broken pipe)
15:23:07 7261 Wrk-0.6 | SysError in TUnixSystem::DispatchOneEvent: select: read error on 34 (Bad file descriptor)
From worker-0.5.log:
leave SlaveTerminate
15:23:34 7259 Wrk-0.5 | SysError in TProofServLite::SendLogFile: error sending log file (No such file or directory)
15:23:34 7259 Wrk-0.5 | SysError in TUnixSystem::DispatchOneEvent: select: read error on 34 (Bad file descriptor)
From the remaining log files (6 more):
leave SlaveTerminate
15:23:07 7263 Wrk-0.7 | Error in TProofServLite::HandleSocketInput: retrieving message from input socket
15:23:07 7263 Wrk-0.7 | Info in TProofServLite::Terminate: starting session termination operations ...
Terminate: termination operations ended: quitting!
One problem is that I can't issue these commands from the ROOT interpreter, because I am kicked out to the shell once processing completes. That is, the program does process all the events and the client window shows no errors upon completion (and for a smaller subsample of events everything, including the merging, works fine). The final output of my session looks something like this:
…
Looking up for exact location of files: OK (336 files)
Looking up for exact location of files: OK (336 files)
Validating files: OK (336 files)
Output file: SimpleNtuple.root
Output file: SimpleNtuple.root
Output file: SimpleNtuple.root
Output file: SimpleNtuple.root
Output file: SimpleNtuple.root
Output file: SimpleNtuple.root
Output file: SimpleNtuple.root
[zcanada2] /canada/zcanada2a/stewartt/charm_eff $ <- simply kicked to the shell
Here I am running with 8 workers (I have also tried 4, 6, etc.), but one of the files isn't created. I can look at the logs in the .proof directory, which is where the error messages above come from. In the tmp directory (redirected via the TMPDIR environment variable to a 2 TB drive, so there is plenty of space) I see the following:
-rw-r--r-- 1 stewartt zeus 396 May 21 00:55 ROOTMERGED-439654ca-4591-11de-8001-9c46a983beef.root
-rw-r--r-- 1 stewartt zeus 396 May 20 06:15 ROOTMERGED-e3516902-44f4-11de-8001-9c46a983beef.root
-rw-r--r-- 1 stewartt zeus 396 May 20 15:20 ROOTMERGED-fd500470-4540-11de-8001-9c46a983beef.root
-rw-r--r-- 1 stewartt zeus 0 May 20 01:04 proof-cache-lock-%canada%zcanada2a%stewartt%charm_eff%.proof%cache
-rw-r--r-- 1 stewartt zeus 0 May 20 00:58 proof-query-lock-zcanada2-1242773882-13695-%canada%zcanada2a%stewartt%charm_eff%.proof%canada-zcanada2a-stewartt-charm_eff%queries
-rw-r--r-- 1 stewartt zeus 0 May 20 03:46 proof-query-lock-zcanada2-1242783992-23002-%canada%zcanada2a%stewartt%charm_eff%.proof%canada-zcanada2a-stewartt-charm_eff%queries
-rw-r--r-- 1 stewartt zeus 0 May 20 13:25 proof-query-lock-zcanada2-1242818713-7192-%canada%zcanada2a%stewartt%charm_eff%.proof%canada-zcanada2a-stewartt-charm_eff%queries
-rw-r--r-- 1 stewartt zeus 0 May 20 13:25 proof-query-lock-zcanada2-1242818736-7239-%canada%zcanada2a%stewartt%charm_eff%.proof%canada-zcanada2a-stewartt-charm_eff%queries
-rw-r--r-- 1 stewartt zeus 0 May 20 22:58 proof-query-lock-zcanada2-1242853108-12135-%canada%zcanada2a%stewartt%charm_eff%.proof%canada-zcanada2a-stewartt-charm_eff%queries
srwxr-xr-x 1 stewartt zeus 0 May 20 00:58 prooflite-sockpath-zcanada2-1242773882-13695
srwxr-xr-x 1 stewartt zeus 0 May 20 03:46 prooflite-sockpath-zcanada2-1242783992-23002
srwxr-xr-x 1 stewartt zeus 0 May 20 13:25 prooflite-sockpath-zcanada2-1242818713-7192
srwxr-xr-x 1 stewartt zeus 0 May 20 13:25 prooflite-sockpath-zcanada2-1242818736-7239
srwxr-xr-x 1 stewartt zeus 0 May 20 22:58 prooflite-sockpath-zcanada2-1242853108-12135
This file is from my last attempt:
May 21 00:55 ROOTMERGED-439654ca-4591-11de-8001-9c46a983beef.root
As an addendum: for workers 0.5 and 0.6, the sandbox of each node contains a well-formed ROOT file, which (I assume) would have been merged had the merging routine started.