Hi Rooters (PROOFers?),
Here at SLAC we have a small PROOF cluster (8 machines, 98 cores). When the output contains a large number of histograms, I have been seeing semi-random worker crashes, as well as crashes during a very slow merging step. On the theory that the slow merging might be part of the problem, Shuwei from BNL suggested enabling sub-merging. Sub-merging works fine for histograms and other objects in the top-level directory of the ROOT file, but as soon as I add a TDirectoryFile to the output file (I am using TProofOutputFile for the output), the merge with sub-merging enabled segfaults with this kind of error:
===========================================================
#5  0x00002ab15eb6fa43 in TDirectoryFile::Get(char const*) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libRIO.so
#6  0x00002ab15eb6acf1 in TDirectoryFile::GetDirectory(char const*, bool, char const*) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libRIO.so
#7  0x00002ab16053d8e4 in TFileMerger::MergeRecursive(TDirectory*, TList*) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProofPlayer.so
#8  0x00002ab16053e27f in TFileMerger::MergeRecursive(TDirectory*, TList*) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProofPlayer.so
#9  0x00002ab16053d042 in TFileMerger::Merge(bool) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProofPlayer.so
#10 0x00002ab160569fd8 in TProofPlayerRemote::MergeOutputFiles() ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProofPlayer.so
#11 0x00002ab16056f287 in TProofPlayerLite::Finalize(bool, bool) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProofPlayer.so
#12 0x00002ab160570195 in TProofPlayerLite::Process(TDSet*, char const*, char const*, long long, long long) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProofPlayer.so
#13 0x00002ab15f6dc437 in TProofLite::Process(TDSet*, char const*, char const*, long long, long long) ()
    from /afs/slac.stanford.edu/g/atlas/packages/root/root_v5.28.00a.Linux-slc5_amd64-gcc4.3/lib/libProof.so
Am I doing something wrong, or is this a bug? We are using ROOT 5.28/00a.
Thanks,
Bart