PROOF instabilities with EOS

Hello,

I am having major instabilities with PROOF/EOS. I have a ticket open for this with EOS, but the developer there believes it is a PROOF problem:
cern.service-now.com/service-po … =INC183104

The problem is the following: PROOF starts merging the histograms. This completes fine, but when the next set of files is accessed, SFrame crashes while trying to validate the files (located on EOS). The core dump is here.

Cheers,
Monica

AssertDataSet on Mst-0: no dataset(s) found on the master corresponding to: TChain:physics
( ERROR ) TUnixSystem::Di… : segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:

Thread 5 (Thread 0x42524940 (LWP 18512)):
#0 0x00002b069ad73be1 in nanosleep () from /lib64/libc.so.6
#1 0x00002b069ad73a04 in sleep () from /lib64/libc.so.6
#2 0x00002b06a2d14824 in GarbageCollectorThread (arg=0xbfa9c80,
thr=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientConnMgr.cc:73
#3 0x00002b06a2a921ff in XrdSysThread_Xeq (myargs=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdSys/XrdSysPthread.cc:67
#4 0x00002b069aac477d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b069adadc1d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x42f25940 (LWP 18513)):
#0 0x00002b069ada4d26 in poll () from /lib64/libc.so.6
#1 0x00002b06a2cfbb56 in XrdClientSock::RecvRaw (this=0xbfae570,
buffer=0xbf47c10, length=8, substreamid=-1, usedsubstreamid=0x0)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientSock.cc:133
#2 0x00002b06a2d1de90 in XrdClientPhyConnection::ReadRaw (this=0xbfacca0,
buf=0xbf47c10, len=8, substreamid=-1, usedsubstreamid=0x42f24d28)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientPhyConnection.cc:359
#3 0x00002b06a2d2128c in XrdClientMessage::ReadRaw (this=0xbf47bd0,
phy=0xbfacca0)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientMessage.cc:152
#4 0x00002b06a2d1c7aa in XrdClientPhyConnection::BuildMessage (
this=0xbfacca0, IgnoreTimeouts=true, Enqueue=true)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientPhyConnection.cc:440
#5 0x00002b06a2d2089a in SocketReaderThread (arg=0xbfacca0,
thr=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientPhyConnection.cc:57
#6 0x00002b06a2a921ff in XrdSysThread_Xeq (myargs=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdSys/XrdSysPthread.cc:67
#7 0x00002b069aac477d in start_thread () from /lib64/libpthread.so.0
#8 0x00002b069adadc1d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x43926940 (LWP 17464)):
#0 0x00002b069ad73be1 in nanosleep () from /lib64/libc.so.6
#1 0x00002b069ad73a04 in sleep () from /lib64/libc.so.6
#2 0x00002b06a2d14824 in GarbageCollectorThread (arg=0x2aaabc1eb5c0,
thr=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientConnMgr.cc:73
#3 0x00002b06a2a921ff in XrdSysThread_Xeq (myargs=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdSys/XrdSysPthread.cc:67
#4 0x00002b069aac477d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b069adadc1d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x41023940 (LWP 17465)):
#0 0x00002b069ada4d26 in poll () from /lib64/libc.so.6
#1 0x00002b06a2cfbb56 in XrdClientSock::RecvRaw (this=0x2aaab71ddf20,
buffer=0xdb3c090, length=8, substreamid=-1, usedsubstreamid=0x0)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientSock.cc:133
#2 0x00002b06a2d1de90 in XrdClientPhyConnection::ReadRaw (
this=0x2aaaac02f1a0, buf=0xdb3c090, len=8, substreamid=-1,
usedsubstreamid=0x41022d28)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientPhyConnection.cc:359
#3 0x00002b06a2d2128c in XrdClientMessage::ReadRaw (this=0xdb3c050,
phy=0x2aaaac02f1a0)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientMessage.cc:152
#4 0x00002b06a2d1c7aa in XrdClientPhyConnection::BuildMessage (
this=0x2aaaac02f1a0, IgnoreTimeouts=true, Enqueue=true)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientPhyConnection.cc:440
#5 0x00002b06a2d2089a in SocketReaderThread (arg=0x2aaaac02f1a0,
thr=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdClient/XrdClientPhyConnection.cc:57
#6 0x00002b06a2a921ff in XrdSysThread_Xeq (myargs=)
at /build/hegner/LCGCMT/work/xrootd-3.2.2/src/XrdSys/XrdSysPthread.cc:67
#7 0x00002b069aac477d in start_thread () from /lib64/libpthread.so.0
#8 0x00002b069adadc1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b069d4caf40 (LWP 18435)):
#0 0x00002b069ad737ef in waitpid () from /lib64/libc.so.6
#1 0x00002b069ad16761 in do_system () from /lib64/libc.so.6
#2 0x00002b069ad16ab7 in system () from /lib64/libc.so.6
#3 0x00002b0694c547d6 in TUnixSystem::StackTrace() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#4 0x00002b0694c540ac in TUnixSystem::DispatchSignals(ESignals) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#5 <signal handler called>
#6 0x00002b0694b91acc in TObject::SetBit(unsigned int, bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#7 0x00002b0694b9439c in TObjectRefSpy::~TObjectRefSpy() ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libCore.so
#8 0x00002b0699e9e3fa in TProofPlayerRemote::Finalize(bool, bool) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#9 0x00002b0699e94de8 in TProofPlayerRemote::Process(TDSet*, char const*, char const*, long long, long long) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libProofPlayer.so
#10 0x00002b0699ab29f1 in TProof::Process(TDSet*, char const*, char const*, long long, long long) ()
from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.01/x86_64-slc5-gcc43-opt/root/lib/libProof.so
#11 0x00002b069475436c in SCycleController::ExecuteNextCycle() ()
from /afs/cern.ch/user/m/mdunford/work/ElectroweakBosons/SFrame/lib/libSFrameCore.so
#12 0x00002b06947500ca in SCycleController::ExecuteAllCycles() ()
from /afs/cern.ch/user/m/mdunford/work/ElectroweakBosons/SFrame/lib/libSFrameCore.so
#13 0x000000000040189c in main ()

Dear Monica and the others following/suffering-from this issue,

We are investigating a problem writing files via xrootd which may be related to this.
The problem shows up when the output is merged via file, which internally uses the class TFileMerger.
I hope we can fix it soon.

Can I ask what you mean by "the next set of files"?
Are these input files of a next run or output files produced by PROOF itself?

Gerri