Advice on bus error

Hi

I am asking for some support here in the hope that someone might have encountered this problem before. I am not sure if it is specific to ROOT though.

I have some code stored on AFS at CERN (my ROOT build, 5.34, is also sourced from AFS), which I access from my institution using Kerberos authentication to reach my private area on lxplus. The program links against the ROOT libraries. I run it on my institution’s computing cluster; the bulk of the code performs VEGAS integration and uses ROOT for file management. I am not running in parallel on our cluster, just using each node as an individual machine.

For occasional events (which are read from a TTree) I get a bus error, but there is no consistency in which nodes it occurs on, nor in where the stack trace says the crash happens. I am wondering whether it might be related to losing the AFS connection, but I have no idea whether this is the right line of enquiry.

An example error log is:

 *** Break *** bus error
Error in <TUnixSystem::StackTrace> script /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.07/x86_64-slc5-gcc43-dbg/root/etc/gdb-backtrace.sh is missing
/var/spool/pbs/mom_priv/jobs/1789650.SC: line 52: 10870 Bus error               
 *** Break *** bus error

===========================================================
There was a crash (kSigBus).
This is the entire stack trace of all threads:
===========================================================
#0  0x00000035a5e9a075 in waitpid () from /lib64/libc.so.6
#1  0x00000035a5e3c741 in do_system () from /lib64/libc.so.6
#2  0x00002b6ba8f2d059 in TUnixSystem::Exec (this=0x19a44280,
    shellcmd=0x19a863a0 "/afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.07/x86_64-slc5-gcc43-dbg/root/etc/gdb-backtrace.sh 11519 1>&2")
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:2088
#3  0x00002b6ba8f2c294 in TUnixSystem::StackTrace (this=0x19a44280)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:2336
#4  0x00002b6ba8f2f756 in TUnixSystem::DispatchSignals (this=0x19a44280,
    sig=kSigBus)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:1212
#5  0x00002b6ba8f2f880 in SigHandler (sig=kSigBus)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:368
#6  0x00002b6ba8f24708 in sighandler (sig=7)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/unix/src/TUnixSystem.cxx:3650
#7  <signal handler called>
#8  std::set<int, std::less<int>, std::allocator<int> >::~set (
    this=0x7fff04b9d858, __in_chrg=<value optimized out>)
    at /afs/cern.ch/sw/lcg/contrib/gcc/4.3.6/x86_64-slc5-gcc46-opt/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.6/../../../../include/c++/4.3.6/bits/stl_set.h:93
#9  0x00002b6ba9985d9d in std::pair<char const* const, std::set<int, std::less<int>, std::allocator<int> > >::~pair (this=0x7fff04b9d850,
    __in_chrg=<value optimized out>)
    at /afs/cern.ch/sw/lcg/contrib/gcc/4.3.6/x86_64-slc5-gcc46-opt/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.6/../../../../include/c++/4.3.6/bits/stl_pair.h:73
#10 0x00002b6ba9986d14 in std::map<char const*, std::set<int, std::less<int>, std::allocator<int> >, NameMap::G__charptr_less, std::allocator<std::pair<char const* const, std::set<int, std::less<int>, std::allocator<int\
> > > > >::operator[] (this=0x19a49710, __k=
0x7fff04b9d900)
    at /afs/cern.ch/sw/lcg/contrib/gcc/4.3.6/x86_64-slc5-gcc46-opt/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.6/../../../../include/c++/4.3.6/bits/stl_map.h:419
#11 0x00002b6ba9987090 in NameMap::Insert (this=0x19a49710,
    name=0x19a86320 "size_t", idx=0) at cint/cint/src/common.h:1288
#12 0x00002b6ba99ae222 in G__search_typename (
    typenamein=0x2b6ba9a37845 "size_t", typein=107, tagnum=-1, reftype=0)
    at cint/cint/src/typedef.cxx:1440
#13 0x00002b6ba99ae594 in G__search_typename2 (
    type_name=0x2b6ba9a37845 "size_t", typein=107, tagnum=-1, reftype=0,
    parent_tagnum=-1) at cint/cint/src/typedef.cxx:1472
#14 0x00002b6ba98d8d76 in G__platformMacro () at cint/cint/src/init.cxx:2382
#15 0x00002b6ba98d8ffc in G__set_stdio () at cint/cint/src/init.cxx:2446
#16 0x00002b6ba98da549 in G__main (argc=2, argv=0x7fff04ba0310)
    at cint/cint/src/init.cxx:627
#17 0x00002b6ba98dcb44 in G__init_cint (command=0x2b6ba9352980 "cint +V")
    at cint/cint/src/init.cxx:363
#18 0x00002b6ba8ed53e2 in TCint::ResetAll (this=0x19a49bc0)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/meta/src/TCint.cxx:717
#19 0x00002b6ba8ed6d69 in TCint::TCint (this=0x19a49bc0,
    name=0x2b6ba9349baf "C/C++",
    title=0x2b6ba9349b98 "CINT C/C++ Interpreter")
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/meta/src/TCint.cxx:315
#20 0x00002b6ba8e55af9 in TROOT::TROOT (this=0x2b6ba96fd4e0,
    name=0x2b6ba934a93c "root",
    title=0x2b6ba934abc1 "The ROOT of EVERYTHING", initfunc=0x0)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TROOT.cxx:310
#21 0x00002b6ba8e5762d in ROOT::GetROOT ()
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TROOT.cxx:204
#22 0x00002b6ba8e57703 in __static_initialization_and_destruction_0 (
    __initialize_p=1, __priority=65535)
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TROOT.cxx:213
#23 0x00002b6ba8e57749 in global constructors keyed to TROOT.cxx(void) ()
    at /build/bellenot/SPI/x86_64-slc5-gcc43-dbg/root/core/base/src/TROOT.cxx:2068
#24 0x00002b6ba9342fe6 in __do_global_ctors_aux ()
   from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.07/x86_64-slc5-gcc43-dbg/root/lib/libCore.so
#25 0x00002b6ba8dbed53 in _init ()
   from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.07/x86_64-slc5-gcc43-dbg/root/lib/libCore.so
#26 0x00002b6badedf4c8 in ?? ()
#27 0x00000035a5a0d4ab in call_init () from /lib64/ld-linux-x86-64.so.2
#28 0x00000035a5a0d5b5 in _dl_init_internal ()
   from /lib64/ld-linux-x86-64.so.2
#29 0x00000035a5a00aaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#30 0x0000000000000023 in ?? ()
===========================================================


The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#8  std::set<int, std::less<int>, std::allocator<int> >::~set (
    this=0x7fff04b9d858, __in_chrg=<value optimized out>)
    at /afs/cern.ch/sw/lcg/contrib/gcc/4.3.6/x86_64-slc5-gcc46-opt/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.6/../../../../include/c++/4.3.6/bits/stl_set.h:93
#9  0x00002b6ba9985d9d in std::pair<char const* const, std::set<int, std::less<int>, std::allocator<int> > >::~pair (this=0x7fff04b9d850,
    __in_chrg=<value optimized out>)
    at /afs/cern.ch/sw/lcg/contrib/gcc/4.3.6/x86_64-slc5-gcc46-opt/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.6/../../../../include/c++/4.3.6/bits/stl_pair.h:73
#10 0x00002b6ba9986d14 in std::map<char const*, std::set<int, std::less<int>, std::allocator<int> >, NameMap::G__charptr_less, std::allocator<std::pair<char const* const, std::set<int, std::less<int>, std::allocator<int\
> > > > >::operator[] (this=0x19a49710, __k=
0x7fff04b9d900)
    at /afs/cern.ch/sw/lcg/contrib/gcc/4.3.6/x86_64-slc5-gcc46-opt/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.6/../../../../include/c++/4.3.6/bits/stl_map.h:419
#11 0x00002b6ba9987090 in NameMap::Insert (this=0x19a49710,
    name=0x19a86320 "size_t", idx=0) at cint/cint/src/common.h:1288
===========================================================

As I say, I think asking for help here is a long shot: from reading around the subject, it’s not clear what might cause a bus error other than something memory-related. Still, I’m posting in the hope that someone has some ideas. It’s not obvious from the error message where in my own code the error occurs, and the only consistent pattern is that ~1% of my jobs fail with a bus error.
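
One way to test the AFS idea would be to wrap each job in a small script that probes the AFS paths before and after the run and records which signal, if any, killed the process. The following is only a minimal sketch under assumptions: the function names are hypothetical, `os.access` is a rough probe that may not reflect AFS ACLs or token state accurately, and nothing here is specific to ROOT or PBS.

```python
import os
import signal
import subprocess


def paths_readable(paths):
    """Return True if every path in `paths` is currently readable."""
    return all(os.access(p, os.R_OK) for p in paths)


def signal_name(returncode):
    """Map a return code to a signal name, or None for a normal exit.

    subprocess reports death-by-signal as -N; a shell reports it as 128 + N.
    """
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform.
        return None


def run_and_classify(cmd, required_paths):
    """Run `cmd` once and record whether the paths looked healthy around it."""
    ok_before = paths_readable(required_paths)
    result = subprocess.run(cmd)
    ok_after = paths_readable(required_paths)
    return {
        "afs_ok_before": ok_before,
        "afs_ok_after": ok_after,
        "returncode": result.returncode,
        "signal": signal_name(result.returncode),
    }
```

For example, `run_and_classify(["./my_analysis"], ["/afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.07/x86_64-slc5-gcc43-dbg/root/etc/gdb-backtrace.sh"])` would check the script that the log above reports as missing (`./my_analysis` is a placeholder for the real executable). If the failing ~1% of jobs consistently showed `"signal": "SIGBUS"` together with an unreadable AFS path, that would at least support the AFS hypothesis; a SIGBUS with AFS readable throughout would point elsewhere.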

Thanks,
Ian