Centos8 root 6.24.06 crash - corrupted double-linked list Aborted (core dumped)

Hi, I’m trying to port my RDataFrame code from centos7 + root 6.24.00 to centos8.
I tried setting up root from LCG_101:

export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
eval "`lcgenv x86_64-centos8-gcc11-opt all`"

eval "`lcgenv -p LCG_101 x86_64-centos8-gcc11-opt CMake`"
eval "`lcgenv -p LCG_101 x86_64-centos8-gcc11-opt gdb`"

eval "`lcgenv -p LCG_101 x86_64-centos8-gcc11-opt ROOT`"

The code runs and writes its output, but crashes at the end with:

corrupted double-linked list
Aborted (core dumped)

gdb outputs:

(gdb) where
#0  0x00007f578ffe8a4f in raise () from /lib64/libc.so.6
#1  0x00007f578ffbbdb5 in abort () from /lib64/libc.so.6
#2  0x00007f579002b057 in __libc_message () from /lib64/libc.so.6
#3  0x00007f57900321bc in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f57900329fc in unlink_chunk.isra () from /lib64/libc.so.6
#5  0x00007f5790032b67 in malloc_consolidate () from /lib64/libc.so.6
#6  0x00007f5790033f90 in _int_free () from /lib64/libc.so.6
#7  0x00007f578a6b3db0 in clang::CodeGen::CodeGenModule::~CodeGenModule() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#8  0x00007f578a609dd0 in clang::CodeGeneratorImpl::~CodeGeneratorImpl() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#9  0x00007f578aa2b033 in clang::MultiplexConsumer::~MultiplexConsumer() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#10 0x00007f578aa2b089 in clang::MultiplexConsumer::~MultiplexConsumer() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#11 0x00007f578a57d830 in cling::DeclCollector::~DeclCollector() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#12 0x00007f578a57d8f9 in cling::DeclCollector::~DeclCollector() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#13 0x00007f578aa2b033 in clang::MultiplexConsumer::~MultiplexConsumer() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#14 0x00007f578aa2b089 in clang::MultiplexConsumer::~MultiplexConsumer() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#15 0x00007f578aa2b033 in clang::MultiplexConsumer::~MultiplexConsumer() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#16 0x00007f578aa2b089 in clang::MultiplexConsumer::~MultiplexConsumer() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#17 0x00007f578a4d1b45 in cling::Interpreter::ShutDown() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#18 0x00007f578a4d1c9e in cling::Interpreter::~Interpreter() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#19 0x00007f578a4d2289 in cling::Interpreter::~Interpreter() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#20 0x00007f578a40fa98 in TCling::~TCling() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#21 0x00007f578a40fd69 in TCling::~TCling() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so
#22 0x00007f579235c2b3 in TROOT::~TROOT() ()
   from /cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so
#23 0x00007f578ffeb1ec in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007f578ffeb320 in exit () from /lib64/libc.so.6
#25 0x00007f578ffd4caa in __libc_start_main () from /lib64/libc.so.6
#26 0x000000000045f6be in _start ()

Do you have a suggestion for which ROOT setup I should try?
Thanks,
Zdenek

ROOT Version: 6.24/06
Platform: centos8
Compiler: gcc11


Hi Zdenek,
can you try with the LCG views instead of /lcg/releases?

E.g. source /cvmfs/sft.cern.ch/lcg/views/LCG_101/x86_64-centos8-gcc11-opt/setup.sh.

Cheers,
Enrico

Hi Enrico,
I tried, but I’m getting the same crash:

#0  0x00007faf089eea4f in raise () from /lib64/libc.so.6
#1  0x00007faf089c1db5 in abort () from /lib64/libc.so.6
#2  0x00007faf08a31057 in __libc_message () from /lib64/libc.so.6
#3  0x00007faf08a381bc in malloc_printerr () from /lib64/libc.so.6
#4  0x00007faf08a389fc in unlink_chunk.isra () from /lib64/libc.so.6
#5  0x00007faf08a38b67 in malloc_consolidate () from /lib64/libc.so.6
#6  0x00007faf08a39f90 in _int_free () from /lib64/libc.so.6
#7  0x00007faf033af590 in clang::CodeGen::CodeGenModule::~CodeGenModule() ()
   from /cvmfs/sft.cern.ch/lcg/views/LCG_100/x86_64-centos8-gcc10-opt/lib/libCling.so

(I tried both LCG_100, gcc10 and LCG_101, gcc11)

I somehow need to figure out which part of the code could be triggering this. It happens at the last line of code, after everything is saved.
Cheers,
Zdenek

The crash actually happens at application teardown (in ~TROOT), when the ROOT session and the interpreter are being destroyed, so after all your code has already run.

That’s of course not intended, but I can’t reproduce the problem on lxplus8. It would be great if you could share a minimal, self-contained reproducer that we can run on lxplus8 or in a CentOS8 Docker container to debug the problem on our side.

Cheers,
Enrico

This will be tricky, because the code is rather complex. I still need to find out whether it is a problem with deleting a histogram or something else.
I ran with address sanitizer:

==3153024==ERROR: AddressSanitizer: attempting double-free on 0x6030002ae020 in thread T0:
    #0 0x7ff32894e1b7 in operator delete(void*, unsigned long) /build/dkonst/BUILD2/build/contrib/gcc-11.1.0/src/gcc/11.1.0/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x7ff31c526123 in TClingClassInfo::TmpltName() const (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so+0x63f123)
    #2 0x7ff328152029 in TBaseClass::IsSTLContainer() (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so+0x28f029)
    #3 0x7ff32816380c in TClass::BuildRealData(void*, bool) [clone .localalias] (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so+0x2a080c)
    #4 0x7ff327bc58dc in TBufferFile::WriteClassBuffer(TClass const*, void*) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xd48dc)
    #5 0x7ff327bc4f2a in TBufferFile::WriteObjectClass(void const*, TClass const*, bool) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xd3f2a)
    #6 0x7ff327bcc383 in TBufferIO::WriteObjectAny(void const*, TClass const*, bool) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xdb383)
    #7 0x7ff328120b8c in TObjArray::Streamer(TBuffer&) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so+0x25db8c)
    #8 0x7ff327bc4f2a in TBufferFile::WriteObjectClass(void const*, TClass const*, bool) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xd3f2a)
    #9 0x7ff327bcc26a in TBufferIO::WriteObjectAny(void const*, TClass const*, bool) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xdb26a)
    #10 0x7ff327c8c53f in TStreamerInfo::Streamer(TBuffer&) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0x19b53f)
    #11 0x7ff327bc4f2a in TBufferFile::WriteObjectClass(void const*, TClass const*, bool) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xd3f2a)
    #12 0x7ff327bcc383 in TBufferIO::WriteObjectAny(void const*, TClass const*, bool) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0xdb383)
    #13 0x7ff32811c343 in TList::Streamer(TBuffer&) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so+0x259343)
    #14 0x7ff327c6ae3b in TKey::TKey(TObject const*, char const*, int, TDirectory*) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0x179e3b)
    #15 0x7ff327c39157 in TFile::WriteStreamerInfo() (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0x148157)
    #16 0x7ff327c37cab in TFile::Close(char const*) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libRIO.so+0x146cab)
    #17 0x7ff328061aac in (anonymous namespace)::R__ListSlowClose(TList*) (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so+0x19eaac)
    #18 0x7ff32806226c in TROOT::CloseFiles() (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCore.so+0x19f26c)
    #19 0x7ff325cd81eb in __run_exit_handlers (/lib64/libc.so.6+0x511eb)
    #20 0x7ff325cd831f in exit (/lib64/libc.so.6+0x5131f)
    #21 0x7ff325cc1ca9 in __libc_start_main (/lib64/libc.so.6+0x3aca9)
    #22 0x49a7fd in _start (/scratch/zhubacek/RDataFrame2022/incljets_rdataframe/rdf_dijet_reader+0x49a7fd)

0x6030002ae020 is located 0 bytes inside of 17-byte region [0x6030002ae020,0x6030002ae031)
freed by thread T0 here:
    #0 0x7ff32894dd37 in operator delete(void*) /build/dkonst/BUILD2/build/contrib/gcc-11.1.0/src/gcc/11.1.0/libsanitizer/asan/asan_new_delete.cpp:160
    #1 0x7ff31ef055a2 in llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::~SemiNCAInfo() (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so+0x301e5a2)
    #2 0x6110002e718f  (<unknown module>)

previously allocated by thread T0 here:
    #0 0x7ff32894d337 in operator new(unsigned long) /build/dkonst/BUILD2/build/contrib/gcc-11.1.0/src/gcc/11.1.0/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7ff31ea529ee in clang::DeclarationName::getAsString[abi:cxx11]() const (/cvmfs/sft.cern.ch/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos8-gcc11-opt/lib/libCling.so+0x2b6b9ee)
    #2 0x62500244c8ff  (<unknown module>)

SUMMARY: AddressSanitizer: double-free /build/dkonst/BUILD2/build/contrib/gcc-11.1.0/src/gcc/11.1.0/libsanitizer/asan/asan_new_delete.cpp:172 in operator delete(void*, unsigned long)
==3153024==ABORTING

Could that help a bit?
Cheers,
Zdenek

Uhm so the crash happens when ROOT closes whatever TFiles are still present at the end of the program.

Can you run the program under valgrind? valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp <yourprogram> (without address sanitizer compilation flags, but with debug flags, and ideally with an installation of ROOT with debug symbols).

Cheers,
Enrico

Found it. This was a bug on my side: the executable was missing outputFile->Close(). It was probably just luck that it didn’t crash on centos7 and that all histograms seemed to be saved.
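
For illustration only, here is a minimal sketch in standard C++ of one way to make a forgotten Close() harder to write: a small guard object that closes the file whenever it leaves scope. OutputFile, CloseGuard and run_job are hypothetical stand-ins invented for this sketch. They are not ROOT classes, and the sketch does not claim to reproduce what the original executable does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an output file; this is NOT a ROOT class.
class OutputFile {
public:
    // Queues an object name for writing. Writing after Close() is a bug.
    void Write(const std::string& object) {
        assert(!closed_ && "Write() called after Close()");
        pending_.push_back(object);
    }

    // Flushes everything queued so far and marks the file as closed.
    // Calling it a second time does nothing.
    void Close() {
        if (closed_) return;
        flushed_ += pending_.size();
        pending_.clear();
        closed_ = true;
    }

    bool IsClosed() const { return closed_; }
    std::size_t Flushed() const { return flushed_; }

private:
    std::vector<std::string> pending_;
    std::size_t flushed_ = 0;
    bool closed_ = false;
};

// Hypothetical guard: closes the file when the guard leaves scope.
class CloseGuard {
public:
    explicit CloseGuard(OutputFile& file) : file_(file) {}
    ~CloseGuard() { file_.Close(); }
    CloseGuard(const CloseGuard&) = delete;
    CloseGuard& operator=(const CloseGuard&) = delete;

private:
    OutputFile& file_;
};

// Hypothetical job: every return path closes the file through the guard.
void run_job(OutputFile& file, bool stop_early) {
    CloseGuard guard(file);
    file.Write("h_pt");
    if (stop_early) return;  // the guard still closes the file here
    file.Write("h_eta");
}
```

Calling run_job(file, true) and run_job(file, false) on two fresh OutputFile objects leaves both closed, with one and two objects flushed respectively. The point of the design is that the close no longer depends on remembering a call at the end of main.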

Cheers,
Zdenek

Great, congrats on hunting this down! I bet that at the end of the application ROOT was closing the file that had been left open, and TFile::Close was trying to write all related objects to the file before actually closing it, but at that point in the execution some of those objects had already gone out of scope. Or something like that. In the end you’d get a use after delete, which is undefined behavior, and as is usually the case with undefined behavior in C++, whether you actually get a crash depends on the compiler, the code and your luck.
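
The ordering issue guessed at above can be sketched in standard C++ without ROOT. This is only an analogy under stated assumptions: Histogram, OutputFile, close_before_teardown and close_after_teardown are hypothetical names made up for the sketch, and std::weak_ptr is used so that the "object already gone" case is detected safely instead of being a real use after delete.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a histogram; this is NOT a ROOT class.
struct Histogram {
    std::string name;
};

// Hypothetical stand-in for an output file that only writes its attached
// objects when Close() is called. It observes the objects, it does not own them.
class OutputFile {
public:
    void Attach(const std::shared_ptr<Histogram>& h) {
        assert(h != nullptr);
        attached_.push_back(h);
    }

    // Returns the names of the attached objects that are still alive.
    // Objects destroyed before Close() are skipped; with raw pointers this
    // access would be undefined behavior instead.
    std::vector<std::string> Close() {
        std::vector<std::string> written;
        for (const auto& weak : attached_) {
            if (auto h = weak.lock()) {
                written.push_back(h->name);
            }
        }
        attached_.clear();
        return written;
    }

private:
    std::vector<std::weak_ptr<Histogram>> attached_;
};

// Close while the histograms are still in scope: both are written.
std::vector<std::string> close_before_teardown() {
    OutputFile file;
    auto h1 = std::make_shared<Histogram>(Histogram{"h_pt"});
    auto h2 = std::make_shared<Histogram>(Histogram{"h_eta"});
    file.Attach(h1);
    file.Attach(h2);
    return file.Close();
}

// Close only after the histograms were destroyed: nothing is left to write.
std::vector<std::string> close_after_teardown() {
    OutputFile file;
    {
        auto h1 = std::make_shared<Histogram>(Histogram{"h_pt"});
        auto h2 = std::make_shared<Histogram>(Histogram{"h_eta"});
        file.Attach(h1);
        file.Attach(h2);
    }  // h1 and h2 are destroyed here
    return file.Close();
}
```

In this sketch the early close writes both objects and the late close writes none. With non-owning raw pointers the late close would instead touch freed memory, which is the kind of behavior that may or may not crash depending on the platform.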

Glad this is solved!
Enrico
