Occasional(!) crash in cling::CIFactory::createCI()

We have a C++ application that creates some ROOT histograms, and most of the time it works flawlessly. But once in a while it crashes with a segmentation fault when the very first histogram object is created. In that case the following message appears on standard error:

ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
  LC_ALL=C g++   -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.include/,${' -e '/^ \/.*++/p' -e '}'
Results was:
With exit code 0
error: entry with relative path at the root level is not discoverable
{ 'name': '', 'type': 'directory',
          ^~
Error in modulemap.overlay!

Does anyone have an idea what the exact cause of such a crash is? What surprises us most is that we always run the same binary on the same computer, yet it fails very rarely, about once per week.

Thanks in advance,
Cheers


ROOT Version: 6.24.06
Platform: CentOS 7
Compiler: gcc 11.1.0


That’s weird. Maybe @Axel has an idea of what the cause could be.

I have a few more details about this crash which may give a hint to an expert. When the crash happens we always observe the following pattern:

  • when TH1C::Class() is called for the first time, the ROOT initialisation routine appears to call fork() 6 times
  • each forked child process crashes with a segmentation fault, and a separate core file is created for each of them
  • at this point we have 6 core files with very similar stack traces, e.g.:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa9ed78f67b in __malloc_fork_unlock_child () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fa9ed78f67b in __malloc_fork_unlock_child () from /lib64/libc.so.6
#1  0x00007fa9ed7cfb93 in fork () from /lib64/libc.so.6
#2  0x00007fa9ed77a3dc in _IO_proc_open@@GLIBC_2.2.5 () from /lib64/libc.so.6
#3  0x00007fa9ed77a66c in popen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#4  0x00007fa9eedbbdb8 in DynamicPath(char const*, bool) () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#5  0x00007fa9eedbc723 in TUnixSystem::FindDynamicLibrary(TString&, bool) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#6  0x00007fa9eecd2e0c in TSystem::DynamicPathName(char const*, bool) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#7  0x00007fa9eec62680 in TROOT::InitInterpreter() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#8  0x00007fa9eec6285f in ROOT::Internal::GetROOT2() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#9  0x00007fa9eed78009 in ROOT::TGenericClassInfo::GetClass() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#10 0x00007fa9ee760a76 in TH1C::Class() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libHist.so
#11 0x00007fa9f04e911a in OHRootProvider::convert (histogram=..., ann=...)

After that, our application itself finally crashes with the following stack trace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa76726ecc6 in llvm::vfs::OverlayFileSystem::pushOverlay(llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
[Current thread is 1 (Thread 0x7fa76cb8c700 (LWP 269700))]
(gdb) bt
#0  0x00007fa76726ecc6 in llvm::vfs::OverlayFileSystem::pushOverlay(llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#1  0x00007fa76472d896 in (anonymous namespace)::collectModuleMaps(clang::CompilerInstance&, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&) [clone .constprop.0] () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#2  0x00007fa76472dff5 in (anonymous namespace)::setupCxxModules(clang::CompilerInstance&) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#3  0x00007fa764731a3f in (anonymous namespace)::createCIImpl(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, cling::CompilerOptions const&, char const*, std::unique_ptr<clang::ASTConsumer, std::default_delete<clang::ASTConsumer> >, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&, bool, bool) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#4  0x00007fa7647334b7 in cling::CIFactory::createCI(llvm::StringRef, cling::InvocationOptions const&, char const*, std::unique_ptr<clang::ASTConsumer, std::default_delete<clang::ASTConsumer> >, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#5  0x00007fa7647e4b39 in cling::IncrementalParser::IncrementalParser(cling::Interpreter*, char const*, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&) () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#6  0x00007fa764763a64 in cling::Interpreter::Interpreter(int, char const* const*, char const*, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&, bool, cling::Interpreter const*) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#7  0x00007fa7646c4481 in TCling::TCling(char const*, char const*, char const* const*) ()
   from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#8  0x00007fa7646c63d1 in CreateInterpreter () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCling.so
#9  0x00007fa9eec623ac in TROOT::InitInterpreter() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#10 0x00007fa9eec6285f in ROOT::Internal::GetROOT2() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#11 0x00007fa9eed78009 in ROOT::TGenericClassInfo::GetClass() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libCore.so
#12 0x00007fa9ee760a76 in TH1C::Class() () from /sw/atlas/sw/lcg/releases/LCG_101/ROOT/6.24.06/x86_64-centos7-gcc11-opt/lib/libHist.so
#13 0x00007fa9f04e911a in OHRootProvider::convert (histogram=..., ann=...)

ROOT runs

LC_ALL=C g++   -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.include/,${' -e '/^ \/.*++/p' -e '}'

If that fails then you get the error you see. You said you get core files - if you attach gdb, does the core correspond to g++? Which g++ are we talking about here, where does it come from (CentOS 7 doesn’t have GCC 11.1 by itself)?
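
For anyone who wants to reproduce that check outside ROOT, here is a minimal sketch, assuming a POSIX system, that runs a shell command through popen() (the same libc call that appears in the child stack traces above) and captures its output and exit status. The names CommandResult and run_and_capture are hypothetical and are not taken from ROOT or cling.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Hypothetical helper (not part of ROOT): result of running a shell command.
struct CommandResult {
    std::string output;   // everything the command wrote to stdout
    int exit_code;        // -1 if the command could not be run or did not exit normally
};

// Run `cmd` via /bin/sh using popen(), read its stdout until EOF,
// and translate the pclose() status into an exit code.
CommandResult run_and_capture(const std::string& cmd) {
    CommandResult result{"", -1};
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return result;

    char buffer[4096];
    std::size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        result.output.append(buffer, n);

    int status = pclose(pipe);
    if (status != -1 && WIFEXITED(status))
        result.exit_code = WEXITSTATUS(status);
    return result;
}
```

Passing the g++ command quoted above to run_and_capture() and checking whether output is empty gives a rough equivalent of the check. How ROOT itself interprets the result is not shown here.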

I have already run this command and it worked fine, which I think explains why our application also works fine most of the time. We took gcc 11.1 from the CERN LCG release. Here is the output of the above-mentioned command:

 /sw/atlas/sw/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/bin/../lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../include/c++/11.1.0
 /sw/atlas/sw/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/bin/../lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../include/c++/11.1.0/x86_64-pc-linux-gnu
 /sw/atlas/sw/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/bin/../lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../include/c++/11.1.0/backward

Someone suggested that the problem could be related to a missing call to TThread::Initialize(). Could this be the case? Would that explanation be compatible with the fact that the application crashes only very rarely? The application is multi-threaded.

Indeed, if you use ROOT in a multi-threaded context you certainly want to call TThread::Initialize() before hitting the TROOT() construction.

(or maybe ROOT::EnableThreadSafety())
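
A minimal sketch of that ordering, in plain standard C++ so it runs without ROOT: the one-time setup is done on the calling thread before any worker thread is started. initialize_framework(), ensure_initialized() and run_workers() are hypothetical names. In a real application the body of initialize_framework() is where TThread::Initialize() or ROOT::EnableThreadSafety() would be called; that placement is an assumption, not a verified fix for this crash.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one-time framework setup. In a ROOT application
// this is where TThread::Initialize() or ROOT::EnableThreadSafety() would go.
std::atomic<int> g_init_calls{0};
void initialize_framework() { ++g_init_calls; }

// Run the setup exactly once, no matter which thread asks first.
void ensure_initialized() {
    static std::once_flag flag;
    std::call_once(flag, initialize_framework);
}

// Start `n` workers only after setup has completed on the calling thread.
// Returns how many times the setup has actually been executed so far.
int run_workers(int n, const std::function<void(int)>& work) {
    ensure_initialized();                 // done before any worker exists
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&work, i] {
            ensure_initialized();         // no-op after the first call
            work(i);
        });
    for (auto& t : workers) t.join();
    return g_init_calls.load();
}
```

The std::call_once guard is only a safety net here. The point is the explicit call at the top of run_workers(), which mirrors calling the initialisation from main() before any thread touches ROOT.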

Thanks for all the suggestions. I’ll add a TThread::Initialize() call to the main function of the application. Hopefully this will solve the problem.
