Library autoloading problem with 6.06.06

Hi.

I don’t know whether this problem comes from ROOT or from the FairRoot package,
so I’m posting it here to get some advice from the experts.

I’m using FairRoot to build our own software, SpiRITROOT.
With the old version of FairRoot and ROOT 6.06.02, everything went smoothly.
If I loaded something from SpiRITROOT, for example the STDecoderTask class, it worked without any error.

After I updated to the latest FairRoot and ROOT 6.06.06, the problem appeared.
I use STDecoderTask here to keep the explanation simple, but loading any class in SpiRITROOT causes the problem.
If I load the same class, STDecoderTask, ROOT crashes with almost no information:

 *** Break *** segmentation violation
#0  0x0000003494299dd5 in waitpid () from /lib64/libc.so.6
#1  0x000000349423c4a1 in do_system () from /lib64/libc.so.6
#2  0x00002af24b810ddf in TUnixSystem::StackTrace ()
   from /data/common/ricc/20160722.withV1.03/lib/root/libCore.so.6
#3  0x00007fff2aac6278 in ?? ()
#4  0x00002af24d0ad66a in clang::Sema::ActOnTag ()
   from /data/common/ricc/20160722.withV1.03/lib/root/libCling.so
#5  0x00002af24ced2142 in clang::Parser::ParseClassSpecifier ()
   from /data/common/ricc/20160722.withV1.03/lib/root/libCling.so
#6  0x000000002083c9b0 in ?? ()
#7  0x0000000000000000 in ?? ()

However, if I first use any class from FairRoot, for example by calling FairLogger::GetLogger(),
the classes in SpiRITROOT work fine afterwards.

If you have any idea, please reply.
Thank you in advance.

Genie

Hi Genie,

this is not expected. Did you compile the full stack against ROOT 6.06 without leaving any build remnants around?
The order-dependent behaviour you mention could be a symptom of an initialisation issue: is the library containing FairLogger::GetLogger() linked to SpiRITROOT? Was it linked in the old build?

Cheers,
Danilo

Hi Danilo,

Thank you for the reply.
After reading your comment, I ran make inside the original build directory of ROOT 6.06.06.
Everything went well; there are no uncompiled remnants at all.

The FairRoot library containing FairLogger is linked to SpiRITROOT.
It was fine before I upgraded to ROOT 6.06.06.

The catch is that this time I upgraded both FairRoot and ROOT (from 6.06.02 to 6.06.06),
so it might be a FairRoot problem.
For your information, there are no uncompiled remnants in FairRoot either,
and creating objects of the classes in the FairRoot package never causes a crash.

The exact symptom is that ROOT crashes whenever I try to load classes from SpiRITROOT before loading any class from FairRoot.

Best regards,
Genie

Hi,

if all misconfigurations and any mix of the two ROOT versions are excluded, this looks like an initialisation ordering issue that FairRoot might implicitly rely on.
If the two libraries are linked, the loader will load FairRoot before SpiRITROOT by construction. Perhaps a look at the static initialisations present in the FairRoot library might help.

Cheers,
Danilo

Thank you, Danilo.

May I ask one last question?
How can I enable building a static library in FairRoot?
Please answer only if you already know it offhand.

I have also opened this issue in the issue tracker of the FairRoot GitHub repository.

Thank you again.

Genie

Hi Genie,

I don’t think we are looking for a static library here; the library can be shared. I was referring to static initialisers, i.e. code executed when the library is loaded.
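A minimal sketch of what such a static initialiser looks like (hypothetical names, not FairRoot code): a namespace-scope object with a non-trivial constructor. That constructor runs while the program or shared library is being loaded, at start-up for linked libraries or inside dlopen() for libraries loaded at run time, before user code such as a macro gets to call anything in the library.

```cpp
#include <cstdio>

// Hypothetical stand-in for a FairRoot-style container factory; the name and
// behaviour are assumptions made for illustration only.
struct DemoFactory {
    static int instances;  // constant-initialised to 0 before any dynamic initialisation
    DemoFactory() {
        ++instances;
        std::puts("DemoFactory constructed while the program/library is loaded");
    }
};
int DemoFactory::instances = 0;

// Namespace-scope object with a non-trivial constructor: the constructor call is
// a static initialiser. In a shared library it runs before dlopen() returns (or at
// program start-up if the library is linked in).
DemoFactory gDemoFactory;
```

By the time any ordinary function (including main()) runs, DemoFactory::instances is already 1. Such initialisers are exactly the place where one global can end up using another that has not been set up yet.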

Cheers,
Danilo

Thank you again for all your replies.

I just found that the problem doesn’t occur on my Mac,
while it does occur on the Linux machine where I’m trying to install it.

Now I’m confused.

As a workaround, I’ll just tell people to call some FairRoot static instance, like FairLogger::GetInstance(), at the start of their macros,
since it doesn’t affect anything else.

Thank you!
Genie

Hi Genie,

You may want to try a debug build of ROOT to get more information. In addition, you may want to run the failing example under valgrind (using the suppression files in $ROOTSYS/etc) to get more details.

Cheers,
Philippe.

Thank you for the reply, Philippe.

I built FairSoft with debug information, but it doesn’t give me any more information.

I also tried valgrind with the suppression file, but it doesn’t even start.
It gives me this:

==17282== Memcheck, a memory error detector.
==17282== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==17282== Using LibVEX rev 1658, a library for dynamic binary translation.
==17282== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==17282== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==17282== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==17282== For more details, rerun with: -v
==17282== 
location should start with fun: or obj:
==17282== FATAL: in suppressions file '/data/common/ricc/sources/FairSoft.may16.source.debug/tools/root/etc/valgrind-root.supp': location should start with 'fun:' or 'obj:'
==17282== exiting now.

Since my failing example is as simple as

STDecoderTask *a = new STDecoderTask();

I just tried running it without the suppression file, and it gave me this:

==17837== Memcheck, a memory error detector.
==17837== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==17837== Using LibVEX rev 1658, a library for dynamic binary translation.
==17837== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==17837== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==17837== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==17837== For more details, rerun with: -v
==17837== 
root [0] 
Processing ha.C...

 *** Break *** segmentation violation
#0  0x0000003c31e99dd5 in waitpid () from /lib64/libc.so.6
#1  0x0000003c31e3c4a1 in do_system () from /lib64/libc.so.6
#2  0x00002af0604bbddf in ?? ()
   from /data/common/ricc/20160726.withV1.03.debug/lib/root/libCore.so.6
#3  0x00007fff87331958 in ?? ()
#4  0x00007fff87331130 in ?? ()
#5  0x0000000015b1b7c0 in ?? ()
#6  0x00000000150e39c0 in ?? ()
#7  0x0000000000000000 in ?? ()
Root > .q
==17837== 
==17837== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 1)
==17837== malloc/free: in use at exit: 0 bytes in 0 blocks.
==17837== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==17837== For counts of detected errors, rerun with: -v
==17837== All heap blocks were freed -- no leaks are possible.

Hi,

[quote]valgrind-3.2.1[/quote]
You need a newer version of valgrind (the current version is 3.11).

Cheers,
Philippe.

Hi,

The output is not as informative as I would expect. What command line did you use? Note that to use valgrind you need to use the ROOT executable actually named ‘root.exe’ rather than the convenience wrapper called ‘root’.

Cheers,
Philippe.

Thank you for the reply, Philippe.

I used the latest version of Valgrind with root.exe and reran the same macro file; the log is below.

Thank you!

Genie

==12147== Memcheck, a memory error detector
==12147== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12147== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==12147== Command: /data/common/ricc/20160726.withV1.03.debug/bin/root.exe ha.C
==12147== 
==12147== Conditional jump or move depends on uninitialised value(s)
==12147==    at 0x6C698F0: clang::ASTDeclReader::VisitFriendDecl(clang::FriendDecl*) (ASTReaderDecl.cpp:1639)
==12147==    by 0x6C6DB84: Visit (DeclNodes.inc:59)
==12147==    by 0x6C6DB84: clang::ASTDeclReader::Visit(clang::Decl*) (ASTReaderDecl.cpp:365)
==12147==    by 0x6C6E185: clang::ASTReader::ReadDeclRecord(unsigned int) (ASTReaderDecl.cpp:3151)
==12147==    by 0x6C1E0C4: clang::ASTReader::GetDecl(unsigned int) (ASTReader.cpp:6363)
==12147==    by 0x6C1F321: GetLocalDecl (ASTReader.h:1679)
==12147==    by 0x6C1F321: (anonymous namespace)::FindExternalLexicalDeclsVisitor::visit(clang::serialization::ModuleFile&, bool, void*) (ASTReader.cpp:6461)
==12147==    by 0x6CD1D0C: clang::serialization::ModuleManager::visitDepthFirst(bool (*)(clang::serialization::ModuleFile&, bool, void*), void*) (ModuleManager.cpp:427)
==12147==    by 0x6C1323E: clang::ASTReader::FindExternalLexicalDecls(clang::DeclContext const*, bool (*)(clang::Decl::Kind), llvm::SmallVectorImpl<clang::Decl*>&) (ASTReader.cpp:6478)
==12147==    by 0x715D406: FindExternalLexicalDecls (ExternalASTSource.h:172)
==12147==    by 0x715D406: clang::DeclContext::LoadLexicalDeclsFromExternalStorage() const (DeclBase.cpp:1014)
==12147==    by 0x715D4BE: clang::DeclContext::decls_begin() const (DeclBase.cpp:1109)
==12147==    by 0x71457D2: method_begin (DeclCXX.h:762)
==12147==    by 0x71457D2: methods (DeclCXX.h:756)
==12147==    by 0x71457D2: (anonymous namespace)::FinalOverriderCollector::Collect(clang::CXXRecordDecl const*, bool, clang::CXXRecordDecl const*, clang::CXXFinalOverriderMap&) (CXXInheritance.cpp:545)
==12147==    by 0x7145F84: clang::CXXRecordDecl::getFinalOverriders(clang::CXXFinalOverriderMap&) const (CXXInheritance.cpp:628)
==12147==    by 0x7249ED0: (anonymous namespace)::FinalOverriders::FinalOverriders(clang::CXXRecordDecl const*, clang::CharUnits, clang::CXXRecordDecl const*) (VTableBuilder.cpp:176)
==12147== 
   ------------------------------------------------------------
  | Welcome to ROOT 6.06/06                http://root.cern.ch |
  |                               (c) 1995-2016, The ROOT Team |
  | Built for linuxx8664gcc                                    |
  | From tag v6-06-06, 6 July 2016                             |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] 
Processing ha.C...
==12147== Invalid read of size 8
==12147==    at 0x14C64F96: FairLogger::GetLogger() (FairLogger.cxx:61)
==12147==    by 0x147AA708: FairContFact::FairContFact() (FairContFact.cxx:146)
==12147==    by 0x12C6C4FD: FairBaseContFact::FairBaseContFact() (FairBaseContFact.cxx:33)
==12147==    by 0x12C4FC1D: __static_initialization_and_destruction_0 (FairBaseContFact.cxx:30)
==12147==    by 0x12C4FC1D: _GLOBAL__sub_I_FairBaseContFact.cxx (FairBaseContFact.cxx:76)
==12147==    by 0x345940D4F2: call_init (in /lib64/ld-2.5.so)
==12147==    by 0x345940D5B4: _dl_init (in /lib64/ld-2.5.so)
==12147==    by 0x3459411053: dl_open_worker (in /lib64/ld-2.5.so)
==12147==    by 0x345940D135: _dl_catch_error (in /lib64/ld-2.5.so)
==12147==    by 0x34594108BB: _dl_open (in /lib64/ld-2.5.so)
==12147==    by 0x345A000F99: dlopen_doit (in /lib64/libdl-2.5.so)
==12147==    by 0x345940D135: _dl_catch_error (in /lib64/ld-2.5.so)
==12147==    by 0x345A00150C: _dlerror_run (in /lib64/libdl-2.5.so)
==12147==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==12147== 

 *** Break *** segmentation violation
#0  0x00000000380ec3bc in ?? ()
#1  0x0000000000000008 in ?? ()
#2  0x0000000803278e00 in ?? ()
#3  0x0000000803278dc0 in ?? ()
#4  0x00000008020083d0 in ?? ()
#5  0x000000000000003d in ?? ()
#6  0x000000080239a8c8 in ?? ()
#7  0x000000080239a910 in ?? ()
#8  0x000000000000003d in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x00000008020083c0 in ?? ()
#11 0x000000080239a880 in ?? ()
#12 0x000000003808e09d in ?? ()
#13 0x000000000000528f in ?? ()
#14 0x0000000038040c7e in ?? ()
#15 0x0000060100000260 in ?? ()
#16 0x0000000400000003 in ?? ()
#17 0x0000000000010400 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000004800000010 in ?? ()
#20 0x0000002000000040 in ?? ()
#21 0x0000005000000060 in ?? ()
#22 0xffffffff00000058 in ?? ()
#23 0x00000000ffffffff in ?? ()
#24 0x000000003805cd11 in ?? ()
#25 0xfffffffffffbfa27 in ?? ()
#26 0x0000000000001f80 in ?? ()
#27 0x0000000803278f30 in ?? ()
#28 0x00000008020083c0 in ?? ()
#29 0x000000345983c461 in ?? ()
#30 0x00000008020083c0 in ?? ()
#31 0x0000000000001c00 in ?? ()
#32 0x0000000000000000 in ?? ()
Root > .q
==12147== Invalid read of size 8
==12147==    at 0x345F40F57C: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x345F4045BE: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x345F410280: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x3459833254: exit (in /lib64/libc-2.5.so)
==12147==    by 0x50F2E7C: TUnixSystem::Exit(int, bool) (TUnixSystem.cxx:2137)
==12147==    by 0x4FF6DAE: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:869)
==12147==    by 0x4C22204: TRint::ProcessLineNr(char const*, char const*, int*) (TRint.cxx:745)
==12147==    by 0x4C22460: TRint::HandleTermInput() (TRint.cxx:605)
==12147==    by 0x50F7C04: TUnixSystem::CheckDescriptors() (TUnixSystem.cxx:1301)
==12147==    by 0x50F8D29: TUnixSystem::DispatchOneEvent(bool) (TUnixSystem.cxx:1056)
==12147==    by 0x504C275: TSystem::InnerLoop() (TSystem.cxx:407)
==12147==    by 0x504CE5F: TSystem::Run() (TSystem.cxx:357)
==12147==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==12147== 
==12147== 
==12147== Process terminating with default action of signal 11 (SIGSEGV)
==12147==  Access not within mapped region at address 0x10
==12147==    at 0x345F40F57C: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x345F4045BE: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x345F410280: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x3459833254: exit (in /lib64/libc-2.5.so)
==12147==    by 0x50F2E7C: TUnixSystem::Exit(int, bool) (TUnixSystem.cxx:2137)
==12147==    by 0x4FF6DAE: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:869)
==12147==    by 0x4C22204: TRint::ProcessLineNr(char const*, char const*, int*) (TRint.cxx:745)
==12147==    by 0x4C22460: TRint::HandleTermInput() (TRint.cxx:605)
==12147==    by 0x50F7C04: TUnixSystem::CheckDescriptors() (TUnixSystem.cxx:1301)
==12147==    by 0x50F8D29: TUnixSystem::DispatchOneEvent(bool) (TUnixSystem.cxx:1056)
==12147==    by 0x504C275: TSystem::InnerLoop() (TSystem.cxx:407)
==12147==    by 0x504CE5F: TSystem::Run() (TSystem.cxx:357)
==12147==  If you believe this happened as a result of a stack
==12147==  overflow in your program's main thread (unlikely but
==12147==  possible), you can try to increase the size of the
==12147==  main thread stack using the --main-stacksize= flag.
==12147==  The main thread stack size used in this run was 10485760.
==12147== Invalid free() / delete / delete[] / realloc()
==12147==    at 0x4A07CA7: free (vg_replace_malloc.c:530)
==12147==    by 0x345990C1EA: free_mem (in /lib64/libc-2.5.so)
==12147==    by 0x345990BDE1: __libc_freeres (in /lib64/libc-2.5.so)
==12147==    by 0x4802601: _vgnU_freeres (vg_preloaded.c:65)
==12147==    by 0x1: ???
==12147==    by 0x345F410280: ??? (in /lib64/libselinux.so.1)
==12147==    by 0x3459833254: exit (in /lib64/libc-2.5.so)
==12147==    by 0x50F2E7C: TUnixSystem::Exit(int, bool) (TUnixSystem.cxx:2137)
==12147==    by 0x4FF6DAE: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:869)
==12147==    by 0x4C22204: TRint::ProcessLineNr(char const*, char const*, int*) (TRint.cxx:745)
==12147==    by 0x4C22460: TRint::HandleTermInput() (TRint.cxx:605)
==12147==    by 0x50F7C04: TUnixSystem::CheckDescriptors() (TUnixSystem.cxx:1301)
==12147==  Address 0x59ba5c0 is in a rw- anonymous segment
==12147== 
==12147== 
==12147== HEAP SUMMARY:
==12147==     in use at exit: 30,655,652 bytes in 56,568 blocks
==12147==   total heap usage: 248,202 allocs, 191,635 frees, 159,343,702 bytes allocated
==12147== 
==12147== LEAK SUMMARY:
==12147==    definitely lost: 2,849 bytes in 22 blocks
==12147==    indirectly lost: 9,572 bytes in 173 blocks
==12147==      possibly lost: 102 bytes in 1 blocks
==12147==    still reachable: 30,535,665 bytes in 54,822 blocks
==12147==                       of which reachable via heuristic:
==12147==                         stdstring          : 107,173 bytes in 789 blocks
==12147==                         newarray           : 10,040 bytes in 43 blocks
==12147==                         multipleinheritance: 4,552 bytes in 8 blocks
==12147==         suppressed: 107,464 bytes in 1,550 blocks
==12147== Rerun with --leak-check=full to see details of leaked memory
==12147== 
==12147== For counts of detected and suppressed errors, rerun with: -v
==12147== Use --track-origins=yes to see where uninitialised values come from
==12147== ERROR SUMMARY: 43 errors from 4 contexts (suppressed: 4195 from 109)
Segmentation fault

So the process crashes at:

Processing ha.C...
==12147== Invalid read of size 8
==12147==    at 0x14C64F96: FairLogger::GetLogger() (FairLogger.cxx:61)
==12147==    by 0x147AA708: FairContFact::FairContFact() (FairContFact.cxx:146)
==12147==    by 0x12C6C4FD: FairBaseContFact::FairBaseContFact() (FairBaseContFact.cxx:33)
==12147==    by 0x12C4FC1D: __static_initialization_and_destruction_0 (FairBaseContFact.cxx:30)
==12147==    by 0x12C4FC1D: _GLOBAL__sub_I_FairBaseContFact.cxx

This means that there is (likely in the FairRoot code) a ‘global variable’ of type ‘FairBaseContFact’ that is constructed via the default constructor. This default constructor seems to call the default constructor of FairContFact (likely a base class), which then tries to log some information through FairLogger::GetLogger(), but somehow the logger is not yet properly initialized:

==12147== Invalid read of size 8
==12147==    at 0x14C64F96: FairLogger::GetLogger() (FairLogger.cxx:61)
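The pattern described above can be reproduced in miniature. This sketch is not FairRoot code: DemoLogger, DemoContFact, gContFact, gLogger and MakeLogger are hypothetical names. It deliberately uses a single translation unit, where globals are dynamically initialised in definition order, so the too-early access is deterministic; across translation units the order is unspecified, and across shared libraries it depends on the loader (dependencies are typically initialised first).

```cpp
#include <cstdio>

// Hypothetical stand-ins; NOT the real FairLogger / FairContFact code.
struct DemoLogger {
    int level;
    explicit DemoLogger(int l) : level(l) {}
};

DemoLogger* MakeLogger();    // defined at the end of the file
extern DemoLogger* gLogger;  // declared here, defined (and initialised) below gContFact

struct DemoContFact {
    bool sawLogger;
    // Runs as a static initialiser. gLogger is still only zero-initialised at
    // this point, i.e. a null pointer; dereferencing it here would be the
    // analogue of the "Invalid read ... Address 0x0" that valgrind reports.
    DemoContFact() : sawLogger(gLogger != nullptr) {
        if (!sawLogger) std::puts("DemoContFact: logger not initialised yet");
    }
};

// Within one translation unit these dynamic initialisations run in definition order.
DemoContFact gContFact;              // 1st: sees gLogger == nullptr
DemoLogger* gLogger = MakeLogger();  // 2nd: too late for gContFact

DemoLogger* MakeLogger() { return new DemoLogger(3); }  // never freed; fine for a sketch
```

Swapping the two definitions makes gContFact see a valid logger, which mirrors how loading one library before the other changes whether the crash happens.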

This means that in the case where it works, something is loaded first that properly initializes the logger, while in the failing case it is not loaded/executed. Please refer this case to the developers/maintainers of FairLogger, who may have a better hint on how to properly initialize the logger and/or how to enhance the FairLogger code to survive this case (the problem is, per se, an order-of-initialization problem).
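One common way to make a logger survive this kind of ordering problem is the construct-on-first-use idiom: the instance lives in a function-local static, which C++ initialises the first time control passes through it, no matter which static initialiser gets there first. A minimal sketch with hypothetical names (SafeLogger, SafeContFact); it does not reflect FairLogger’s actual implementation.

```cpp
// Hypothetical logger using construct-on-first-use; NOT FairLogger's actual code.
class SafeLogger {
public:
    static SafeLogger& Get() {
        // Function-local static: constructed the first time Get() is called,
        // even if that call comes from another static initialiser.
        // This initialisation is thread-safe since C++11.
        static SafeLogger instance;
        return instance;
    }
    int Level() const { return level_; }

private:
    SafeLogger() : level_(3) {}
    int level_;
};

struct SafeContFact {
    int loggerLevel;
    // Same situation as before: this constructor runs during static initialisation,
    // but Get() builds the logger on demand instead of relying on another global.
    SafeContFact() : loggerLevel(SafeLogger::Get().Level()) {}
};

SafeContFact gSafeContFact;  // works regardless of initialisation order
```

The function-local static is destroyed at exit in reverse order of construction, so a logger that is also used from other objects’ destructors may still need care (some code bases deliberately leak such singletons for that reason).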

Cheers,
Philippe.

Thank you, Philippe.

I’ll report this to the FairRoot group!