Install Root v5.34.34

So, it seems everything has been built fine and ldd finds all shared libraries.

Maybe @Axel or @pcanal will find something “suspicious” in your “configure.out.txt” and/or “make.out.txt” files.

Otherwise, I’m afraid you will need to try to debug it yourself (stepping line by line; note: “root.exe”, not “root”):
gdb ${ROOTSYS}/bin/root.exe

BTW, do not run “sort -u” when you create the “ldd_log.txt” file.

Hi, I tried stepping in gdb; the log file is here (it went on too long at the end and I gave up):
gdb_stepping.txt (259.3 KB)

One error that I see appear repeatedly is that it can’t seem to execute buf = new char[buf_size]; in root/TError.cxx at v5-34-00-patches · root-project/root · GitHub because it can’t find some file from the gcc 7.5.0 build on the machine:

operator new[] (sz=2048) at /gpfs/alpine/scratch/belhorn/stf007/builds/gcc-build-7.5.0-3/gcc-7.5.0/libstdc++-v3/libsupc++/new_opv.cc:32
32	/gpfs/alpine/scratch/belhorn/stf007/builds/gcc-build-7.5.0-3/gcc-7.5.0/libstdc++-v3/libsupc++/new_opv.cc: No such file or directory.
(gdb) 
operator new (sz=2048) at /gpfs/alpine/scratch/belhorn/stf007/builds/gcc-build-7.5.0-3/gcc-7.5.0/libstdc++-v3/libsupc++/new_op.cc:47
47	/gpfs/alpine/scratch/belhorn/stf007/builds/gcc-build-7.5.0-3/gcc-7.5.0/libstdc++-v3/libsupc++/new_op.cc: No such file or directory.

When I check this directory, it says permission denied; not sure how relevant this is.

It also looks like the first error message Error in <UnknownClass::ReadFile>: no file name specified came from the first ReadFile(s, kEnvGlobal); in this piece of code

Let me know if you have suggestions on further debugging. Thanks!

Also, how can I find out which function triggered the ErrorHandler function in TError.cxx? The backtrace stops at the 5th frame:

(gdb) backtrace 8
#0  0x00007ffff6413618 in raise () from /lib64/power9/libc.so.6
#1  0x00007ffff63f3a2c in abort () from /lib64/power9/libc.so.6
#2  0x00007ffff7861a48 in TUnixSystem::Abort (this=0x1005c650) at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:2223
#3  0x00007ffff7751eec in DefaultErrorHandler (level=773880864, abort_bool=100, location=0x206c732e207c2062 <error: Cannot access memory at address 0x206c732e207c2062>, 
    msg=0x207c206c642e207c <error: Cannot access memory at address 0x207c206c642e207c>) at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/base/src/TError.cxx:193
#4  0x00007ffff775231c in ErrorHandler (level=<error reading variable: Cannot access memory at address 0x206e69207503>, 
    location=<error reading variable: Cannot access memory at address 0x206e6920750b>, fmt=<error reading variable: Cannot access memory at address 0x206e69207513>, 
    ap=<error reading variable: Cannot access memory at address 0x206e6920751b>) at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/base/src/TError.cxx:245
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Wei

The “.../new_op.cc: No such file or directory.” message is fine (“gdb” tries to access the original source code of the g++ library function when you “step in”).

It seems to me that you need to inspect the “TEnv::TEnv(const char *name)”. Check every call, what “name” it gets, and what happens inside (i.e., inspect its local variables and what it passes to “ReadFile”).
The first things to check:
root-config --etcdir
grep ROOTETCDIR ${ROOTSYS}/include/RConfigure.h

Here is the check:

[wshi@login4.summit v5-34-00-patches]$ source bin/thisroot.sh 
[wshi@login4.summit v5-34-00-patches]$ root-config --etcdir
/ccs/home/wshi/ROOT/v5-34-00-patches/etc
[wshi@login4.summit v5-34-00-patches]$ grep ROOTETCDIR ${ROOTSYS}/include/RConfigure.h
#define ROOTETCDIR    "$(ROOTSYS)/etc"
[wshi@login4.summit v5-34-00-patches]$ ls -l $ROOTSYS/etc
total 1376
-rw-rw-r--  1 wshi wshi   6745 Nov  3 12:53 HistFactorySchema.dtd
-rw-rw-r--  1 wshi wshi  12039 Nov  3 12:53 Makefile.arch
-rw-rw-r--  1 wshi wshi 831016 Nov  3 12:53 RadioNuclides.txt
-rw-rw-r--  1 wshi wshi    918 Nov  3 12:53 class.rules
drwxrwxr-x  2 wshi wshi   4096 Nov  3 12:53 cmake
drwxrwxr-x  2 wshi wshi   4096 Nov  3 13:07 daemons
-rwxrwxr-x  1 wshi wshi   7607 Nov  3 12:53 gdb-backtrace.sh
-rw-rw-r--  1 wshi wshi     69 Nov  3 13:18 gitinfo.txt
-rw-rw-r--  1 wshi wshi   7687 Nov  3 12:53 helgrind-root.supp
-rw-rw-r--  1 wshi wshi   1709 Nov  3 12:53 hostcert.conf
drwxrwxr-x  2 wshi wshi   4096 Nov  3 12:53 html
drwxrwxr-x  6 wshi wshi   4096 Nov  3 12:53 http
-rw-rw-r--  1 wshi wshi 375364 Nov  3 12:53 pdg_table.txt
drwxrwxr-x 59 wshi wshi   8192 Nov  3 12:53 plugins
drwxrwxr-x  4 wshi wshi   4096 Nov  3 12:53 proof
-rw-rw-r--  1 wshi wshi    303 Nov  3 12:53 root.desktop
-rw-rw-r--  1 wshi wshi   7923 Nov  3 13:07 root.mimes
-rw-rw-r--  1 wshi wshi  13017 Nov  3 12:53 system.plugins-ios
-rw-rw-r--  1 wshi wshi  12697 Nov  3 13:07 system.rootauthrc
-rw-rw-r--  1 wshi wshi   4246 Nov  3 13:07 system.rootdaemonrc
-rw-rw-r--  1 wshi wshi  32811 Nov  3 13:07 system.rootrc
-rw-rw-r--  1 wshi wshi  14551 Nov  3 12:53 valgrind-root.supp
drwxrwxr-x  2 wshi wshi   4096 Nov  3 12:53 vmc

This output seems fine.

Now stepping in TEnv::TEnv: right before the call to ReadFile(s, kEnvGlobal), the pointer returned by char *s = gSystem->ConcatFileName(etc, sname); is non-null but points to an empty string (the first byte is 0).

The etc and sname values are below; there seems to be some memory error in sname.

(gdb) p etc
$39 = {_vptr.TString = 0x7ffff7f17bb0 <vtable for TString+16>, fRep = {{fLong = {fCap = 1697590789, fSize = 25460, fData = 0x1005ef10 "8\026\362\367\377\177"}, fShort = {
        fSize = 5 '\005', fData = "./etc\000\000\020\357\005\020\000\000\000"}, fRaw = {fWords = {1697590789, 25460, 268824336, 0}}}}, static fgIsA = {_M_b = {_M_p = 0x0}}}
(gdb) p sname
$40 = {_vptr.TString = 0x7ffff7f17bb0 <vtable for TString+16>, fRep = {{fLong = {fCap = 1937339149, fSize = 778921332, 
        fData = 0xff006372746f6f72 <error: Cannot access memory at address 0xff006372746f6f72>}, fShort = {fSize = 13 '\r', fData = "system.rootrc\000\377"}, fRaw = {fWords = {
          1937339149, 778921332, 1953460082, -16751758}}}}, static fgIsA = {_M_b = {_M_p = 0x0}}}
(gdb) p s
$41 = 0x1005eef0 ""
(gdb) x 0x1005eef0
0x1005eef0:	0x00000000
(gdb) p !s      
$42 = false

Full log here, fyi:
gdb_stepping_TEnv.txt (49.1 KB)

It executed these lines in TEnv::TEnv; I put the gdb value checks below the relevant lines:

fIgnoreDup = kFALSE;

fTable  = new THashList(1000);
fRcName = name;

# (gdb) p name
# $21 = 0x7ffff7d7f0e0 ".rootrc"

TString sname = "system";
sname += name;

# (gdb) p name
# $32 = 0x7ffff7d7f0e0 ".rootrc"
# (gdb) p sname
# $33 = {_vptr.TString = 0x7ffff7f17bb0 <vtable for TString+16>, fRep = {{fLong = {fCap = 1937339142, fSize = 7169396, 
        fData = 0xfffffffffeff0000 <error: Cannot access memory at address 0xfffffffffeff0000>}, fShort = {fSize = 6 '\006', 
        fData = "system\000\000\000\377\376\377\377\377\377"}, fRaw = {fWords = {1937339142, 7169396, -16842752, -1}}}}, static fgIsA = {_M_b = {_M_p = 0x0}}}

TString etc = gRootDir;

# (gdb) p gRootDir
# $34 = 0x1005db38 "."

etc += "/etc";

# (gdb) p etc
# $38 = {_vptr.TString = 0x7ffff7f17bb0 <vtable for TString+16>, fRep = {{fLong = {fCap = -16765439, fSize = 32767, fData = 0x1005ef10 "8\026\362\367\377\177"}, fShort = {
        fSize = 1 '\001', fData = ".\000\377\377\177\000\000\020\357\005\020\000\000\000"}, fRaw = {fWords = {-16765439, 32767, 268824336, 0}}}}, static fgIsA = {_M_b = {
      _M_p = 0x0}}}

char *s = gSystem->ConcatFileName(etc, sname);

Please advise. This seems to point to some memory issue.

This output is difficult to analyze.
Do not “step” into all these different method calls. Just go through “next” lines in “TEnv::TEnv”.
Also, for test purposes, “cd /tmp” before running “gdb” (so that you are NOT in the “${ROOTSYS}” directory).

BTW. The “Cannot access memory at address” warnings are fine for the TString (which in these cases uses its “fShort” instead of “fLong”).
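To see why the “fLong” view is harmless in these cases, one of the printed values can be decoded by hand. A minimal sketch, assuming a little-endian byte order (as on ppc64le) and the layout visible in the gdb output, where the short form holds a size byte followed by the characters. The helper name word_from_bytes_le is hypothetical and is not part of ROOT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combine four consecutive bytes into the 32-bit
// word that a little-endian CPU would read from them.
constexpr std::uint32_t word_from_bytes_le(std::uint8_t b0, std::uint8_t b1,
                                           std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}

// Short-form "system": size byte 6, then 's', 'y', 's', ...
// Read as one 32-bit integer, those bytes give the fCap printed at $33.
static_assert(word_from_bytes_le(6, 's', 'y', 's') == 1937339142u,
              "fCap at $33 is the start of the short string");
```

In other words, the odd “fCap” and the inaccessible “fData” pointer are the characters of the short string seen through the other member of the same storage.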

Here is the short version with next lines:
gdb_stepping_short_TEnv.txt (5.5 KB)

It looks like “gRootDir” is just “.”, which seems wrong to me. I think it should be the full “/ccs/home/wshi/ROOT/v5-34-00-patches” path (compare also “echo ${ROOTSYS}” and “root-config --prefix”).
Try to find the place where it is initialized.

OK, the comparison gives the same value:

[wshi@login2.summit tmp]$ pwd
/tmp
[wshi@login2.summit tmp]$ echo ${ROOTSYS}
/ccs/home/wshi/ROOT/v5-34-00-patches
[wshi@login2.summit tmp]$ root-config --prefix
/ccs/home/wshi/ROOT/v5-34-00-patches

In the following code in TEnv.cxx, it seems to me that it should execute the #ifdef block, but when I print ROOTETCDIR, gdb says No symbol "ROOTETCDIR" in current context. Not sure if it’s relevant.

#ifdef ROOTETCDIR
      char *s = gSystem->ConcatFileName(ROOTETCDIR, sname);
#else
      TString etc = gRootDir;

Apparently, the “ROOTETCDIR” was not defined when the software was compiled, but it may be defined now:
grep ROOTETCDIR ${ROOTSYS}/include/RConfigure.h

In any case, “gRootDir” should have a proper value.

I can see gRootDir got the “.” value in gRootDir = Getenv("ROOTSYS") at root/TUnixSystem.cxx at v5-34-00-patches · root-project/root · GitHub
The following stepping shows this:

Breakpoint 6, TUnixSystem::Init (this=0x1005c650) at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:619
......
(gdb) p gRootDir
$8 = 0x0
(gdb) next
645	   SetRootSys();
(gdb) p HAVE_DLADDR
No symbol "HAVE_DLADDR" in current context.
(gdb) p gRootDir
$9 = 0x0
(gdb) next
649	   gRootDir = Getenv("ROOTSYS");
(gdb) p gRootDir
$10 = 0x0
(gdb) p ROOTSYS
No symbol "ROOTSYS" in current context.
(gdb) next
650	   if (gRootDir == 0)
(gdb) p gRootDir
$11 = 0x1005db38 "."
(gdb) p ROOTPREFIX
No symbol "ROOTPREFIX" in current context.
(gdb) next
656	   return kFALSE;
(gdb) p gRootDir
$12 = 0x1005db38 "."

But I can’t seem to step into the getenv function:

Breakpoint 7, TUnixSystem::Getenv (this=0x1005c650, name=0x7ffff7d94ce8 "ROOTSYS") at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:2154
2154	   return ::getenv(name);
(gdb) p name
$17 = 0x7ffff7d94ce8 "ROOTSYS"
(gdb) x 0x7ffff7d94ce8
0x7ffff7d94ce8:	0x544f4f52
(gdb) x 0x544f4f52
0x544f4f52:	Cannot access memory at address 0x544f4f52
(gdb) p ROOTSYS
No symbol "ROOTSYS" in current context.
(gdb) step
2155	}

So, it seems that “ROOTSYS” gets overwritten when you execute “root.exe” (or “getenv” returns “.” instead of the proper value).
It seems there are two functions that may modify “ROOTSYS”: “SetRootSys” and “DylibAdded”.

After checking with show environment ROOTSYS, I think it is not overwritten. So that suggests “getenv” returns “.”.
The following shows this. The problem is I can’t step through getenv (a C library function) to understand what happened inside.

Breakpoint 1, TUnixSystem::Getenv (this=0x1005c650, name=0x7ffff7d94ce8 "ROOTSYS") at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:2154
2154	   return ::getenv(name);
(gdb) p name
$1 = 0x7ffff7d94ce8 "ROOTSYS"
(gdb) x 0x7ffff7d94ce8
0x7ffff7d94ce8:	0x544f4f52
(gdb) x 0x544f4f52
0x544f4f52:	Cannot access memory at address 0x544f4f52
(gdb) p gRootDir
$2 = 0x0
(gdb) p ROOTSYS
No symbol "ROOTSYS" in current context.
(gdb) p "ROOTSYS"
$3 = "ROOTSYS"
(gdb) show paths
Executable and object file path: /ccs/home/wshi/ROOT/v5-34-00-patches/bin:/sw/summit/xalt/1.2.1/bin:/sw/sources/lsf-tools/2.0/summit/bin:/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-7.5.0/spectrum-mpi-10.4.0.3-20210112-puowkoejepfjtm22sk2dxb6eeup5w447/bin:/sw/summit/gcc/7.5.0-2/bin:/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/darshan-runtime-3.3.0-mu6tnxlhxfplrq3srkkgi5dvly6wenwy/bin:/sw/sources/hpss/bin:/opt/ibm/spectrumcomputing/lsf/10.1.0.11/linux3.10-glibc2.17-ppc64le-csm/etc:/opt/ibm/spectrumcomputing/lsf/10.1.0.11/linux3.10-glibc2.17-ppc64le-csm/bin:/opt/ibm/csm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibm/flightlog/bin:/opt/ibm/jsm/bin:/sw/sources/cgroup_tool/bin:/opt/puppetlabs/bin:/usr/lpp/mmfs/bin
(gdb) show environment ROOTSYS
ROOTSYS = /ccs/home/wshi/ROOT/v5-34-00-patches
(gdb) step
2155	}
(gdb) 
TUnixSystem::Init (this=0x1005c650) at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:650
650	   if (gRootDir == 0)
(gdb) p gRootDir
$4 = 0x1005db38 "."
(gdb) show environment ROOTSYS
ROOTSYS = /ccs/home/wshi/ROOT/v5-34-00-patches

I also find that one of the errors I saw before, Error in <TUnixSystem::StackTrace> script ./etc/gdb-backtrace.sh is missing, is also caused by Getenv("ROOTSYS") at root/TUnixSystem.cxx at v5-34-00-patches · root-project/root · GitHub.

The same goes for the error caused by GetDynamicPath returning an empty string at root/TUnixSystem.cxx at v5-34-00-patches · root-project/root · GitHub:

Error in <TUnixSystem::DynamicPathName>: libMathCore[.so | .dll | .dylib | .sl | .dl | .a] does not exist in 
aborting

I think, “show environment” gives you the value that belongs to “gdb”, not the environment of the program being debugged (so, if “root.exe” modifies it, “gdb” will not notice).

Is there a way to check the value of ROOTSYS in gdb then? I tried the following, but neither command tells me what ROOTSYS is.

(gdb) p "ROOTSYS"
$22 = "ROOTSYS"
(gdb) p ROOTSYS
No symbol "ROOTSYS" in current context.
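One way to see the variable as the debugged process sees it should be to call the C library function inside the running program from the gdb prompt, for example print (char *) getenv("ROOTSYS"). The sketch below illustrates why that value can differ from what the parent process reports: a process can overwrite its own copy of the environment at any time. It assumes a POSIX system (setenv), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>    // std::getenv
#include <stdlib.h>   // POSIX setenv
#include <string>

// Hypothetical helper: return the variable's value, or "" if it is unset.
std::string env_or_empty(const char *name) {
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// Overwrite a variable in this process only. The parent process (a shell
// or a debugger) keeps its own, unchanged copy of the environment.
bool overwrite_env(const char *name, const char *value) {
    return setenv(name, value, 1) == 0;
}
```

After overwrite_env("ROOTSYS", "."), env_or_empty("ROOTSYS") returns “.” in that process, while the shell or debugger that started it still shows the original value.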

In any case, both functions you mentioned that may modify “ROOTSYS” (“SetRootSys” and “DylibAdded”) execute this line:

gSystem->Setenv("ROOTSYS", gSystem->DirName(rs));

where I can see gSystem->DirName(rs) returns the “.” value for ROOTSYS. See the log here:

645	   SetRootSys();
(gdb) step
SetRootSys () at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:496
496	   void *addr = (void *)SetRootSys;
(gdb) next
498	   if (dladdr(addr, &info) && info.dli_fname && info.dli_fname[0]) {
(gdb) 
500	      if (!realpath(info.dli_fname, respath)) {
(gdb) p respath
$39 = '\000' <repeats 2049 times>...
(gdb) p rs
No symbol "rs" in current context.
(gdb) next
504	         TString rs = gSystem->DirName(respath);
(gdb) p respath
$40 = "/autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/lib/libCore.so.5.34", '\000' <repeats 1979 times>...
(gdb) p rs
$41 = {_vptr.TString = 0x0, fRep = {{fLong = {fCap = 0, fSize = 0, fData = 0x0}, fShort = {fSize = 0 '\000', fData = '\000' <repeats 14 times>}, fRaw = {fWords = {0, 0, 0, 
          0}}}}, static fgIsA = {_M_b = {_M_p = 0x0}}}
(gdb) next
505	         gSystem->Setenv("ROOTSYS", gSystem->DirName(rs));
(gdb) p rs
$42 = {_vptr.TString = 0x7ffff7f17bb0 <vtable for TString+16>, fRep = {{fLong = {fCap = -2147483584, fSize = 54, 
        fData = 0x1005dae0 "/autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/lib"}, fShort = {fSize = 64 '@', 
        fData = "\000\000\200\066\000\000\000\340\332\005\020\000\000\000"}, fRaw = {fWords = {-2147483584, 54, 268819168, 0}}}}, static fgIsA = {_M_b = {_M_p = 0x0}}}
(gdb) p respath
$43 = "/autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/lib/libCore.so.5.34", '\000' <repeats 1979 times>...
(gdb) step
TString::operator char const* (this=0x7fffffff9808) at include/TString.h:284
284	   operator const char*() const { return GetPointer(); }
(gdb) 
TString::GetPointer (this=0x7fffffff9808) at include/TString.h:235
235	   const char    *GetPointer() const { return IsLong() ? GetLongPointer() : GetShortPointer(); }
(gdb) 
TString::IsLong (this=0x7fffffff9808) at include/TString.h:216
216	   Bool_t         IsLong() const { return Bool_t(fRep.fShort.fSize & kShortMask); }
(gdb) 
TString::GetShortPointer (this=0x7fffffff9808) at include/TString.h:233
233	   const char    *GetShortPointer() const { return fRep.fShort.fData; }
(gdb) 
TSystem::DirName (this=0x1005c650, pathname=0x7fffffff9811 "") at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/base/src/TSystem.cxx:984
984	   if (pathname && strchr(pathname, '/')) {
(gdb) p pathname
$44 = 0x7fffffff9811 ""
(gdb) p rs
No symbol "rs" in current context.
(gdb) next
1015	   return ".";
(gdb) 
1016	}
(gdb) 
SetRootSys () at /autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/core/unix/src/TUnixSystem.cxx:504
504	         TString rs = gSystem->DirName(respath);
(gdb) 
511	}

So the rs variable seems to have the correct path “/autofs/nccs-svm1_home1/wshi/ROOT/v5-34-00-patches/lib”, but when passed to DirName(rs), it becomes an empty string pathname=0x7fffffff9811 ""
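The “.” result follows from the branch visible at line 984 of the log: when the argument contains no '/', the function falls through to return ".";. A simplified model of that behavior, not ROOT's actual TSystem::DirName (the name dir_name_sketch is hypothetical, and trailing slashes are deliberately not handled):

```cpp
#include <cassert>
#include <string>

// Simplified sketch: strip the last path component, or return "." when
// there is no '/' at all (which includes the empty string).
std::string dir_name_sketch(const std::string &path) {
    const std::string::size_type slash = path.rfind('/');
    if (slash == std::string::npos) return ".";
    if (slash == 0) return "/";
    return path.substr(0, slash);
}
```

With the full “.../lib” path as input this would yield the expected ROOT prefix; with the empty string that actually arrived, it yields “.”, which is the value later seen in gRootDir.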

Looking at this output, it seems that at “$42” the “rs” has a proper value in “fLong” (“fShort” is empty), but then, several lines below, inside “gSystem->DirName(rs)”, “TString::GetShortPointer” is called at “233”. It seems that “TString::IsLong” at “216” misbehaved.
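The numbers printed at “$42” are consistent with a byte-order mix-up. A sketch of the suspected mechanism, under the assumption (inferred from the log, not checked against the ROOT source here) that the long form is flagged by the top bit of the 32-bit “fCap” word and that the check tests bit 0x80 of the first byte in memory. All names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// First byte in memory of a 32-bit word, for each byte order.
constexpr std::uint8_t first_byte_big_endian(std::uint32_t word) {
    return static_cast<std::uint8_t>(word >> 24);
}
constexpr std::uint8_t first_byte_little_endian(std::uint32_t word) {
    return static_cast<std::uint8_t>(word & 0xFFu);
}

// Assumed check: the string counts as "long" if bit 0x80 of the first byte is set.
constexpr bool flagged_long(std::uint8_t first_byte) {
    return (first_byte & 0x80u) != 0;
}

// fCap = -2147483584 at $42 is 0x80000040 as an unsigned word.
constexpr std::uint32_t kCapFromLog = 0x80000040u;

static_assert(flagged_long(first_byte_big_endian(kCapFromLog)),
              "big-endian layout: flag bit lands in the first byte");
static_assert(!flagged_long(first_byte_little_endian(kCapFromLog)),
              "little-endian layout: first byte is 0x40, flag is missed");
static_assert(first_byte_little_endian(kCapFromLog) == 64,
              "matches fShort.fSize = 64 '@' in the log");
```

On a little-endian machine the first byte is 0x40, so under this assumption the string is treated as short and the empty “fShort.fData” is handed to DirName, which matches pathname = "" in the log.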

@Axel / @pcanal Can it be that the problem originates in “R__BYTESWAP”? Is something missing in “core/base/inc/RConfig.h” for the “linuxppc64gcc” target?

@weishi A desperate attempt: start from scratch again (in a different directory, so that you do not destroy your current setup) and, before running “./configure”, modify “core/base/inc/RConfig.h” by adding one line (before line 327, which contains “R__ppc64”):

#   define R__BYTESWAP
#   if defined(R__ppc64)

Probably unrelated, but clang static analyzer finds some issues with the TString::IsLong function, see:

@ferhue We are dealing here with the “v5-34-00-patches” branch.

I know, but the same bug might affect 5.34.