Now the reverse, please: first the call to TCling__DEBUG__decl_dump, then .L /home/dm232107/src/valjean/valjean/eponine/tripoli4/resources/depletion/DepletedComposition.C++O
Does that “fix” it or do you still see the error message "TClass and cling disagree on the size of the class TClassRef"?
(Thanks for bearing with me, this one is super weird, never saw it, and it’s not being reproduced by Fedora 34 as I had hoped.)
Sorry, not sure I follow. Should I still call TCling__DEBUG__decl_dump((void*)(gInterpreter->ClassInfo_Tagnum(TClass::GetClass("TClassRef")->GetClassInfo()))) or not?
Hello @Axel, I have made a little progress. I have managed to create a gcc 7.5.0 build that does not exhibit the TClass vs. cling disagreement.
I actually have two very similar CMake build directories now: one was created by Spack and suffers from the TClass/cling bug; the other was created by hand and does not suffer from the bug. However, I am completely puzzled because, as far as I can tell, the CMakeCache.txt files are extremely similar. The only differences that I can see are:
the path to the build dir is different;
some internal davix, GSL and PC_LIBXML flags are different;
the CMAKE_ASM_COMPILER_AR and CMAKE_ASM_COMPILER_RANLIB variables point to different executables; the buggy build uses those from gcc 7.5.0, while the working build uses the ones from /usr/bin (which come from gcc 4.8.5). If anything, the buggy build seems to be making a more reasonable choice here. I also tried using the gcc 7.5.0 paths in the functional build, and it did not break it.
I even went so far as to use the Spack compiler wrappers in my external (functional) build, while sourcing the Spack environment file before building. This did not break the build. I also compared the list of compiler invocations performed by the two builds; I had to cut some corners in the comparison because there were too many false positives, but I couldn’t see anything out of the ordinary.
I am a bit at a loss about how to proceed from here. I foolishly tried to step through cling and llvm with gdb, but I realized that the data structures from llvm are just too big and I can’t make sense of them. Any suggestion is very much appreciated.
I should stress that the broken and the sane compiler are actually the same compiler in two slightly different environments.
However! I thought about running rootcling under strace and I have noticed that the broken build reads some include files from the gcc 4.8.5 directory in /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5. This can’t be good, right? How do I check cling’s include path?
I see the broken one has /volatile/dm232107/src/root/spack-dev/.spack-env/view/include and /data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/gcc-7.5.0-p7zcri6tnqbglall2ki427c47vdkafyn/include as additional include paths. What are those? Can you break the working compiler by including them?
OK, to do that, let’s define “broken” first. In principle, this should give different output with the different compilers:
#include "TClassRef.h"
extern "C" int printf(const char*,...);
int main() {
    printf("sizeof(TClassRef) = %zu\n", sizeof(TClassRef));
    return 0;
}
The view path is like a chroot jail with ROOT, gcc 7.5.0 and their dependencies. It is pretty clean, nothing to worry about there. The other path is the gcc 7.5.0 installation path. I have tried adding these paths to CPATH before running rootcling, but it did not break the sane executable. I can try to recompile and bake them into the CMake cache, but I don’t think we are on the right track there.
Somewhat surprisingly, the simple executable gives the same result in both compilation environments (40). This would be consistent with rootcling picking up some weird include path at run time.
I am trying to decipher the strace output to understand where the gcc 4.8.5 path comes from. Does the rootcling process clone itself and talk to itself via a pipe?
$ # in the broken environment
$ PATH=/data/tmpdm2s/valjean/product/spack/lib/spack/env:$PATH rootcling test_dict.cxx /home/dm232107/src/valjean/valjean/eponine/tripoli4/resources/depletion/DepletedComposition.h
The strace output led me to realize that at some point rootcling calls the cc executable. In the broken environment, it finds /usr/bin/cc, which points to gcc 4.8.5. So basically it boils down to this:
$ # broken env
$ which cc
/usr/bin/cc
$ # sane env
$ which cc
/data/tmpdm2s/valjean/product/spack/lib/spack/env/cc
The latter is a wrapper script that somehow manages to call gcc 7.5.0, correctly.
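In case it helps anyone comparing the two environments, here is a small shell sketch that prints what a bare compiler name resolves to and which C++ include directories that compiler reports. It relies on the common GCC/Clang idiom `-xc++ -E -v /dev/null`; I am only assuming that this approximates the query cling makes at runtime, and the function names `list_include_dirs` and `show_compiler` are made up for illustration.

```shell
#!/bin/sh
# Extract the include directories from the verbose preprocessor output
# (as printed by GCC or Clang with -E -v), read on standard input.
list_include_dirs() {
    sed -n '/^#include <\.\.\.> search starts here:/,/^End of search list\./p' |
        sed -e '1d' -e '$d' -e 's/^[[:space:]]*//'
}

# Report what a bare compiler name (default: cc) resolves to in the current
# environment, and which include directories it searches for C++.
# Assumption: the compiler accepts GCC-style options.
show_compiler() {
    compiler=${1:-cc}
    path=$(command -v "$compiler") || {
        echo "$compiler: not found in PATH" >&2
        return 1
    }
    echo "$compiler resolves to: $path"
    LC_ALL=C "$compiler" -xc++ -E -v /dev/null 2>&1 | list_include_dirs
}

# Usage (run once in each environment and compare):
#   show_compiler cc
```

Running `show_compiler cc` in the broken and in the sane environment should make any difference in the resolved compiler or in its include directories visible side by side.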
So is this a ROOT bug or a Spack bug, in your opinion? I would argue that calling cc without any path is bad practice. Can ROOT guarantee to call the compiler it was compiled with? I think in general this may not be possible because the compiler may not be available, but at least it could try to do so before emitting a warning and falling back to calling cc? What is your feeling about this?
Can you share your CMakeLists.txt? I’d like to see how ROOT was configured. It can be told to “just look for cc at runtime” - which seems to be the case here. Or it was told to use a cc that now doesn’t exist anymore. We’ll find out!
@Axel, what would be the build flag that tells ROOT not to look for cc at runtime? I am opening a PR in Spack. I tried GCC_INSTALL_PREFIX, but it didn’t help.
I came across this discussion because @arekfu linked it in a spack issue, and can confirm this is an issue with spack. In essence, the build environment of spack doesn’t directly expose the compiler paths, but uses a “wrapper” which is the thing you saw above: /data/tmpdm2s/valjean/product/spack/lib/spack/env/cc
This is completely fine as long as cc also refers to the right compiler at runtime when cling tries to get the include paths. But that is unfortunately not the case for all gcc installations, as you are not guaranteed to get a cc symlink to gcc in your path.
The very simple workaround is to fix your spack-installed gcc by adding a cc symlink to gcc. To save others this ordeal, I could create a cc symlink to the compiler and put it in the bin directory of the ROOT installation.
@vavolkl there’s a way to tweak this, by setting -DCLING_CXX_PATH=whatever-compiler-to-use-by-cling-at-runtime. Maybe that’s better?
@arekfu if you look into your ROOT build directory, you should find a file called interpreter/cling/lib/Interpreter/cling-compiledata.h. That contains #define CLING_CXX_RLTV - whatever comes after is what cling will invoke at runtime to determine the C++ include paths. Is that just "cc" for you?
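For reference, a quick way to look at that define from the shell could be something like the sketch below. The path inside the build directory is the one quoted above; the function name `show_cling_cxx` and the build directory argument are hypothetical.

```shell
#!/bin/sh
# Print the CLING_CXX_RLTV line recorded in a ROOT build directory.
# $1 is the top of the ROOT build directory (hypothetical argument).
show_cling_cxx() {
    header="$1/interpreter/cling/lib/Interpreter/cling-compiledata.h"
    if [ ! -f "$header" ]; then
        echo "not found: $header" >&2
        return 1
    fi
    grep 'CLING_CXX_RLTV' "$header"
}

# Usage:
#   show_cling_cxx /path/to/root/build
```

If the value turns out to be a bare `cc`, the result at runtime depends on whatever `cc` happens to be first in PATH, which would match the behaviour seen in the strace output.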