TClass and cling disagree on the size of the class

Good, thanks!

Now the reverse, please: first the call to TCling__DEBUG__decl_dump, then .L /home/dm232107/src/valjean/valjean/eponine/tripoli4/resources/depletion/DepletedComposition.C++O

Does that “fix” it or do you still see the error message "TClass and cling disagree on the size of the class TClassRef"?

(Thanks for bearing with me, this one is super weird, never saw it, and it’s not being reproduced by Fedora 34 as I had hoped.)

Sorry, not sure I follow. Should I still call TCling__DEBUG__decl_dump((void*)(gInterpreter->ClassInfo_Tagnum(TClass::GetClass("TClassRef")->GetClassInfo()))) or not?

Never mind, I think I got it. No, it does not fix it. Here is the output:

root [0] .rawInput
Using raw input
root [1] void TCling__DEBUG__decl_dump(void* D);
root [2] .rawInput
Not using raw input
root [3] TCling__DEBUG__decl_dump((void*)(gInterpreter->ClassInfo_Tagnum(TClass::GetClass("TClassRef")->GetClassInfo())));
CXXRecordDecl 0x2be2e00 </volatile/dm232107/spack/stage/spack-stage-root-6.22.06-5m5dji7jlm3usbupgo4mcatrivj2mtjd/spack-build-5m5dji7/include/TClassRef.h:28:1, line:76:1> line:28:7 imported referenced class TClassRef definition
|-DefinitionData standard_layout has_user_declared_ctor can_const_default_init
| |-DefaultConstructor exists non_trivial user_provided
| |-CopyConstructor non_trivial user_declared has_const_param needs_overload_resolution implicit_has_const_param
| |-MoveConstructor needs_overload_resolution
| |-CopyAssignment non_trivial has_const_param user_declared implicit_has_const_param
| |-MoveAssignment needs_overload_resolution
| `-Destructor non_trivial user_declared needs_overload_resolution
|-FieldDecl 0x2be2fc8 <line:31:4, col:18> col:18 imported referenced fClassName 'std::string':'class std::__cxx11::basic_string<char>'
| `-AnnotateAttr 0x2be3010 <col:31, col:55> R"ATTRDUMP(Name of referenced class)ATTRDUMP"
|-FieldDecl 0x2be30e0 <line:35:4, col:18> col:18 imported referenced fClassPtr 'class TClass *const *'
| `-AnnotateAttr 0x2be3128 <col:31, col:74> R"ATTRDUMP(! Ptr to the permanent TClass ptr/reference)ATTRDUMP"
`-<undeserialized declarations>
root [4] .L /home/dm232107/src/valjean/valjean/eponine/tripoli4/resources/depletion/DepletedComposition.C++O
Info in <TUnixSystem::ACLiC>: creating shared library /home/dm232107/src/valjean/valjean/eponine/tripoli4/resources/depletion/DepletedComposition_C.so
Error in <TInterpreter::InspectMembers>: TClass and cling disagree on the size of the class TBaseClass, respectively 152 128

Error in <TInterpreter::InspectMembers>: TClass and cling disagree on the size of the class TClassRef, respectively 40 16

Error in <TInterpreter::InspectMembers>: TClass and cling disagree on the size of the class TClassRef, respectively 40 16

And @Axel, thank you for taking the time to look into this!

OK I need to think about how this could possibly happen. Please ping me in 24h should thinking about this cause me to forget this :)

Hello @Axel, I have made a little progress. I have managed to create a gcc 7.5.0 build that does not exhibit the TClass vs. cling disagreement.

I actually have two very similar CMake build directories now: one was created by Spack and suffers from the TClass/cling bug; the other was created by hand and does not suffer from the bug. However, I am completely puzzled because, as far as I can tell, the CMakeCache.txt files are extremely similar. The only differences that I can see are

  • the path to the build dir is different;
  • some internal davix, GSL and PC_LIBXML flags are different;
  • the CMAKE_ASM_COMPILER_AR and CMAKE_ASM_COMPILER_RANLIB variables point to different executables; the buggy build uses those from gcc 7.5.0, while the working build uses the ones from /usr/bin (which come from gcc 4.8.5). If anything, the buggy build seems to be making a more reasonable choice here. I also tried using the gcc 7.5.0 paths in the functional build, and it did not break it.

I even went so far as to use the Spack compiler wrappers in my external (functional) build, while sourcing the Spack environment file before building. This did not break the build. I also compared the list of compiler invocations performed by the two builds; I had to cut some corners in the comparison because there were too many false positives, but I couldn’t see anything out of the ordinary.

I am a bit at a loss about how to proceed from here. I foolishly tried to step through cling and llvm with gdb, but I realized that the data structures from llvm are just too big and I can’t make sense of them. Any suggestion is very much appreciated.

OK thanks for the info. Can you share the output of

echo | g++ -x c++ -v -fsyntax-only -

for the working and the broken compiler?

For the broken compiler: gcc-broken.txt (8.2 KB)
For the sane compiler: gcc-sane.txt (13.8 KB)

I should stress that the broken and the sane compiler are actually the same compiler in two slightly different environments.

However! I thought about running rootcling under strace and I have noticed that the broken build reads some include files from the gcc 4.8.5 directory in /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5. This can’t be good, right? How do I check cling’s include path?

This looks sane:

root [0] gInterpreter->GetIncludePath()
(const char *) "-I"/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/root-6.22.06-2r2j33aoili6qhwjwix5n7yonujbpagt/etc/" -I"/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/root-6.22.06-2r2j33aoili6qhwjwix5n7yonujbpagt/etc//cling" -I"/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/root-6.22.06-2r2j33aoili6qhwjwix5n7yonujbpagt/include/" -I"/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/root-6.22.06-2r2j33aoili6qhwjwix5n7yonujbpagt/include" -I"/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/pcre-8.44-vxtgtvw2dgvndoowbyguxmcw4jd7epfy/include""

I see the broken one has /volatile/dm232107/src/root/spack-dev/.spack-env/view/include and /data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/gcc-7.5.0-p7zcri6tnqbglall2ki427c47vdkafyn/include as additional include paths. What are those? Can you break the working compiler by including them?

OK, to do that, let’s define “broken” first. In principle, this should give different output with the different compilers:

#include "TClassRef.h"
extern "C" int printf(const char*,...);
int main() {
  printf("sizeof(TClassRef) = %zu\n", sizeof(TClassRef));  // %zu is the size_t conversion
  return 0;
}

Could you cross-check, please?

The view path is like a chroot jail with ROOT, gcc 7.5.0 and their dependencies. It is pretty clean, nothing to worry about there. The other path is the gcc 7.5.0 installation path. I have tried adding these paths to CPATH before running rootcling but it did not break the sane executable. I can try to recompile and bake them into the CMake cache, but I don’t think we are on the right track there.

Somewhat surprisingly, the simple executable gives the same result in both compilation environments (40). This would be consistent with rootcling picking up some weird include path at run time.

I am trying to decipher the strace output to understand where the gcc 4.8.5 path comes from. Does the rootcling process clone itself and talk to itself via a pipe?

I think I got it.

This works:

$ # in the broken environment
$ PATH=/data/tmpdm2s/valjean/product/spack/lib/spack/env:$PATH rootcling test_dict.cxx /home/dm232107/src/valjean/valjean/eponine/tripoli4/resources/depletion/DepletedComposition.h

The strace output led me to realize that at some point rootcling calls the cc executable. In the broken environment, it finds /usr/bin/cc, which points to gcc 4.8.5. So basically it boils down to this:

$ # broken env
$ which cc
/usr/bin/cc

$ # sane env
$ which cc
/data/tmpdm2s/valjean/product/spack/lib/spack/env/cc

The latter is a wrapper script that somehow manages to call gcc 7.5.0, correctly.

So is this a ROOT bug or a Spack bug, in your opinion? I would argue that calling cc without any path is a bad practice. Can ROOT guarantee to call the compiler it was compiled with? I think in general this may not be possible because the compiler may not be available, but at least it could try to do so before emitting a warning and falling back to calling cc? What is your feeling about this?

Can you share your CMakeLists.txt? I’d like to see how ROOT was configured. It can be told to “just look for cc at runtime” - which seems to be the case here. Or it was told to use a cc that now doesn’t exist anymore. We’ll find out!

Well done, @arekfu !

I guess you mean the CMakeCache.txt file? You’ll notice that I compiled llvm in Debug mode because I was trying to debug it.

Thank you for all your help Axel!

@Axel, what would be the build flag that tells ROOT not to look for cc at runtime? I am opening a PR in Spack. I tried GCC_INSTALL_PREFIX, but it didn’t help.

Hi all,

I came across this discussion because @arekfu linked it in a spack issue, and can confirm this is an issue with spack. In essence, the build environment of spack doesn’t directly expose the compiler paths, but uses a “wrapper” which is the thing you saw above: /data/tmpdm2s/valjean/product/spack/lib/spack/env/cc

This is completely fine as long as cc also refers to the right compiler at runtime when cling tries to get the include paths. But that is unfortunately not the case for all gcc installations, as you are not guaranteed to get a cc symlink to gcc in your path.
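As a rough illustration of what that lookup amounts to (a sketch, not cling's actual code): the compiler's verbose output lists its include search directories between two marker lines, and they can be pulled out with sed. The helper name list_include_dirs is made up for this example; it reads the -v output on standard input.

```shell
# Hypothetical helper: read a compiler's `-v` output on stdin and print the
# directories listed between the "search starts here" and "End of search
# list" markers, one per line, with the leading space removed.
list_include_dirs() {
    sed -n '/^#include <\.\.\.> search starts here:/,/^End of search list/s/^ //p'
}

# Usage (needs a working `cc` with C++ support on PATH):
#   echo | cc -x c++ -v -fsyntax-only - 2>&1 | list_include_dirs
```

Running the usage line once with cc and once with the g++ that built ROOT should make a mismatch like the 4.8.5 vs. 7.5.0 one visible right away.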

The very simple workaround is to fix your spack-installed gcc with a cc symlink to gcc. To save others this ordeal, I could create a cc symlink to the compiler and put it in the bin of the root installation.
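A minimal sketch of that workaround, assuming the gcc installation has the usual <prefix>/bin/gcc layout. The function name link_cc_to_gcc is hypothetical, and the prefix in the usage comment is a placeholder, not a real path.

```shell
# Hypothetical helper: create <prefix>/bin/cc as a relative symlink to the
# gcc in the same directory. Fails if gcc is missing or cc already exists.
link_cc_to_gcc() {
    prefix=$1
    if [ ! -x "$prefix/bin/gcc" ]; then
        echo "no executable gcc under $prefix/bin" >&2
        return 1
    fi
    ln -s gcc "$prefix/bin/cc"
}

# Usage (placeholder prefix):
#   link_cc_to_gcc /path/to/spack/opt/.../gcc-7.5.0-<hash>
```

This only helps if that bin directory comes before /usr/bin on PATH at runtime.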

Thanks Valentin!

@vavolkl there’s a way to tweak this, by setting -DCLING_CXX_PATH=whatever-compiler-to-use-by-cling-at-runtime. Maybe that’s better?

@arekfu if you look into your ROOT build directory, you should find a file called interpreter/cling/lib/Interpreter/cling-compiledata.h. That contains #define CLING_CXX_RLTV - whatever comes after is what cling will invoke at runtime to determine the C++ include paths. Is that just "cc" for you?
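A sketch for pulling that value out of the header without opening it. The helper name cling_runtime_compiler is made up, and the sed pattern assumes the macro value is a single double-quoted string on one line.

```shell
# Hypothetical helper: print the quoted value of CLING_CXX_RLTV from the
# header given as $1. Leading whitespace before #define is tolerated.
cling_runtime_compiler() {
    sed -n 's/^[[:space:]]*#define[[:space:]]\{1,\}CLING_CXX_RLTV[[:space:]]\{1,\}"\(.*\)".*$/\1/p' "$1"
}

# Usage, from the ROOT build directory:
#   cling_runtime_compiler interpreter/cling/lib/Interpreter/cling-compiledata.h
```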

Yes, almost:

$ cat interpreter/llvm/src/tools/cling/lib/Interpreter/cling-compiledata.h

    #define CLING_CXX_INCL "/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/gcc-7.5.0-p7zcri6tnqbglall2ki427c47vdkafyn/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0:/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/gcc-7.5.0-p7zcri6tnqbglall2ki427c47vdkafyn/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/x86_64-pc-linux-gnu:/data/tmpdm2s/valjean/product/spack/opt/spack/linux-centos7-x86_64/gcc-7.5.0/gcc-7.5.0-p7zcri6tnqbglall2ki427c47vdkafyn/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/backward"
    #define CLING_INCLUDE_PATHS ""
  
      #define CLING_CXX_RLTV "cc  -O3 "

… and the CLING_CXX_PATH trick seems to work! Thank you again @Axel!
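For anyone landing here later, a sketch of what the configure line might look like. The helper root_configure_cmd and both paths in the usage comment are placeholders, and the function only prints the command rather than running it; CLING_CXX_PATH is the option Axel mentions above.

```shell
# Hypothetical helper: print (not run) a ROOT configure line that pins the
# compiler cling invokes at runtime.
root_configure_cmd() {
    cxx=$1   # absolute path to the C++ compiler cling should call
    src=$2   # ROOT source directory
    printf 'cmake -DCLING_CXX_PATH=%s %s\n' "$cxx" "$src"
}

# Usage (placeholder paths):
#   root_configure_cmd /path/to/gcc-7.5.0/bin/g++ /path/to/root/source
```

After rebuilding, CLING_CXX_RLTV in cling-compiledata.h is the place to check that the setting took effect.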
