I recently finished off some code that uses a TInterpreter to perform some runtime actions, and everything seemed to be fine. After tidying it up to remove all the debug chaff and shifting it to a new class, the code still works, but the application now crashes on termination with:
Program received signal SIGSEGV, Segmentation fault.
0x00007fffe7c7ae91 in TClass::SetUnloaded() () from /home/skofl/sklib_gcc4.8.5/root_v5.28.00h/lib/libCore.so
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 freetype-2.8-14.el7.x86_64 glibc-2.17-307.el7.1.x86_64 libX11-1.6.7-2.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libgfortran-4.8.5-39.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64 libquadmath-4.8.5-39.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 libxcb-1.13-1.el7.x86_64 ncurses-libs-5.9-14.20130511.el7_4.x86_64 nss-softokn-freebl-3.44.0-8.el7_7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007fffe7c7ae91 in TClass::SetUnloaded() () from /home/skofl/sklib_gcc4.8.5/root_v5.28.00h/lib/libCore.so
#1 0x00007fffe7c50aca in ROOT::RemoveClass(char const*) () from /home/skofl/sklib_gcc4.8.5/root_v5.28.00h/lib/libCore.so
#2 0x00007fffe7c51f30 in ROOT::TGenericClassInfo::~TGenericClassInfo() ()
from /home/skofl/sklib_gcc4.8.5/root_v5.28.00h/lib/libCore.so
#3 0x00007fffe08afce9 in __run_exit_handlers () from /lib64/libc.so.6
#4 0x00007fffe08afd37 in exit () from /lib64/libc.so.6
#5 0x00007fffe089855c in __libc_start_main () from /lib64/libc.so.6
#6 0x0000000000404d53 in _start ()
I can reproduce this with my development code if I delete my TInterpreter before the application closes (I had accidentally left it to leak before now). With the same code in a new (functionally identical) class I now get a crash even if I don’t delete my TInterpreter.
In a nutshell the application here is firing up a TInterpreter (TCint), loading a shared library that defines a class, and invoking some class methods.
Are there any obvious places to start? (Aside from recompiling ROOT in debug mode…)
My interpreted code invokes a templated method where the template class is only known at runtime. That means the specific template instantiation may not be present in the dictionary currently loaded by the interpreter. So, if necessary, I update the Linkdef file, rebuild the dictionary, and reload the library containing that dictionary, all at runtime. I tried using
Only by using my own TInterpreter instance and doing
I don’t recall that we really tested this in a while (in v5.34) …
So, if necessary, I update the Linkdef file, rebuild the dictionary,
Instead you could
write a ‘new’ linkdef file containing ‘only’ the pragma for the new function template instances. (Note: don’t add a pragma for anything, including the class, that already has a loaded dictionary.)
compile and link under a ‘unique’ library name
load that library.
and it should work. (If you want to make the change ‘permanent’, I guess you could also update the original LinkDef so that it is picked up and still used the next time you run.)
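As a rough illustration of the first two steps above, here is a minimal sketch that builds the text of a supplementary linkdef and a fresh library name. The helper names makeLinkDef and uniqueLibName, the lib<stem>_<counter>.so naming scheme, and the exact way a member function template instance is spelled in the pragma are all assumptions made for this example; check the pragma spelling against what rootcint accepts for your class before relying on it. Running rootcint, compiling, linking and loading the result are deliberately left out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Build the text of a supplementary linkdef that requests dictionaries
// only for the given function template instances. Nothing that already
// has a loaded dictionary (including the class itself) should be listed.
// The "#pragma link C++ function ...;" form is the usual rootcint pragma;
// the exact signature spelling for a member template is an assumption.
std::string makeLinkDef(const std::vector<std::string>& signatures) {
    std::ostringstream out;
    out << "#ifdef __CINT__\n";
    for (std::vector<std::string>::const_iterator it = signatures.begin();
         it != signatures.end(); ++it) {
        out << "#pragma link C++ function " << *it << ";\n";
    }
    out << "#endif\n";
    return out.str();
}

// Derive a library name that has not been used before in this process,
// so the new dictionary never replaces an already loaded library.
// The naming scheme is hypothetical; any unique name would do.
std::string uniqueLibName(const std::string& stem, unsigned counter) {
    std::ostringstream out;
    out << "lib" << stem << "_" << counter << ".so";
    return out.str();
}
```

For example, uniqueLibName("MyDictExtra", 1) gives "libMyDictExtra_1.so", and incrementing the counter each time a new instance is needed keeps every extra dictionary in its own library, so nothing already loaded has to be rebuilt or reloaded.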
Yep. Reloading is only needed when a new class is encountered at runtime - I just made sure no new classes were encountered and replaced the
meInterpreter = new TCint("meInterpreter", "title");
with
meInterpreter = (TCint*)gInterpreter;
So all it’s actually doing at this point is:
gInterpreter->Load("myLib.so")
gInterpreter->ProcessLine("some c stuff...")
I’m a bit busy at the moment and need to make some progress elsewhere, but I’ll try to investigate further into exactly what it’s doing now that triggers the segfault on termination.
OK, I came back to this today to try to narrow down what might have been causing the problems, and it seems that having made no changes (I wasn’t even working on this code repository) I’m not getting this error any more - either with the gInterpreter, with my own instance of TCint, with or without a single or multiple instances of dictionary rebuilding & reloading. Everything works like a charm.
So I’ll happily call this closed.
Thanks again to all who offered suggestions and help.
This may just be another incarnation of the random/arbitrary behavior (i.e. the problem might just be hidden by luck and might reappear later). You can run your (previously failing) example under valgrind (valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp --leak-check=no your_executable your_arguments) to see if it detects any undefined behavior (like use after deletion).