Memory corruption in Cling

I run a simple macro, 0.C:

TString opt,fil;
opt = "P2010a,btof,BEmcChkStat,Corr4,OSpaceZ2,OGridLeak3D";
opt+= ",Sti";
fil = "/star/rcf/test/daq/2010/029/st_physics_11029020_raw_1030002.daq";

gROOT->LoadMacro("bfc.C");
void *chain= (void*)bfc(-1,opt.Data(),fil.Data());

In gdb I set a watchpoint on "fil", the name of the file.
Execution goes into the macro bfc.C, then into compiled classes,
and then into:
iok = gSystem->Load(libL.Data());

There I get memory corruption:

Hardware watchpoint 2: *(char*)0x9d20b50
Old value = 47 '/'
New value = 64 '@'


0x00007ffff0dcbf03 in _int_free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff0dcbf03 in _int_free () from /lib64/libc.so.6
#1  0x00007fffeaa9dc4e in cling::IncrementalExecutor::runAndRemoveStaticDestructors(cling::Transaction*) ()
   from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#2  0x00007fffeaa326be in cling::Interpreter::unload(cling::Transaction&) ()

So the file name fil from 0.C was corrupted in _int_free, although this variable has not been freed yet.
Is this a well-known bug?
Thank you,
Victor

Hi,

Can you explain what you observe, i.e. what made you debug? A crash? If so, do we have the backtrace?

(The memory behavior you see seems expected; though it would be interesting to see the earlier frames.)

I suspect that whatever happens is actually caused by the C++ error you have, declaring and assigning to a void variable, see Compiler Explorer

Cheers, Axel

Hi Axel

Can you explain what you observe, i.e. what made you debug? A crash? If so, do we have the backtrace?
In my macro the following was declared:
TString fil;
fil = "/star/rcf/test/daq/2010/029/st_physics_11029020_raw_1030002.daq";
Then I run:
gROOT->LoadMacro("bfc.C");
void *chain= (void*)bfc(-1,opt.Data(),fil.Data());

After running for a while inside compiled code, there was an attempt to open the file "fil".
But the content of "fil" had changed and contained garbage. The result was an error message
and a stop. Using gdb, I set a watchpoint on the address of fil.Data() and saw the
variable "fil" get corrupted.
So inside gSystem->Load(), in the Cling part, the value of my variable changed to garbage.
This variable has not been deleted and must be kept.

Full backtrace after corruption:

(gdb) watch *(char*)0x9d20b50
Hardware watchpoint 2: *(char*)0x9d20b50
(gdb) c
Continuing.
Hardware watchpoint 2: *(char*)0x9d20b50

Old value = 47 '/'
New value = 64 '@'
0x00007ffff0dcbf03 in _int_free () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff0dcbf03 in _int_free () from /lib64/libc.so.6
#1 0x00007fffeaa9dc4e in cling::IncrementalExecutor::runAndRemoveStaticDestructors(cling::Transaction*) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#2 0x00007fffeaa326be in cling::Interpreter::unload(cling::Transaction&) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#3 0x00007fffeaa3280c in cling::Interpreter::unload(unsigned int) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#4 0x00007fffeaaf522c in cling::MetaSema::actOnUCommand(llvm::StringRef) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#5 0x00007fffeaaf5b3f in cling::MetaSema::actOnLCommand(llvm::StringRef, cling::Transaction**) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#6 0x00007fffeab0a0fb in cling::MetaParser::isCommand(cling::MetaSema::ActionResult&, cling::Value*) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#7 0x00007fffeaaf06a1 in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#8 0x00007fffea97454a in HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#9 0x00007fffea97f835 in TCling::Load(char const*, bool) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#10 0x00007ffff78ca320 in TSystem::Load(char const*, char const*, bool) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCore.so
#11 0x00007fffd6475f9a in StBFChain::Load (this=0xb61a2c0)
at .sl73_x8664_gcc485/obj/StRoot/StBFChain/StBFChain.cxx:171
#12 0x00007fffe8a7681a in ?? ()
#13 0x0000000000000013 in ?? ()
#14 0x00007fffeb881c64 in llvm::StringMap<llvm::JITEvaluatedSymbol, llvm::MallocAllocator>::StringMap(llvm::StringMap<llvm::JITEvaluatedSymbol, llvm::MallocAllocator> const&) ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#15 0x00007fffeb8830a2 in llvm::RuntimeDyldImpl::resolveExternalSymbols() ()
from /afs/rhic.bnl.gov/star/packages/.DEV/root/installdir/lib64/libCling.so
#16 0x000000000b7a2530 in ?? ()
#17 0x0000001100000014 in ?? ()
#18 0x0000000006f81f20 in ?? ()
#19 0x0000000000000000 in ?? ()
(gdb) Error detected on fd 0
error detected on stdin

Cheers,
Victor

What about this? (I’ll still want to debug the issue - but this should get you going.)

Hi Axel,

I suspect that whatever happens is actually caused by the C++ error
you have, declaring and assigning to a void variable, see Compiler Explorer
Actually, I do not see anything wrong in the C++:

void *chain= (void*)bfc(-1,opt.Data(),fil.Data());

bfc.C returns a pointer to a class.
But to be sure, I ran simply:

bfc(-1,opt.Data(),fil.Data());

and got exactly the same result.

Cheers Victor

Hi Victor,

Sorry - the markdown formatting converted your void* chain into void chain and that triggered me (and would have triggered your compiler). I.e. understood - I will investigate! I’ll only get to it next week; please ping me on Wed should you not hear from me.

It’d be great if you could create a reproducer that shows this crash / invalid memory but doesn’t depend on your input files and that I can run. I.e. in your example I don’t have access to bfc.C.

Axel

Hi Axel, I would be happy to provide you with a simple example. Unfortunately, all the simple examples
work. And it is clear that corruption of the file name, triggered by a Cling command deep inside
the Cling code, is a matter of luck. In my case I run a very simple macro and get a corruption very deep
inside the huge STAR code. The only way I see is to look into the ROOT code with GDB.
For me it is important to know that nobody has seen this before.
I will try to play with it.
Victor

What does this load? Does it load the same code multiple times (I think so, actually)? This should probably be fixed to only load the code once. Here, re-loading seems to unload your code. That’s likely because of the following scenario:

  • StBFChain::Load() runs
  • you .x yourCode.C
  • which triggers a re-run of StBFChain::Load()
  • that unloads the previous “version” of whatever StBFChain::Load() loaded, and everything that was loaded afterwards (i.e. including yourCode.C)

Is that a reasonable hypothesis? :)
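
To make the hypothesized scenario concrete, here is a toy model in plain C++. It is only a sketch of the idea "re-loading a unit unloads it and everything loaded after it"; it is not Cling's implementation, and the names ToyInterpreter, load and isLoaded are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an interpreter that keeps loaded units in load order.
// Hypothetical model only; it does not reflect Cling internals.
class ToyInterpreter {
public:
    // Load a unit. If a unit with the same name is already present,
    // unload it and every unit loaded after it, then load it again.
    void load(const std::string& name) {
        for (std::size_t i = 0; i < units_.size(); ++i) {
            if (units_[i] == name) {
                unloadFrom(i);
                break;
            }
        }
        units_.push_back(name);
    }

    // Report whether a unit is currently loaded.
    bool isLoaded(const std::string& name) const {
        for (const auto& unit : units_) {
            if (unit == name) return true;
        }
        return false;
    }

private:
    // Drop the unit at `index` and all units loaded after it.
    void unloadFrom(std::size_t index) {
        units_.erase(units_.begin() + static_cast<std::ptrdiff_t>(index),
                     units_.end());
    }

    std::vector<std::string> units_;
};
```

In this model, loading "libA", then "yourCode.C", then "libA" again leaves only "libA" loaded: the macro loaded in between is gone, and with it any variables it owned.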

Hi Axel,

What does this load? Does it load the same code multiple times (I think so, actually)? This should probably be fixed to only load the code once. Here, re-loading seems to unload your code. That’s likely because of the following scenario:

StBFChain::Load is a set of calls to gSystem->Load(libname).
Yes, sometimes it is possible that gSystem->Load(libname) is also called in another place.
But, as I remember from ROOT 5, that was completely legal. When a library was loaded twice, you just got
a return flag.
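
A load-once guard along these lines could be sketched as follows, in plain C++. The class LoadOnce and its loader callback are hypothetical, and the return values (0 for a fresh load, 1 for "already loaded") are assumptions chosen for this sketch to mirror the "return flag" behaviour described above; they are not taken from the STAR or ROOT code.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical wrapper that forwards each library name to the
// underlying loader at most once.
class LoadOnce {
public:
    // `loader` is assumed to return 0 on success and non-zero on failure.
    explicit LoadOnce(std::function<int(const std::string&)> loader)
        : loader_(std::move(loader)) {}

    // Returns 1 without calling the loader if `lib` was already loaded
    // successfully; otherwise returns whatever the loader returns.
    int load(const std::string& lib) {
        if (loaded_.count(lib) != 0) return 1;
        const int rc = loader_(lib);
        if (rc == 0) loaded_.insert(lib);
        return rc;
    }

private:
    std::function<int(const std::string&)> loader_;
    std::set<std::string> loaded_;
};
```

With such a guard, a second request for the same library never reaches the real loader, so it cannot trigger an unload and reload of code that is already in use.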

StBFChain::Load() runs
you .x yourCode.C
No, StBFChain::Load() runs inside .x yourCode.C, and only once,
and the macro cannot be reloaded.

which triggers a re-run of StBFChain::Load()
that unloads the previous “version” of whatever StBFChain::Load() loaded, and everything that was loaded afterwards (i.e. including yourCode.C)
Is that a reasonable hypothesis? :)
I think this is not the case.
Right now, due to some bureaucratic reorganisation, I do not have
access to the code. When I get access, I will provide more information.
Thank you very much,
Victor