Memory Leak Using Interpreter

Mere_Interest · January 26, 2017, 8:57pm

I am noticing a memory leak whenever I use the interpreter. I am watching the memory usage with the command “watch grep Vm /proc/ROOT_PID_HERE/status”. If I enter a non-empty command in the interpreter, the memory usage increases by a few tens of kilobytes. I can make it increase continuously by using a TTimer, as follows.

TTimer* t = new TTimer(";",100); t->Start();

This is causing usability issues with some long-running processes. Is there a way to avoid this memory leak? I am using version 6.06.08, running on Debian 8.
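For reference, the same Vm* figures can also be sampled from inside a program instead of with watch and grep. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper name parse_status_kb is hypothetical and is not part of ROOT.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a ROOT API): scan text in the format of
// /proc/<pid>/status and return the numeric value (in kB) of a field
// such as "VmRSS" or "VmSize". Returns -1 if the field is not found
// or its value cannot be parsed.
long parse_status_kb(std::istream& in, const std::string& field) {
    assert(!field.empty());
    const std::string prefix = field + ":";
    std::string line;
    while (std::getline(in, line)) {
        // Match lines that start with "<field>:".
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream rest(line.substr(prefix.size()));
            long kb = -1;
            if (rest >> kb) return kb;  // operator>> skips leading whitespace
            return -1;
        }
    }
    return -1;
}
```

Passing a std::ifstream opened on /proc/self/status (from the fstream header) with the field "VmRSS", once before and once after a batch of interpreter commands, should give the growth per batch.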

Mere_Interest · January 26, 2017, 9:02pm

Also tested with version 6.08.00, and the same behavior occurs.

tpochep · January 26, 2017, 9:16pm

Oh, let me help you! Let’s start from the definition:

“In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in such a way that memory which is no longer needed is not released. In object-oriented programming, a memory leak may happen when an object is stored in memory but cannot be accessed by the running code.”

en.wikipedia.org/wiki/Memory_leak

Now, please, your turn - prove that what you see is a memory leak indeed: prove the memory cannot be accessed by the running code.

Mere_Interest · January 27, 2017, 2:41am

1. I have watched the memory usage of the program, using the /proc/status pseudofile. The memory usage of the program increases by 10-50 kB whenever a command is run through the interpreter. The command being run does not need to have any effect; a line with a single semicolon is enough. If a TTimer is started, then there is an increase in memory usage each time the TTimer executes.
2. The Wikipedia link you so patronizingly gave states that a memory leak is when memory that is no longer needed is not released. The case of inaccessible memory is an example of how a memory leak may occur, not the only way that it can. Since memory is increasing without bound as the result of interpreting an empty statement, unnecessary memory is clearly being kept around, hence memory leak.
3. Since C++ supports pointer arithmetic, even an exhaustive search through a core dump would be insufficient to show that a memory address is inaccessible, because the memory address might be formed at runtime.
4. With that in mind, only finding the exact location of the unfreed memory would prove the existence of inaccessible memory. If I had found it, I would be submitting a patch, rather than a bug report. What I have found is strong evidence of a memory leak.
5. The primary issue, as I mentioned, is with long-running processes. At the end of the day, I do not care whether the memory is accessible or not within the ROOT framework. The fact is that since memory usage continually increases, the computer goes into swap space, and the program must be terminated.

If you have any helpful suggestions, I would be happy to try them out.

As an addendum, I have tested with ROOT 5.34, and the issue does not occur there, so this is an issue specifically with cling rather than CINT.

ferhue · January 27, 2017, 4:05am

I can reproduce the problem. If I do this:

time valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp --leak-check=full --log-file=valgrind.log root.exe -n -l
TTimer* t = new TTimer(";",100);
t->Start();
//Go for a coffee
delete t;
.q

I don’t get any memory leak but a huge number of “still reachable” bytes, which might be the problem you see. It increases with time. So it seems to be a problem in a cling function called by the TTimer::Notify method. To give some numbers, if I don’t wait at all, then I get just 16 MB still reachable. If I wait half an hour with the timer running, I get 100 MB still reachable. So after 1 day, 4.4 GB are full and the swap starts to fill…

So, a quick and dirty workaround to your problem would be, if possible, to close ROOT and restart it. Anyway, ROOT developers should look at the report with the option --show-reachable=yes. The relevant part of it is posted below. I suggest you file a bug report at root.cern.ch/bugs

==4402== 25,903,360 bytes in 11,564 blocks are still reachable in loss record 4,250 of 4,250
==4402==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4402==    by 0x7C33D9E: cling::IncrementalParser::beginTransaction(cling::CompilationOptions const&) (in /opt/root6/lib/libCling.so)
==4402==    by 0x7C34BC5: cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&) (in /opt/root6/lib/libCling.so)
==4402==    by 0x7C37A55: cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) (in /opt/root6/lib/libCling.so)
==4402==    by 0x7BE72BD: cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) (in /opt/root6/lib/libCling.so)
==4402==    by 0x7BE766F: cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**) (in /opt/root6/lib/libCling.so)
==4402==    by 0x7C707ED: cling::MetaProcessor::process(char const*, cling::Interpreter::CompilationResult&, cling::Value*) (in /opt/root6/lib/libCling.so)
==4402==    by 0x7B7AE45: HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) (TCling.cxx:1874)
==4402==    by 0x7B8BCF9: TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) (TCling.cxx:2040)
==4402==    by 0x5265CA5: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:1005)
==4402==    by 0x51B0630: TROOT::ProcessLine(char const*, int*) (TROOT.cxx:2178)
==4402==    by 0x52763BC: TTimer::Notify() (TTimer.cxx:148)

pcanal · January 27, 2017, 12:24pm

Hi,

This is indeed memory hoarding. When interpreting the line “;”, cling compiles and links this code in memory, and we do not yet purge from memory the code segments that are no longer needed. What I mean is that if the code was a version of “int some_unique_variable_name = 3;” or “int some_unique_function_name() { … }”, we have to keep the compiled version of this code around forever, while in your case (a real no-op) and in many cases where the code is ‘clearly’ one-time use (like “var = 3;”), we should be able to free the memory once the underlying code has been executed. However, we have not had the resources to actually add this enhancement (which, if not done right, could trade memory growth for incorrect behavior).

Thanks for your patience.
Philippe.

Mere_Interest · January 27, 2017, 2:47pm

Thank you for the explanation, and I can see how that would be a difficult problem to solve. In the case of TTimers, is it possible to reuse the same compiled code, since it will be the same for each execution of the TTimer?

My use case is a GUI that periodically updates to display diagnostics, using a TTimer. Ideally, I would like this program to run continuously for several days, so the diagnostics can always be visible. With the memory hoarding as it is, the GUI must be restarted every hour or two.

pcanal · January 27, 2017, 2:50pm

You are right. The Timer could (actually should) reuse the same compiled code. I’ll take a look.

Cheers,
Philippe.

Mere_Interest · January 27, 2017, 2:55pm

Wait, never mind, the TTimer’s compiled code might be different based on the environment present in the interpreter with each run.

// func_int.C
int func(int) { return 1; }
// func_bool.C
int func(bool) { return 1000000; }
// In root interpreter
root [0] int i = 0;
root [1] .L func_int.C
root [2] TTimer* t = new TTimer("i += func(true);",100);
root [3] t->Start()
root [4] i
(int) 10
root [5] i
(int) 21
root [6] i
(int) 46
root [7] .L func_bool.C
root [8] i
(int) 11000089
root [9] i
(int) 20000089
root [10] i
(int) 32000089
root [11]

Once the better overload is present, then the TTimer switches to using it, because it recompiles each time the command is run. So, it looks like a partial fix for TTimers has the potential to break existing code.
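The same overload switch can be reproduced in plain compiled C++, without the interpreter. This is only an illustrative sketch: the two call_* wrappers and check_overload_switch are hypothetical names, and the position of each wrapper in the file stands in for "the TTimer command compiled before or after .L func_bool.C".

```cpp
#include <cassert>

// Same as func_int.C: the only overload visible at first.
int func(int) { return 1; }

// Compiled while only func(int) is declared, so func(true)
// promotes the bool argument to int and calls func(int).
int call_with_only_int_overload() { return func(true); }

// Same as func_bool.C: a better match for a bool argument.
int func(bool) { return 1000000; }

// Compiled after func(bool) is declared, so the identical
// expression func(true) now resolves to the exact match.
int call_with_both_overloads() { return func(true); }

// Illustrative self-check of the two resolutions.
void check_overload_switch() {
    assert(call_with_only_int_overload() == 1);
    assert(call_with_both_overloads() == 1000000);
}
```

A call that has already been compiled keeps its original target, so caching the TTimer command's compiled code would behave like the first wrapper even after the second overload is loaded.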

Mere_Interest · February 7, 2017, 4:45pm

Has there been any progress on avoiding the memory hoarding? After measuring, my program’s memory usage is increasing by about 1.5 megabytes every second, which isn’t sustainable for more than short periods. I have tried to refactor my program to avoid the use of TTimer, but it wasn’t possible to do so.

ferhue · February 8, 2017, 12:38am

One idea: try using this alternative:

timer->Connect("Timeout()", "myObjectClassName", myObject, "TimerDone()");

And check if the hoarding is avoided with this strategy.

Mere_Interest · February 8, 2017, 7:04pm

Thank you very much, that method does avoid the memory leak.

Once the memory issue is solved, I’d like to switch back to the previous version. As it is, this approach requires an instance of a dummy class, complete with a ROOT dictionary, just to call a simple function.