Root freeze problem

Ok,
I managed to reproduce the problem.
I wrote an entirely arbitrary piece of code which allocates and deallocates a vector of Int_t.
I run a very large loop and obtain something very similar to what was shown before.
ROOT freezes after 5 minutes; the backtrace is shown below.
Please keep in mind that this code is not related to any of my previous macros.
My idea was just to do a large number of memory allocations and deallocations, because I felt this is the point where my ROOT installation is having trouble.
(I am not sure this is the correct interpretation, but nevertheless I get the same behaviour and backtrace.)
Thanks a lot!
Francesco

#include "TROOT.h"
#include <iostream>
#include <vector>
#include "TH1F.h"

void TestMacroDebug(Float_t normFlag){
  gROOT->Reset();
  // Float_t loop counter, as in the version that was first posted
  for(Float_t i=0;i<normFlag;i++)
  {
    // allocate and fill a fresh vector on every iteration
    std::vector<Int_t> cat;
    for(Int_t ii=0;ii<10;ii++)
      cat.push_back(ii);
    if(i==1e7)
    {
      std::cout<<i<<std::endl;
      for(Int_t ii=0;ii<10;ii++)
        std::cout<<cat[ii]<<std::endl;
    }
    cat.erase(cat.begin(),cat.begin()+10);
  }
}

root [0] .L TestMacroDebug.cc+
Info in <TUnixSystem::ACLiC>: creating shared library /mnt/data/corsika/v740/corsika-74000/run/./TestMacroDebug_cc.so
root [1] TestMacroDebug(1e9)
1e+07
0
1
2
3
4
5
6
7
8
9
^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait_private () at …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95 …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) backtrace
#0 __lll_lock_wait_private () at …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007ffff6b4fdca in _L_lock_12779 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff6b4d7a5 in __GI___libc_malloc (bytes=537) at malloc.c:2887
#3 0x00007ffff7322dad in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff737e209 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff737edcb in std::string::_Rep::_M_clone(std::allocator const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff737ee64 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff735d1d6 in std::basic_stringbuf<char, std::char_traits, std::allocator >::overflow(int) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007ffff73613f6 in std::basic_streambuf<char, std::char_traits >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007ffff734d41d in std::ostreambuf_iterator<char, std::char_traits > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits > >::_M_insert_int(std::ostreambuf$
#10 0x00007ffff734d60d in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits > >::do_put(std::ostreambuf_iterator<char, std::char_traits >, std::ios_base&, char, unsigned long)$
#11 0x00007ffff73592ce in std::ostream& std::ostream::_M_insert(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007ffff7ab3cc8 in textinput::TerminalDisplayUnix::HandleResizeSignal() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#13 <signal handler called>
#14 0x00007ffff6b4b32d in _int_malloc (av=0x7ffff6e89760 <main_arena>, bytes=16) at malloc.c:3355
#15 0x00007ffff6b4d7b0 in __GI___libc_malloc (bytes=16) at malloc.c:2891
#16 0x00007ffff7322dad in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff79a1b44 in void std::vector<int, std::allocator >::_M_emplace_back_aux<int const&>(int const&) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#18 0x00007fffeb7c14ea in TestMacroDebug(float) () from /mnt/data/corsika/v740/corsika-74000/run/TestMacroDebug_cc.so
#19 0x00007fffe9169050 in ?? ()
#20 0x0000000001794c18 in ?? ()
#21 0x00007fffffffba90 in ?? ()
#22 0x00007fffffffb810 in ?? ()
#23 0x00007ffff362d324 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#24 0x00007ffff3630e4a in cling::Interpreter::EvaluateInternal(std::string const&, cling::CompilationOptions, cling::Value*, cling::Transaction**) ()
from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#25 0x00007ffff3630fa3 in cling::Interpreter::process(std::string const&, cling::Value*, cling::Transaction**) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#26 0x00007ffff366da73 in cling::MetaProcessor::process(char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#27 0x00007ffff3546406 in TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#28 0x00007ffff79a62e0 in TApplication::ProcessLine(char const*, bool, int*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#29 0x00007ffff75d800f in TRint::ProcessLineNr(char const*, char const*, int*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libRint.so
#30 0x00007ffff75d8321 in TRint::HandleTermInput() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libRint.so
#31 0x00007ffff7a823a5 in TUnixSystem::CheckDescriptors() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#32 0x00007ffff7a8333a in TUnixSystem::DispatchOneEvent(bool) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#33 0x00007ffff7a03a76 in TSystem::InnerLoop() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#34 0x00007ffff7a04680 in TSystem::Run() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#35 0x00007ffff79a473f in TApplication::Run(bool) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#36 0x00007ffff75d9888 in TRint::Run(bool) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libRint.so
#37 0x000000000040103c in main ()

  1. Do NOT execute “gROOT->Reset();” inside of a named macro.
  2. Change:
    for(Float_t i = 0; i < normFlag; i++)
    into:
    for(Int_t i = 0; i < normFlag; i++)

A “Float_t” is a 32 bit floating point value, which gives only 6 to 9 significant decimal digits. Once “i” reaches 16777216 (2^24, about 1.67772e+07), “i++” no longer changes it, so your loop only terminates for normFlag up to that value.
You could try a “Double_t”, a 64 bit floating point value, which gives between 15 and 17 significant decimal digits.
An “Int_t” is a 32 bit signed integer, so it should work up to 2147483647.
Or, try a “Long64_t” which is a 64 bit signed integer, so it should work up to 9223372036854775807.
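
The limit quoted above (normFlag = 1.67772e+07) can be checked directly. What follows is a minimal standalone sketch in plain C++, without ROOT. The function names are made up for this illustration, and it assumes IEEE 754 single precision with the default round-to-nearest mode, as on x86_64 Linux. It counts upward in a float until adding 1 no longer changes the value.

```cpp
#include <cassert>

// Count upward in single precision until "i + 1" rounds back to "i".
// Hypothetical helper name, for illustration only.
float last_value_reached_by_increment() {
    float i = 0.0f;
    for (;;) {
        // Store the sum in a float so that it is rounded to single precision.
        volatile float next = i + 1.0f;
        if (next == i) {
            return i;  // the increment no longer has any effect
        }
        i = next;
    }
}

// Self-check of the claim, written as assertions.
void check_float_counter_limit() {
    const float limit = last_value_reached_by_increment();
    assert(limit == 16777216.0f);  // 2^24
    assert(limit < 1e9f);          // far below the 1e9 passed to the macro
}
```

A float counter compared against 1e9 therefore stays at 16777216 forever, which is why the first version of the macro can never leave its loop, independently of any other problem.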

Thanks,
unfortunately (as I was expecting) it doesn’t help.
But nevertheless, thanks a lot! I will keep it in mind in the future!
Francesco

#include "TROOT.h"
#include <iostream>
#include <vector>
#include "TH1F.h"

void TestMacroDebug(Int_t maxVal){
  // Int_t loop counter, as suggested above
  for(Int_t i=0;i<maxVal;i++)
  {
    // allocate and fill a fresh vector on every iteration
    std::vector<Int_t> cat;
    for(Int_t ii=0;ii<10;ii++)
      cat.push_back(ii);
    if(i==1e7)
    {
      std::cout<<i<<std::endl;
      for(Int_t ii=0;ii<10;ii++)
        std::cout<<cat[ii]<<std::endl;
    }
    cat.erase(cat.begin(),cat.begin()+10);
  }
}

.L TestMacroDebug.cc+
root [1] TestMacroDebug(1e9)
10000000
0
1
2
3
4
5
6
7
8
9
^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait_private () at …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95 …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) Quit
(gdb) Quit
(gdb) Quit
(gdb) backtrace
#0 __lll_lock_wait_private () at …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007ffff6b4fdca in _L_lock_12779 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff6b4d7a5 in __GI___libc_malloc (bytes=537) at malloc.c:2887
#3 0x00007ffff7322dad in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff737e209 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff737edcb in std::string::_Rep::_M_clone(std::allocator const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff737ee64 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff735d1d6 in std::basic_stringbuf<char, std::char_traits, std::allocator >::overflow(int) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007ffff73613f6 in std::basic_streambuf<char, std::char_traits >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007ffff734d41d in std::ostreambuf_iterator<char, std::char_traits > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits > >::_M_insert_int(std::ostreambuf_iterator<char, std::char_traits >, std::ios_base&, char, unsigned long) const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff734d60d in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits > >::do_put(std::ostreambuf_iterator<char, std::char_traits >,
std::ios_base&, char, unsigned long) const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007ffff73592ce in std::ostream& std::ostream::_M_insert(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007ffff7ab3cc8 in textinput::TerminalDisplayUnix::HandleResizeSignal() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#13 <signal handler called>
#14 0x00007ffff6b4aceb in _int_malloc (av=0x7ffff6e89760 <main_arena>, bytes=4) at malloc.c:3351
#15 0x00007ffff6b4d7b0 in __GI___libc_malloc (bytes=4) at malloc.c:2891
#16 0x00007ffff7322dad in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff79a1b44 in void std::vector<int, std::allocator >::_M_emplace_back_aux<int const&>(int const&) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#18 0x00007fffecb183ab in TestMacroDebug(int) () from /mnt/data/corsika/v740/corsika-74000/run/TestMacroDebug_cc.so
#19 0x00007ffff7fc8047 in __cling_Un1Qu30(void*) ()
#20 0x00007ffff362d324 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#21 0x00007ffff3630e4a in cling::Interpreter::EvaluateInternal(std::string const&, cling::CompilationOptions, cling::Value*, cling::Transaction**) ()
from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#22 0x00007ffff3630fa3 in cling::Interpreter::process(std::string const&, cling::Value*, cling::Transaction**) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#23 0x00007ffff366da73 in cling::MetaProcessor::process(char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#24 0x00007ffff3546406 in TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCling.so
#25 0x00007ffff79a62e0 in TApplication::ProcessLine(char const*, bool, int*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#26 0x00007ffff75d800f in TRint::ProcessLineNr(char const*, char const*, int*) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libRint.so
#27 0x00007ffff75d8321 in TRint::HandleTermInput() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libRint.so
#28 0x00007ffff7a823a5 in TUnixSystem::CheckDescriptors() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#29 0x00007ffff7a8333a in TUnixSystem::DispatchOneEvent(bool) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#30 0x00007ffff7a03a76 in TSystem::InnerLoop() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#31 0x00007ffff7a04680 in TSystem::Run() () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#32 0x00007ffff79a473f in TApplication::Run(bool) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libCore.so
#33 0x00007ffff75d9888 in TRint::Run(bool) () from /mnt/data/root_6.04_06/root-6.04.06//lib/libRint.so
#34 0x000000000040103c in main ()

It seems to me that you pressed “Ctrl-C”.
If you don’t do that and let your macro run long enough, it will end fine.

Dear Pepe,
thanks a lot.
Maybe I wasn’t clear.
What I did is:

  1. run ROOT under the gdb debugger
  2. compile and run the macro
  3. as many other times, I saw that the program was frozen (the process doesn’t show up in the top command anymore). Under normal conditions I cannot even stop the process with Ctrl+C; I have to close the terminal
  4. under the debugger I press Ctrl+C (here it works)
  5. I type backtrace

If you need other information, I can provide it.
Thanks again!
Francesco

So, try to run it without the debugger.

As I said, it gets frozen…
Not even Ctrl+C gets me back to the command line now…
I have to close the terminal.

$ root -l
root [0] .L TestMacroDebug.cc+
root [1] TestMacroDebug(1e9)
10000000
0
1
2
3
4
5
6
7
8
9
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

[...] $ date -u; root -b -l -n -q 'TestMacroDebug.cc++(1e9)'; date -u
Fri Dec 11 12:04:20 UTC 2015
root [0]
Processing TestMacroDebug.cc++(1e9)...
Info in <TUnixSystem::ACLiC>: creating shared library /..././TestMacroDebug_cc.so
10000000
0
1
2
3
4
5
6
7
8
9
Fri Dec 11 12:38:27 UTC 2015
[...] $

Right!
That is how it should be. In fact, it is strange that it freezes on my computer.
(Actually it freezes randomly, in maybe 50% of cases.)

I was hoping that somebody could give me a hint on how to solve this issue…
As I said, some colleagues and I are observing this behaviour. Since we are using Ubuntu, I suspect there could be some compatibility problem or some bug inside ROOT. I saw this on another Ubuntu machine as well.
I have no idea how to proceed…
Thanks a lot!

Hi Francesco,

I think pepe correctly explained the faulty behaviour.
That loop never finishes if you use a float, because at that magnitude the representable numbers are “too far” apart to be affected by an increment of 1. No bug in ROOT, I fear, but rather the expected behaviour of the IEEE standard representation of single precision floating point numbers.

Danilo

Dear Danilo,
not really… At 11:51 I posted a macro without any float, just int. Or did I misunderstand your post? :frowning:

This is the version which is freezing. I repeat again, it is something which looks quite independent of the macro which I use.
Thanks
Francesco

Fri Dec 11, 2015 11:45 not 11:51 :slight_smile:

Hi Francesco,

Your script runs fine to completion for me. The script itself does not contain anything specific to ROOT. You may want to try running it as a standalone executable and see if the problem goes away.

This part of the backtrace:
#0 __lll_lock_wait_private () at …/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007ffff6b4fdca in _L_lock_12779 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff6b4d7a5 in __GI___libc_malloc (bytes=537) at malloc.c:2887
makes me wonder if you are starting any threads before executing your example?

Cheers,
Philippe.
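
One possible reading of the full backtraces posted earlier in the thread, offered as an assumption and not as a confirmed diagnosis: frames #14 to #17 show the macro inside malloc, frame #12 shows textinput::TerminalDisplayUnix::HandleResizeSignal() running on top of that, and frames #2 to #11 show this handler reaching malloc again through a std::ostream insertion. malloc is not async-signal-safe, so a handler that allocates while the interrupted code is inside the allocator can end up waiting on the allocator lock, even without any extra threads. The sketch below is not ROOT's code. It only illustrates the usual alternative pattern, in which the handler sets a flag and the real work is done later in normal program flow. All names in it are hypothetical.

```cpp
#include <cassert>
#include <csignal>

// Only a flag of this type may safely be written from a signal handler.
volatile std::sig_atomic_t g_resize_pending = 0;

// Handler: no allocation, no stream output, just record the event.
extern "C" void on_resize(int) { g_resize_pending = 1; }

// Install the handler for SIGWINCH (terminal resize; POSIX, not ISO C++).
bool install_resize_handler() {
    return std::signal(SIGWINCH, on_resize) != SIG_ERR;
}

// Called from normal program flow, where allocating and printing is allowed.
// Returns true if a resize was pending and has now been consumed.
bool consume_pending_resize() {
    if (!g_resize_pending) return false;
    g_resize_pending = 0;
    // ... query the new terminal size and redraw here ...
    return true;
}

// Small self-check: deliver the signal to this process and observe the flag.
void self_check() {
    assert(install_resize_handler());
    std::raise(SIGWINCH);
    assert(consume_pending_resize());
}
```

If this reading is right, the freeze would depend on when a resize signal arrives relative to an allocation, which would fit a problem that shows up only in some runs.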

Dear Philippe,
sorry, I didn’t see your reply on the second page.
Actually I just run the macro, nothing else. No threads.
I have already tried translating small macros to plain C++, compiling them and executing them without ROOT. What usually happens is that the standalone program ends without trouble, while under ROOT it was having problems.
Thanks a lot!
Francesco
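
For anybody who wants to repeat the standalone test, a ROOT-free version of the Int_t macro could look roughly like the sketch below. It is an illustration, not the code that was actually compiled: Int_t is replaced by int, the loop counter is 64 bit, and the function name run_allocation_loop, its print_at parameter and the returned counter are additions made here so that a caller can check the loop really ran to completion.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <vector>

// Same allocate / fill / erase cycle as the macro, without any ROOT headers.
// Returns the total number of elements pushed into the vectors.
std::int64_t run_allocation_loop(std::int64_t max_val, std::int64_t print_at) {
    assert(max_val >= 0);
    std::int64_t pushed = 0;
    for (std::int64_t i = 0; i < max_val; ++i) {
        std::vector<int> cat;  // fresh vector on every iteration
        for (int ii = 0; ii < 10; ++ii) {
            cat.push_back(ii);
            ++pushed;
        }
        if (i == print_at) {
            // Print the checkpoint once, like the macro does at i == 1e7.
            std::cout << i << '\n';
            for (int ii = 0; ii < 10; ++ii) {
                std::cout << cat[ii] << '\n';
            }
        }
        cat.erase(cat.begin(), cat.begin() + 10);
    }
    return pushed;
}
```

Calling run_allocation_loop(1000000000, 10000000) from a main function and building it with a normal compiler gives an executable that performs the same allocations with no interpreter or terminal handling from ROOT involved.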

Try to run the last version of your macro (i.e. with “Int_t”) using exactly this (note: expect something like 30 minutes to 2 hours): root -b -l -n -q 'TestMacroDebug.cc++(1e9)'

Dear Pepe,
same story, it seems it freezes again. With and without debugger.
What does it mean?
Thanks a lot!

root -b -l -n -q 'TestMacroDebug.cc++(1e9)'
root [0]
Processing TestMacroDebug.cc++(1e9)...
Info in <TUnixSystem::ACLiC>: creating shared library TestMacroDebug_cc.so
10000000
0
1
2
3
4
5
6
7
8
9
^C^C^C^C^C^C^C^C^C^C^C^C^C

Dear Philippe,
actually, I don’t know if this helps…
It is referring to some thread library. This is however just because of the debugger.
After that I run the macro.
However this freeze occurs also without the debugger.
I hope this helps.
Thanks a lot!

(gdb) run
Starting program: root.exe
Traceback (most recent call last):
File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named ‘libstdcxx’
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.

| Welcome to ROOT 6.04/06 root.cern.ch |
| © 1995-2014, The ROOT Team |
| Built for linuxx8664gcc |
| From heads/v6-04-00-patches@v6-04-04-12-g9436735, Oct 13 2015, 12:34:29 |

Try ‘.help’, ‘.demo’, ‘.license’, ‘.credits’, ‘.quit’/’.q’

root [0] I execute the macro here…
and it gets frozen…
As without the gdb…

Just to make sure. If the time difference between starting the macro and the time you see “10000000” printed is “n” seconds, then you should expect that the macro ends after some “100 * n” seconds. Did you wait long enough?
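
The “100 * n” figure is a plain linear extrapolation: the checkpoint is printed at iteration 1e7 out of 1e9, so the full loop should take about 1e9 / 1e7 = 100 times as long. A small sketch of that arithmetic, with a function name made up for this illustration and assuming every iteration costs about the same:

```cpp
#include <cassert>

// Linear extrapolation: if reaching iteration `checkpoint` took
// `seconds_to_checkpoint`, a loop of `total` iterations should take
// about total / checkpoint times as long.
double estimate_total_seconds(double seconds_to_checkpoint,
                              double checkpoint, double total) {
    assert(checkpoint > 0.0);
    return seconds_to_checkpoint * (total / checkpoint);
}
```

For example, estimate_total_seconds(3.0, 1e7, 1e9) returns 300, so a run that prints “10000000” after 3 seconds should need about 5 minutes in total.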

Another question is … where does your ROOT come from? Did you build it yourself? If yes, how? Looking at your output, I can see “/mnt/data/root_6.04_06/root-6.04.06/” and the “root-6.04.06” suggests to me that you built it “in place”. However, in a post (https://root-forum.cern.ch/t/root-v6-06-00-on-ubuntu-14-04/20334/1), some ROOT developer admitted that this creates problems.

BTW. Do you run ROOT on your (local) desktop machine or do you connect to somewhere using ssh? If it’s ssh then it’s possible that the problem is the “idle” timeout, so that after a certain time of your “inactivity”, the session gets automatically killed (actually, it is also possible that your router / firewall has a built-in TCP “idle” timeout). In this case try “ssh -o ServerAliveInterval=99 …”.

Dear Pepe,
yes, it looks like that: IF the macro reaches the end, it takes about 100 times the time needed to show 10000000. Otherwise it gets frozen and never finishes. When successful, it does not take hours: it takes maybe 2-3 seconds to print the output and about 5 minutes to finish.

I just downloaded root_v6.04.06.source.tar.gz from the ROOT page, unpacked it and compiled it as explained in the manual (./configure, make).
I will check the other post.
However, I had this problem with ROOT 5 as well!

I run both over ssh and on the local machine. The freeze happens in both cases.

Thanks a lot!
Francesco

You can try three things …

  1. If you use a “Ubuntu 14.04 / x86_64” then try the ROOT “binary distribution”: Release 6.04/06 - 2015-10-13 … simply download the root_v6.04.06.Linux-ubuntu14-x86_64-gcc4.8.tar.gz file and unpack it … then execute “source /Where/It/Is/Unpacked/bin/thisroot.sh” and in the same terminal window run your macro (I tested this “binary distribution” on my machine and it worked for your macro).

  2. Try to build the newest ROOT 6.06.00 using my instructions here: Root-6.06.00 on Ubuntu 14.04 32 bit

  3. This is a slightly silly idea, but check your “limits” … it is possible that you exceed something:
    [bash]$ ulimit -S -a
    [bash]$ ulimit -H -a
    [tcsh]$ limit
    [tcsh]$ limit -h
    And check your “disk quota” if you have any (and the overall “free disk space” using “df -h”).
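
The same limits can also be read from inside a process. The sketch below uses the POSIX getrlimit call. The helper names and the selection of resources are choices made for this illustration, and the values are printed in the raw units used by getrlimit (bytes or counts), which can differ from the units the shell built-ins display.

```cpp
#include <cassert>
#include <iostream>
#include <sys/resource.h>

// Read one resource limit of the current process (POSIX getrlimit).
bool read_limit(int resource, struct rlimit& out) {
    return getrlimit(resource, &out) == 0;
}

// Print one value, showing "unlimited" for RLIM_INFINITY.
void print_value(rlim_t value) {
    if (value == RLIM_INFINITY) {
        std::cout << "unlimited";
    } else {
        std::cout << value;
    }
}

// Print the soft and the hard limit for one resource.
void print_limit(const char* name, int resource) {
    assert(name != nullptr);
    struct rlimit lim{};
    if (!read_limit(resource, lim)) {
        std::cout << name << ": unavailable\n";
        return;
    }
    std::cout << name << ": soft=";
    print_value(lim.rlim_cur);
    std::cout << " hard=";
    print_value(lim.rlim_max);
    std::cout << '\n';
}

// A few limits that matter for a process that allocates heavily.
void print_common_limits() {
    print_limit("address space", RLIMIT_AS);
    print_limit("data segment", RLIMIT_DATA);
    print_limit("stack", RLIMIT_STACK);
    print_limit("open files", RLIMIT_NOFILE);
}
```

Calling print_common_limits() at the start of a standalone test program shows the limits that process actually runs under, which is useful when the job is started from a different shell or over ssh.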