Randomly occurring memory error

Dear ROOTers,

I am running a relatively standard program which contains the following line:
TFile *f = new TFile("TRIAL_stgcGnam_TRIAL.root", "RECREATE");
Sometimes the code runs fine, without errors. However, quite frequently, and without any change to the code, this line triggers a memory error, i.e.

*** Error in `./stgcGnam': free(): invalid next size (fast): 0x0000000000f51510 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x7efdab388329]
/usr/lib64/root/libCore.so.6.24(_ZN10TClassEdit17GetNormalizedNameERSsNSt12experimental6__ROOT17basic_string_viewIcSt11char_traitsIcEEE+0x521)[0x7efdae466a81]
/usr/lib64/root/libCore.so.6.24(_ZN11TClassTable11FindElementEPKcb+0x60)[0x7efdae430a20]
/usr/lib64/root/libCore.so.6.24(_ZN4ROOT17ResetClassVersionEP6TClassPKcs+0xaa)[0x7efdae430caa]
/usr/lib64/root/libCore.so.6.24(_ZN4ROOT17TGenericClassInfo10SetVersionEs+0x20)[0x7efdae495250]
/lib64/ld-linux-x86-64.so.2(+0xf9c3)[0x7efdae8679c3]
/lib64/ld-linux-x86-64.so.2(+0x1459e)[0x7efdae86c59e]
/lib64/ld-linux-x86-64.so.2(+0xf7d4)[0x7efdae8677d4]
/lib64/ld-linux-x86-64.so.2(+0x13b8b)[0x7efdae86bb8b]
/lib64/libdl.so.2(+0xfab)[0x7efdac194fab]
/lib64/ld-linux-x86-64.so.2(+0xf7d4)[0x7efdae8677d4]
/lib64/libdl.so.2(+0x15ad)[0x7efdac1955ad]
/lib64/libdl.so.2(dlopen+0x31)[0x7efdac195041]
/usr/lib64/root/libCore.so.6.24(_ZN5TROOT15InitInterpreterEv+0x2b0)[0x7efdae395490]
/usr/lib64/root/libCore.so.6.24(_ZN4ROOT8Internal8GetROOT2Ev+0x36)[0x7efdae395716]
/usr/lib64/root/libCore.so.6.24(_ZNK4TEnv8GetvalueEPKc+0x1d6)[0x7efdae3c0446]
/usr/lib64/root/libCore.so.6.24(_ZNK4TEnv8GetValueEPKcS1_+0x9)[0x7efdae3c0b19]
/usr/lib64/root/libCore.so.6.24(_ZN4TUrl19GetSpecialProtocolsEv+0x136)[0x7efdae419146]
/usr/lib64/root/libCore.so.6.24(_ZN4TUrl6SetUrlEPKcb+0x21e)[0x7efdae41973e]
/usr/lib64/root/libCore.so.6.24(_ZN4TUrlC2EPKcb+0x129)[0x7efdae419dc9]
/usr/lib64/root/libRIO.so.6.24(_ZN5TFileC1EPKcS1_S1_i+0x1d5)[0x7efdaddb7d65]
./stgcGnam(main+0x196)[0x41e348]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7efdab329555]
./stgcGnam[0x41d2c9]
======= Memory map: ========
00400000-00451000 r-xp 00000000 00:2b 168898078                          /afs/cern.ch/user/r/rbrener/public/sTGC_Online_Monitoring/Gnam_9_forRoy/sw/gnam/NSWRead/bin/stgcGnam
00650000-00651000 r--p 00050000 00:2b 168898078                          /afs/cern.ch/user/r/rbrener/public/sTGC_Online_Monitoring/Gnam_9_forRoy/sw/gnam/NSWRead/bin/stgcGnam
00651000-00652000 rw-p 00051000 00:2b 168898078                          /afs/cern.ch/user/r/rbrener/public/sTGC_Online_Monitoring/Gnam_9_forRoy/sw/gnam/NSWRead/bin/stgcGnam
00652000-0065a000 rw-p 00000000 00:00 0 
00e9a000-00f72000 rw-p 00000000 00:00 0                                  [heap]
7efda0000000-7efda0021000 rw-p 00000000 00:00 0 
7efda0021000-7efda4000000 ---p 00000000 00:00 0 
7efda5117000-7efda55f1000 r-xp 00000000 00:4f 15081153                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libCore.so
7efda55f1000-7efda57f0000 ---p 004da000 00:4f 15081153                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libCore.so
7efda57f0000-7efda581a000 r--p 004d9000 00:4f 15081153                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libCore.so
7efda581a000-7efda5820000 rw-p 00503000 00:4f 15081153                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libCore.so
7efda5820000-7efda5854000 rw-p 00000000 00:00 0 
7efda5854000-7efda58a3000 r-xp 00000000 00:4f 15080798                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libThread.so
7efda58a3000-7efda5aa2000 ---p 0004f000 00:4f 15080798                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libThread.so
7efda5aa2000-7efda5aa6000 r--p 0004e000 00:4f 15080798                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libThread.so
7efda5aa6000-7efda5aa7000 rw-p 00052000 00:4f 15080798                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libThread.so
7efda5aa7000-7efda5aa9000 rw-p 00000000 00:00 0 
7efda5aa9000-7efda5e4c000 r-xp 00000000 00:4f 15081276                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libRIO.so
7efda5e4c000-7efda604b000 ---p 003a3000 00:4f 15081276                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libRIO.so
7efda604b000-7efda6059000 r--p 003a2000 00:4f 15081276                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libRIO.so
7efda6059000-7efda605c000 rw-p 003b0000 00:4f 15081276                   /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libRIO.so
7efda605c000-7efda7466000 rw-p 00000000 00:00 0 
7efda7466000-7efda7cd1000 r--s 00000000 fc:01 167923174                  /var/lib/sss/mc/passwd
7efda7cd1000-7efda7cd9000 r-xp 00000000 fc:01 6719867                    /usr/lib64/libnss_sss.so.2
7efda7cd9000-7efda7ed8000 ---p 00008000 fc:01 6719867                    /usr/lib64/libnss_sss.so.2
7efda7ed8000-7efda7ed9000 r--p 00007000 fc:01 6719867                    /usr/lib64/libnss_sss.so.2
7efda7ed9000-7efda7eda000 rw-p 00008000 fc:01 6719867                    /usr/lib64/libnss_sss.so.2
7efda7eda000-7efda7ee6000 r-xp 00000000 fc:01 6376904                    /usr/lib64/libnss_files-2.17.so
7efda7ee6000-7efda80e5000 ---p 0000c000 fc:01 6376904                    /usr/lib64/libnss_files-2.17.so
7efda80e5000-7efda80e6000 r--p 0000b000 fc:01 6376904                    /usr/lib64/libnss_files-2.17.so
7efda80e6000-7efda80e7000 rw-p 0000c000 fc:01 6376904                    /usr/lib64/libnss_files-2.17.so
7efda80e7000-7efda80ed000 rw-p 00000000 00:00 0 
7efda80ed000-7efda8111000 r-xp 00000000 fc:01 6377026                    /usr/lib64/libselinux.so.1
7efda8111000-7efda8310000 ---p 00024000 fc:01 6377026                    /usr/lib64/libselinux.so.1
7efda8310000-7efda8311000 r--p 00023000 fc:01 6377026                    /usr/lib64/libselinux.so.1
7efda8311000-7efda8312000 rw-p 00024000 fc:01 6377026                    /usr/lib64/libselinux.so.1
7efda8312000-7efda8314000 rw-p 00000000 00:00 0 
7efda8314000-7efda832a000 r-xp 00000000 fc:01 6376914                    /usr/lib64/libresolv-2.17.so
7efda832a000-7efda852a000 ---p 00016000 fc:01 6376914                    /usr/lib64/libresolv-2.17.so
Aborted (core dumped)

Several people have suggested this may be a ROOT version issue, but changing the ROOT version made no difference (the code still ran on some occasions and failed on others). Others suggested it might be a compiler issue (FYI, I’m running on lxplus).

Debugging the code shows that the crash occurs at the line above, where the new ROOT file is created.

Could anyone suggest a solution that will fix this for good? I can share my makefile if needed.

Many thanks in advance!


ROOT Version: 6-24-06
Platform: Linux
Compiler: gcc version 8.3.0


Hello,

Perhaps @Axel might know what can be going on.

Can you try, from lxplus, using the 6.24.06 release available on CVMFS:

source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.24.06/x86_64-centos7-gcc48-opt/bin/thisroot.sh

When you run your code after sourcing the CVMFS root release, do you see the same error?

This is indeed likely a memory error, and the location of the complaint is generally not the location of the bug. Can you run with valgrind and upload the output as an attachment, please?

Hi Axel and etejedor, thanks for your prompt replies!
etejedor: I tried that before my original post and it had no effect.
Axel: my valgrind file is too big to attach. May I give you its public path on lxplus instead? /afs/cern.ch/user/r/rbrener/public/sTGC_Online_Monitoring/Gnam_9_forRoy/sw/gnam/NSWRead/bin/valgrind-out.txt
Also, please note that this time the code ran without crashing, so the output in valgrind-out.txt may not show the problem.
Thanks again.

Hi @Axel, any thoughts?
Thanks,
Roy

No thoughts: valgrind does not report anything that would indicate such an error. That’s extremely surprising; I have never seen heap errors go undetected by valgrind. The best we can do is for you to re-run this under valgrind until it fails: there must be something other than a memory error triggering it.

Another option is to reduce the code that is causing this. Does a simple TFile *f = new TFile("TRIAL_stgcGnam_TRIAL.root", "RECREATE"); on its own also trigger it, sometimes? If not, how much of your code do you need to add to make it reappear? Can you share this minimal reproducing code? Maybe we can see something that might cause it.

Hi @Axel, thanks for your reply.

  • Now it’s failing again and correspondingly I’ve recreated a valgrind-out.txt file (same path). Perhaps it’s more helpful to you now.
  • TFile *f = new TFile("TRIAL_stgcGnam_TRIAL.root", "RECREATE"); doesn’t produce an error in a standalone ROOT interpreter.
  • I’ve checked again and that line of the code is indeed the one causing the crash. The debug file, /afs/cern.ch/user/r/rbrener/public/sTGC_Online_Monitoring/Gnam_9_forRoy/sw/gnam/NSWRead/bin/out.txt, shows the debug message Fine 2 printed. In the source code used for the binary executable ./stgcGnam, /afs/cern.ch/user/r/rbrener/public/sTGC_Online_Monitoring/Gnam_9_forRoy/sw/gnam/NSWRead/src/apps/stgcGnam.cpp, you can see the debug messages Fine 2 and Fine 3 directly surrounding TFile *f = new TFile("TRIAL_stgcGnam_TRIAL.root", "RECREATE");

Hope this sheds some light on the problem.

Thanks again!

Thanks!

Quoting the relevant parts from your output:

==20089==    at 0x50BA8C7: TIsAProxy::TIsAProxy(std::type_info const&) (in /usr/lib64/root/libCore.so.6.24.06)
==20089==    by 0xE53E9C3: ROOT::GenerateInitInstanceLocal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) [clone .isra.155] (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.22.00-be0a0/x86_64-centos7-gcc8-opt/lib/libCore.so)

You can see that two ROOT versions are interfering here. Please make sure you have your $LD_LIBRARY_PATH, $PATH, and $ROOTSYS use one consistent ROOT version, not two!
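
The same mix-up is visible in the original crash dump: the backtrace goes through /usr/lib64/root/libCore.so.6.24, while the memory map lists libCore.so from the /cvmfs v6.22.00 release. As a rough way to check for this from inside a program, here is a minimal sketch that scans text in the /proc/<pid>/maps format and reports any shared library mapped from more than one directory. The function names (libraryStem, libraryDirectories, conflictingLibraries, reportConflicts) are made up for illustration and are not part of ROOT; the sketch also assumes Linux and paths without spaces.

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: reduce "libCore.so.6.24" or "libCore.so" to "libCore".
// Returns an empty string when the name does not look like a shared library.
std::string libraryStem(const std::string& fileName) {
    const std::string::size_type pos = fileName.find(".so");
    if (pos == std::string::npos || pos == 0) return "";
    // Accept ".so" only at the end or followed by a version suffix (".6.24").
    const std::string::size_type end = pos + 3;
    if (end != fileName.size() && fileName[end] != '.') return "";
    return fileName.substr(0, pos);
}

// Hypothetical helper: for each library stem found in the maps text,
// collect the set of directories it was mapped from.
std::map<std::string, std::set<std::string>> libraryDirectories(std::istream& maps) {
    std::map<std::string, std::set<std::string>> dirs;
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        // Five fixed fields, then the path (absent for anonymous mappings).
        if (!(fields >> range >> perms >> offset >> dev >> inode >> path)) continue;
        if (path[0] != '/') continue;  // skips entries such as [heap]
        const std::string::size_type slash = path.rfind('/');
        const std::string stem = libraryStem(path.substr(slash + 1));
        if (!stem.empty()) dirs[stem].insert(path.substr(0, slash));
    }
    return dirs;
}

// Keep only the libraries that were mapped from more than one directory.
std::map<std::string, std::set<std::string>> conflictingLibraries(std::istream& maps) {
    std::map<std::string, std::set<std::string>> conflicts;
    for (const auto& entry : libraryDirectories(maps)) {
        if (entry.second.size() > 1) conflicts.insert(entry);
    }
    return conflicts;
}

// Print every conflict found in the given maps file; returns how many were found.
std::size_t reportConflicts(const std::string& mapsPath) {
    std::ifstream maps(mapsPath);
    const auto conflicts = conflictingLibraries(maps);
    for (const auto& entry : conflicts) {
        std::cout << entry.first << " is mapped from:\n";
        for (const auto& dir : entry.second) std::cout << "  " << dir << '\n';
    }
    return conflicts.size();
}
```

Calling reportConflicts("/proc/self/maps") from main, after the libraries have been loaded, would list any library present twice. A library that legitimately exists under two directories would also be flagged, so treat the output as a hint rather than proof.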

Thanks @Axel! This might indeed be it. I’m taking it with a pinch of salt, as many times in the past I thought this had been dealt with, only for it to recur again and again. I’ll report back should anything arise again. For now, many, many thanks! I deeply appreciate your help.
