Hello!
I have a large and complicated piece of code written in C++ that I compile with gcc against the ROOT libraries. It generates a large number [~O(1000)] of histograms, draws them, and saves them as .pdf files. Additionally, at the end, a .root file is generated with all the original histograms.
This worked fine until recently. However, for reasons I do not understand, I have started to receive a std::bad_alloc exception at the end of my code, which I have been able to trace to the snippet below.
Please note that the object “samples” is an instance of a custom class derived from TFolder that does not override the “Write” method. The instance at hand contains a hierarchy of other objects, most of which derive from TFolder and TH1*. Some of them are created at runtime, others have been loaded from a different ROOT file.
The function “error” is merely a preprocessor macro for std::cout << "HWWAnalysisCode 2012: " << ARG << std::endl;
TFile * file = TFile::Open(postReadFilename, "RECREATE");
if(!file || !file->IsOpen()){
    warn("unable to open output file '"+postReadFilename+"'!");
} else {
    try {
        samples->Write();
        file->Close();
        if(file) delete file;
    } catch (std::bad_alloc& ba){
        error("there was an error allocating the memory required to write a post-read file, skipping");
    }
}
Naturally, I first suspected that the machine was running out of memory. However, monitoring the machine's memory consumption with “top” in parallel shows that the code uses only ~14% of the machine's memory at the time of the crash. Also, to my understanding, there is no reason to assume that the memory consumption dramatically increases by more than a factor of two when calling TFolder::Write or TFile::Close.
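As far as I understand, std::bad_alloc only means that one particular allocation request could not be satisfied, so it can also be thrown when a single request is absurdly large (for example, a negative or corrupted length converted to an unsigned size), even while overall memory usage is low. The following is a minimal sketch of that effect, independent of ROOT. The helper names allocationFails and corruptedSize are made up for illustration, and I am not claiming that this is what happens inside TFolder::Write.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical helper: returns true if a single request for `bytes`
// bytes makes the global operator new throw std::bad_alloc.
bool allocationFails(std::size_t bytes) {
    try {
        void* p = ::operator new(bytes);  // throws std::bad_alloc on failure
        assert(p != nullptr);             // the throwing form never returns null
        ::operator delete(p);
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// Hypothetical helper: a length of -1 converted to an unsigned size
// becomes the largest representable value, which no allocator can satisfy.
std::size_t corruptedSize() {
    int length = -1;  // stands in for an uninitialised or corrupted length
    return static_cast<std::size_t>(length);
}
```

In this sketch, allocationFails(16) should return false, while allocationFails(corruptedSize()) should return true no matter how much memory is free on the machine.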
Additionally, even though the above code catches the exception, the program still ends in the following segmentation violation. Please note the error message directly before the segfault, which stems from the “error” call in the above code.
HWWAnalysisCode 2012: ERROR: there was an error allocating the memory required to write a post-read file, skipping
*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007f00b349f8be in waitpid () from /lib64/libc.so.6
#1 0x00007f00b3431909 in do_system () from /lib64/libc.so.6
#2 0x00007f00ba085a4c in TUnixSystem::StackTrace() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#3 0x00007f00ba088283 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#4 <signal handler called>
#5 0x00007f00b37830b8 in main_arena () from /lib64/libc.so.6
#6 0x00007f00b8de6029 in TDirectoryFile::WriteKeys() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#7 0x00007f00b8de7e79 in TDirectoryFile::SaveSelf(bool) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#8 0x00007f00b8de9049 in TDirectoryFile::Save() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#9 0x00007f00b8de7cf9 in TDirectoryFile::Close(char const*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#10 0x00007f00b8df8071 in TFile::Close(char const*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#11 0x00007f00b9ff3070 in (anonymous namespace)::R__ListSlowClose(TList*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#12 0x00007f00b9ff3f07 in TROOT::CloseFiles() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#13 0x00007f00b9ff4339 in TROOT::EndOfProcessCleanups() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#14 0x00007f00b3428e22 in exit () from /lib64/libc.so.6
#15 0x00007f00b3411d24 in __libc_start_main () from /lib64/libc.so.6
#16 0x00000000004165d1 in _start ()
===========================================================
The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 0x00007f00b37830b8 in main_arena () from /lib64/libc.so.6
#6 0x00007f00b8de6029 in TDirectoryFile::WriteKeys() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#7 0x00007f00b8de7e79 in TDirectoryFile::SaveSelf(bool) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#8 0x00007f00b8de9049 in TDirectoryFile::Save() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#9 0x00007f00b8de7cf9 in TDirectoryFile::Close(char const*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#10 0x00007f00b8df8071 in TFile::Close(char const*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libRIO.so
#11 0x00007f00b9ff3070 in (anonymous namespace)::R__ListSlowClose(TList*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#12 0x00007f00b9ff3f07 in TROOT::CloseFiles() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#13 0x00007f00b9ff4339 in TROOT::EndOfProcessCleanups() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/5.34.18-x86_64-slc6-gcc4.7/lib/libCore.so
#14 0x00007f00b3428e22 in exit () from /lib64/libc.so.6
#15 0x00007f00b3411d24 in __libc_start_main () from /lib64/libc.so.6
#16 0x00000000004165d1 in _start ()
===========================================================
This might be due to some stupid mistake on my side, but I don’t know enough about the internals of TFolder::Write/TFile::Close to know where to look for the problem. Can any of the following cause this kind of behaviour?
- circular references in the TFolder hierarchy
- messing around with gDirectory
- not properly opening or closing other root files
I would be happy for any ideas on where to look for the cause of this annoying problem. And even assuming that I cannot fix the problem that causes TFolder::Write/TFile::Close to fail in this case - is there a way to have the code exit gracefully instead of ending in a segmentation fault?
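One thing I noticed while writing this up: if samples->Write() throws, the calls to file->Close() and delete file in my snippet are skipped, so the file is still open when the process exits. Whether that is related to the segmentation violation during TROOT::CloseFiles I cannot say. Below is a minimal, ROOT-free sketch of how the close step could be tied to scope exit so that it runs on every path. MockFile and writeAndClose are hypothetical stand-ins, not ROOT classes or functions.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <memory>
#include <new>

// Hypothetical stand-in for an output file; this is NOT ROOT's TFile.
struct MockFile {
    bool open = true;
    void Close() { open = false; }
};

// Deleter that closes the file instead of freeing it, so a unique_ptr
// can act as a scope guard for an object it does not own.
struct Closer {
    void operator()(MockFile* f) const { f->Close(); }
};

// Runs `write` and closes `file` on every exit path.
// Returns true if `write` finished without throwing std::bad_alloc.
bool writeAndClose(MockFile& file, const std::function<void()>& write) {
    assert(file.open);  // precondition: the caller passes an open file
    std::unique_ptr<MockFile, Closer> guard(&file);  // Close() runs at scope exit
    try {
        write();
        return true;
    } catch (const std::bad_alloc&) {
        std::cerr << "allocation failed while writing, skipping\n";
        return false;
    }
}
```

With this pattern the file ends up closed whether or not the write step throws. It does not explain why the allocation fails in the first place, and it assumes that the close step itself succeeds after a failed write.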
Regards,
Carsten Burgard