How to detect corrupt ROOT files?

Some jobs I run occasionally produce corrupted ROOT files along with valid ones. I’d like a way to quickly scan a set of ROOT files and find the corrupted ones. Unfortunately, merely opening one of these corrupt files with ROOT causes a segmentation violation. My current method for identifying them is to open each file with ROOT and see whether the segmentation violation occurs, but this is slow. I’d prefer a faster and more elegant method.

I don’t need to recover these files – I just want to identify them quickly and delete them.

Below I give the ROOT version I’m using, followed by an example of the crash. It occurs just by passing the file to ROOT on the command line. I have also attached a very simple test program that shows the same problem.

ROOT 5.22/00d (branches/v5-22-00-patches@29532, May 19 2010, 21:33:00 on linux)

CINT/ROOT C/C++ Interpreter version 5.16.29, Jan 08, 2008

root -l -n nTuple_data_102_1_tvk.root
root [0]
Attaching file nTuple_data_102_1_tvk.root as _file0…
Error in TFile::Init: file nTuple_data_102_1_tvk.root is truncated at 250959128 bytes: should be 309749504, trying to recover
Info in TFile::Recover: nTuple_data_102_1_tvk.root, recovered key TDirectoryFile:configurableAnalysis at address 242

*** Break *** segmentation violation
Attaching to program: /proc/8145/exe, process 8145
[Thread debugging using libthread_db enabled]
0xffffe410 in __kernel_vsyscall ()
#1 0x006b8713 in __waitpid_nocancel () from /lib/libc.so.6
#2 0x0065d07b in do_system () from /lib/libc.so.6
#3 0x008aeead in system () from /lib/libpthread.so.0
#4 0xf7ac9b0d in TUnixSystem::Exec(char const*) () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#5 0xf7acefab in TUnixSystem::StackTrace() () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#6 0xf7acfd7d in TUnixSystem::DispatchSignals(ESignals) () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#7 0xf7acfe7d in SigHandler(ESignals) () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#8 0xf7ac6782 in sighandler(int) () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#9 <signal handler called>
#10 0xf6b7fb04 in TFile::Recover() () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libRIO.so
#11 0xf6b7d9f4 in TFile::Init(bool) () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libRIO.so
#12 0xf6b7f307 in TFile::TFile(char const*, char const*, char const*, int) ()
from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libRIO.so
#13 0xf6b86020 in TFile::Open(char const*, char const*, char const*, int, int) ()
from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libRIO.so
#14 0xf6c6d44d in G__G__IO_107_0_103(G__value*, char const*, G__param*, int) ()
from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libRIO.so
#15 0xf70432b6 in Cint::G__ExceptionWrapper(int (*)(G__value*, char const*, G__param*, int), G__value*, char*, G__param*, int) ()
from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#16 0xf7103a7c in G__execute_call () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#17 0xf7104d56 in G__call_cppfunc () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#18 0xf70dcf9c in G__interpret_func () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#19 0xf70cc26c in G__getfunction () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#20 0xf709c3f8 in G__getitem () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#21 0xf70a2333 in G__getexpr () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#22 0xf708c535 in G__define_var () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#23 0xf712d37f in G__exec_statement () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#24 0xf708729a in G__exec_tempfile_core () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#25 0xf7087579 in G__exec_tempfile_fp () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#26 0xf713c75f in G__process_cmd () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCint.so
#27 0xf7ab91a4 in TCint::ProcessLine(char const*, TInterpreter::EErrorCode*) ()
from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#28 0xf79de243 in TApplication::ProcessLine(char const*, bool, int*) ()
from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libCore.so
#29 0xf6e37541 in TRint::Run(bool) () from /sharesoft/cmssw/slc5_ia32_gcc434/lcg/root/5.22.00d-cms18/lib/libRint.so
#30 0x08048d55 in main ()
tstbad.C (146 Bytes)

See the documentation of TFile::Open and TFile::Recover.

Rene

After studying the TFile::TFile() documentation, and with some experimentation, I found that adding the following line to my $HOME/.rootrc file disables file recovery and thereby prevents the segmentation violation when opening a corrupt ROOT file:

TFile.Recover: 0

With this setting, I can run a ROOT macro that quickly checks a set of ROOT files and finds the corrupt ones without crashing.

Why the recovery process itself hits a segmentation violation when trying to recover a corrupted file is another question. Perhaps it will be fixed in a later version of ROOT. In my case I don’t care about recovering the corrupted files, so this solves my current problem.
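
For the specific failure in the error message above (file "truncated at 250959128 bytes: should be 309749504"), a pre-check that never goes through TFile may also be worth a try. The sketch below is plain standard C++, not ROOT code, and it rests on an assumption about the on-disk header layout: the magic string "root" in bytes 0-3, fVersion in bytes 4-7, fBEGIN in bytes 8-11, and then fEND stored big-endian as 4 bytes, or as 8 bytes when fVersion is 1000000 or more. The names HeaderCheck, readBigEndian and checkRootHeader are made up for this example. It only catches truncation, not other kinds of corruption.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Outcome of the header check. These names are illustrative, not ROOT API.
enum class HeaderCheck { Ok, Unreadable, NotRootFile, Truncated };

// Decode an n-byte big-endian unsigned integer starting at p (n <= 8).
static std::uint64_t readBigEndian(const unsigned char* p, int n) {
    std::uint64_t value = 0;
    for (int i = 0; i < n; ++i) value = (value << 8) | p[i];
    return value;
}

// Compare the end-of-data offset recorded in the header (fEND) with the
// actual file size. The header layout used here is an assumption:
// "root", fVersion (4 bytes), fBEGIN (4 bytes), then fEND (4 or 8 bytes).
HeaderCheck checkRootHeader(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return HeaderCheck::Unreadable;
    const std::streamoff size = in.tellg();  // opened at end: this is the file size
    in.seekg(0, std::ios::beg);

    unsigned char header[20] = {0};
    in.read(reinterpret_cast<char*>(header), sizeof(header));
    const std::streamsize got = in.gcount();

    if (got < 4 || std::string(reinterpret_cast<char*>(header), 4) != "root")
        return HeaderCheck::NotRootFile;
    if (got < 20) return HeaderCheck::Truncated;  // too short to hold the fields read below

    const std::uint64_t version = readBigEndian(header + 4, 4);
    // Large-file format (fVersion >= 1000000) is assumed to store fEND in 8 bytes.
    const std::uint64_t fEND = readBigEndian(header + 12, version >= 1000000 ? 8 : 4);
    return fEND > static_cast<std::uint64_t>(size) ? HeaderCheck::Truncated
                                                   : HeaderCheck::Ok;
}
```

Calling checkRootHeader on each file name and flagging anything that does not return HeaderCheck::Ok would give a quick first pass. Files that pass can still be opened in ROOT with TFile.Recover set to 0 for a fuller check.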