Hadd Seg Fault

Hi, I’m trying to merge some .root files, but hadd is giving me a seg fault. I tried with ROOT 5.34 and 6.01, both on OSX 10.9 and compiled with clang++ from Xcode; all dependencies except GSL are provided by MacPorts.

The error I get with the seg fault is:

hadd -f0 -O run00500_h.root run00094_h.root run00121_h.root run00134_h.root run00153_h.root run00160_h.root run00161_h.root run00163_h.root run00164_h.root
hadd Target file: run00500_h.root
hadd Source file 1: run00094_h.root
hadd Source file 2: run00121_h.root
hadd Source file 3: run00134_h.root
hadd Source file 4: run00153_h.root
hadd Source file 5: run00160_h.root
hadd Source file 6: run00161_h.root
hadd Source file 7: run00163_h.root
hadd Source file 8: run00164_h.root
hadd Target path: run00500_h.root:/
hadd(43978,0x7fff7cf9b310) malloc: *** error for object 0x7f8a0e324f38: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

I ran this in my debugger (lldb) and got this additional information:

hadd(44088,0x7fff7cf9b310) malloc: *** error for object 0x101760420: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Process 44088 stopped
* thread #1: tid = 0xa1df04, 0x00007fff9024a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff9024a866 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff9024a866:  jae    0x7fff9024a870            ; __pthread_kill + 20
   0x7fff9024a868:  movq   %rax, %rdi
   0x7fff9024a86b:  jmpq   0x7fff90247175            ; cerror_nocancel
   0x7fff9024a870:  ret    
(lldb) bt
* thread #1: tid = 0xa1df04, 0x00007fff9024a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff9024a866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff9515735c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff92c94b1a libsystem_c.dylib`abort + 125
    frame #3: 0x00007fff94223690 libsystem_malloc.dylib`szone_error + 587
    frame #4: 0x00007fff94221595 libsystem_malloc.dylib`szone_free_definite_size + 3011
    frame #5: 0x00000001008429f2 libTree.so`TLeafI::~TLeafI() + 50
    frame #6: 0x0000000100dbfc88 libCore.so`TObjArray::Delete(char const*) + 136
    frame #7: 0x00000001008057a9 libTree.so`TBranch::~TBranch() + 329
    frame #8: 0x000000010080560e libTree.so`TBranch::~TBranch() + 14
    frame #9: 0x0000000100dbfc88 libCore.so`TObjArray::Delete(char const*) + 136
    frame #10: 0x000000010084dd81 libTree.so`TTree::~TTree() + 385
    frame #11: 0x000000010084db4e libTree.so`TTree::~TTree() + 14
    frame #12: 0x000000010087fb7e libTree.so`ROOT::delete_TTree(void*) + 46
    frame #13: 0x0000000100dd67f6 libCore.so`TClass::Destructor(void*, bool) + 70
    frame #14: 0x000000010003600c libRIO.so`TFileMerger::MergeRecursive(TDirectory*, TList*, int) + 7276
    frame #15: 0x0000000100036495 libRIO.so`TFileMerger::PartialMerge(int) + 533
    frame #16: 0x00000001000026ac hadd`main + 5308

Unfortunately, it seems hadd isn’t compiled with -g, so the specific lines of code can’t be seen. If anyone can suggest how to debug further, let me know. Is it possible to recompile only hadd, leaving the rest of ROOT as-is? Then I could build it with -g.

Incidentally, the produced file is 16GB before the seg fault, with 593253 entries in the TTree. The input files’ TTrees have 593253 entries combined, so it seems the TTrees are being merged in full, but maybe closing the file is the problem? Or maybe the other objects in the TFiles are problematic. Can I tell hadd to merge only the TTrees and ignore other objects? The other objects are TObjArrays and TH1Fs.

Jean-François

Hi Jean-Francois,

The problem does indeed seem to be in the closing phase and is related to the TTree. It appears that the objects are being deleted twice. One way to debug this further is to run valgrind on the failing hadd.

Do you see the problem with fewer files? Can you send me a minimal set of files reproducing the problem?
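
One possible way to look for a minimal reproducing set is to drop input files one at a time and keep each removal only if hadd still fails. The sketch below is an illustration under assumptions, not a tested recipe for this crash: the function names are hypothetical, it reuses the -f0 -O flags from the command above, and it treats any non-zero exit status (including the abort) as "the problem is still there".

```python
import os
import subprocess
import tempfile


def hadd_fails(files):
    """Return True if hadd exits abnormally when merging `files`.

    Assumption: a non-zero exit status (an abort shows up as a negative
    return code) means the problem was reproduced.
    """
    # The merged output goes to a temporary directory that is removed
    # afterwards; it needs enough free space for the full merge.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "merged.root")
        result = subprocess.run(
            ["hadd", "-f0", "-O", target, *files],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode != 0


def minimal_failing_subset(files, fails=hadd_fails):
    """Greedily shrink `files` while `fails` keeps returning True.

    Returns an empty list if the full set does not fail. The result is
    minimal only in the sense that removing any single remaining file
    makes the failure go away.
    """
    current = list(files)
    if not fails(current):
        return []
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if candidate and fails(candidate):
            current = candidate  # file i was not needed to reproduce
        else:
            i += 1  # file i is needed; keep it and move on
    return current
```

Each trial is a full hadd run, so with eight inputs this means up to nine merges. Passing a different `fails` callable makes it possible to try the search logic without running hadd at all.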

Thanks,
Philippe.

Since I was mostly just interested in merging the single TTree in each of the files, I ended up writing my own script that used TChain::Merge. Each hadd attempt took ~15 minutes before crashing, so I didn’t spend much time debugging further.
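
For anyone landing here with the same problem, a workaround along these lines might look roughly like the following. This is not the script mentioned above, only a minimal sketch using PyROOT: the function name and the tree name passed to it are hypothetical, and the only ROOT calls used are TChain's constructor, Add, GetEntries and Merge. Only the TTree is copied; the TObjArrays and TH1Fs are not carried over.

```python
def merge_trees(tree_name, input_files, output_file):
    """Merge the TTree called `tree_name` from every input file.

    Returns the total number of entries in the chain so it can be
    compared with the expected count. `tree_name` must match the name
    of the TTree stored in the input files.
    """
    # Imported inside the function so this file can be loaded on a
    # machine where ROOT is not available.
    import ROOT

    chain = ROOT.TChain(tree_name)
    for path in input_files:
        chain.Add(path)

    # Counting entries makes the chain open each input file in turn.
    total = chain.GetEntries()

    # Write a single merged tree into a new output file.
    chain.Merge(output_file)
    return total
```

A call would then be something like merge_trees("mytree", ["run00094_h.root", "run00121_h.root"], "run00500_h.root"), where "mytree" is a placeholder for the actual tree name.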

Regarding valgrind, unfortunately it does not work on OSX 10.9. The crash did not occur with a smaller number of files.

Jean-François