Can't merge large ROOT files

Dear Experts,

I am trying to merge 11 files, each around 500 MB, using hadd in ROOT version 5.34/08 on an OS X 10.9 machine. They are the outputs of a large-scale GRID analysis, obtained after merging the analysis outputs within small subgroups.

The following is the structure of each file:

KEY: TDirectoryFile PIDqa;1 PIDqa
KEY: TList RsnOut;1 Doubly linked list

Merging produces a segmentation violation, both with the hadd command and with a macro. I’ve also tried merging the files two at a time, but that did not work either.

Root files and the merging macro can be found in my afs directory: /afs/cern.ch/user/a/akarasu/public/grid_analysis

I tried changing the tree size limit in the merge macro using TTree::SetMaxTreeSize(maxsize), but I still can’t get the output.

Is it not possible to produce a single ROOT file when the input files are this large?

Any help would be greatly appreciated.

Thank you,
Kind regards,
Ayben

Following is the stack trace:

[code]

Error in TBufferFile::CheckByteCount: object of class TList read too many bytes: 1733452366 instead of 659710542
Warning in TBufferFile::CheckByteCount: TList::Streamer() not in sync with data on file /var/folders/zg/pgvmn0c507b0s4lplm69hywr0000gn/T//ROOTMERGE-87be5044-36ea-11e4-9717-18fe8d80beef.root, fix Streamer()
Error in TBufferFile::CheckByteCount: object of class TList read too many bytes: 1504139038 instead of 430397214
Warning in TBufferFile::CheckByteCount: TList::Streamer() not in sync with data on file /var/folders/zg/pgvmn0c507b0s4lplm69hywr0000gn/T//ROOTMERGE-897721fe-36ea-11e4-9717-18fe8d80beef.root, fix Streamer()

*** Break *** segmentation violation
Generating stack trace…
0x00000001072d8536 in TBufferFile::WriteObjectClass(void const*, TClass const*) (in libRIO.so) + 374
0x00000001072d8655 in TBufferFile::WriteObjectAny(void const*, TClass const*) (in libRIO.so) + 245
0x00000001072d7c72 in TBufferFile::WriteFastArray(void**, TClass const*, int, bool, TMemberStreamer*) (in libRIO.so) + 274
0x000000010745ce07 in int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, int, int, int, int) (in libRIO.so) + 17767
0x000000010733b5eb in TStreamerInfoActions::GenericWriteAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (in libRIO.so) + 59
0x00000001072daa1d in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (in libRIO.so) + 61
0x00000001072da94b in TBufferFile::WriteClassBuffer(TClass const*, void*) (in libRIO.so) + 331
0x00000001072d8536 in TBufferFile::WriteObjectClass(void const*, TClass const*) (in libRIO.so) + 374
0x00000001072d8655 in TBufferFile::WriteObjectAny(void const*, TClass const*) (in libRIO.so) + 245
0x0000000102c26834 in TObjArray::Streamer(TBuffer&) (in libCore.5.so) + 388
0x00000001072d7b37 in TBufferFile::WriteFastArray(void*, TClass const*, int, TMemberStreamer*) (in libRIO.so) + 151
0x000000010745a000 in int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, int, int, int, int) (in libRIO.so) + 5984
0x000000010733b5eb in TStreamerInfoActions::GenericWriteAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (in libRIO.so) + 59
0x00000001072daa1d in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (in libRIO.so) + 61
0x00000001072da94b in TBufferFile::WriteClassBuffer(TClass const*, void*) (in libRIO.so) + 331
0x0000000102c4a326 in TClass::WriteBuffer(TBuffer&, void*, char const*) (in libCore.5.so) + 22
0x0000000102c60f9a in TStreamerBase::WriteBuffer(TBuffer&, char*) (in libCore.5.so) + 186
0x000000010745cb49 in int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, int, int, int, int) (in libRIO.so) + 17065
0x000000010733b5eb in TStreamerInfoActions::GenericWriteAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (in libRIO.so) + 59
0x00000001072daa1d in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (in libRIO.so) + 61
0x00000001072da94b in TBufferFile::WriteClassBuffer(TClass const*, void*) (in libRIO.so) + 331
0x00000001072d8536 in TBufferFile::WriteObjectClass(void const*, TClass const*) (in libRIO.so) + 374
0x00000001072d8655 in TBufferFile::WriteObjectAny(void const*, TClass const*) (in libRIO.so) + 245
0x0000000102c23dd9 in TList::Streamer(TBuffer&) (in libCore.5.so) + 201
0x000000010731d8d8 in TKey::TKey(TObject const*, char const*, int, TDirectory*) (in libRIO.so) + 440
0x00000001072eaebb in TFile::CreateKey(TDirectory*, TObject const*, char const*, int) (in libRIO.so) + 59
0x00000001072e314f in TDirectoryFile::WriteTObject(TObject const*, char const*, char const*, int) (in libRIO.so) + 607
0x0000000102bbb52b in TObject::Write(char const*, int, int) const (in libCore.5.so) + 315
0x00000001072ff526 in TFileMerger::MergeRecursive(TDirectory*, TList*, int) (in libRIO.so) + 6806
0x00000001072ffb25 in TFileMerger::PartialMerge(int) (in libRIO.so) + 517
0x00000001074d9dc9 in G__G__IO_256_0_31(G__value*, char const*, G__param*, int) (in libRIO.so) + 297
0x00000001033734b1 in Cint::G__ExceptionWrapper(int (*)(G__value*, char const*, G__param*, int), G__value*, char*, G__param*, int) (in libCint.5.so) + 49
0x000000010341c44b in G__execute_call (in libCint.5.so) + 75
0x000000010341c8ac in G__call_cppfunc (in libCint.5.so) + 860
0x00000001033f022e in G__interpret_func (in libCint.5.so) + 5198
0x00000001033de657 in G__getfunction (in libCint.5.so) + 5655
0x00000001034df1cb in G__getstructmem(int, G__FastAllocString&, char*, int, char*, int*, G__var_array*, int) (in libCint.5.so) + 4187
0x00000001034d5cbd in G__getvariable (in libCint.5.so) + 7341
0x00000001033d2eb2 in G__getitem (in libCint.5.so) + 402
0x00000001033cea82 in G__getexpr (in libCint.5.so) + 31458
0x000000010344fd6c in G__exec_statement (in libCint.5.so) + 34988
0x000000010344aa21 in G__exec_statement (in libCint.5.so) + 13665
0x00000001033f2e72 in G__interpret_func (in libCint.5.so) + 16530
0x00000001033de6a4 in G__getfunction (in libCint.5.so) + 5732
0x00000001033d2f1f in G__getitem (in libCint.5.so) + 511
0x00000001033cea82 in G__getexpr (in libCint.5.so) + 31458
0x00000001033c6f13 in G__calc_internal (in libCint.5.so) + 979
0x000000010345ae80 in G__process_cmd (in libCint.5.so) + 16992
0x0000000102c312a4 in TCint::ProcessLine(char const*, TInterpreter::EErrorCode*) (in libCore.5.so) + 884
0x0000000102c31589 in TCint::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) (in libCore.5.so) + 121
0x0000000102b91603 in TApplication::ExecuteFile(char const*, int*, bool) (in libCore.5.so) + 2355
0x0000000102b90852 in TApplication::ProcessLine(char const*, bool, int*) (in libCore.5.so) + 1186
0x0000000103a83bb4 in TRint::HandleTermInput() (in libRint.5.so) + 676
0x0000000102c68d2d in TUnixSystem::CheckDescriptors() (in libCore.5.so) + 317
0x0000000102c71c63 in TMacOSXSystem::DispatchOneEvent(bool) (in libCore.5.so) + 387
0x0000000102bee98a in TSystem::InnerLoop() (in libCore.5.so) + 26
0x0000000102bee888 in TSystem::Run() (in libCore.5.so) + 392
0x0000000102b91914 in TApplication::Run(bool) (in libCore.5.so) + 36
0x0000000103a834dc in TRint::Run(bool) (in libRint.5.so) + 1420
0x0000000102b85e1f in main (in root.exe) + 79
0x00007fff913695fd in start (in libdyld.dylib) + 1
Root > Function mergeOutput() busy flag cleared

root [1]
root [1]
root [1]
root [1]
root [1]
root [1] .q
Fatal in TFileMerger::RecursiveRemove: Output file of the TFile Merger (targeting mergedAnalysisResults.root) has been deleted (likely due to a TTree larger than 100Gb)
aborting
Generating stack trace…
0x0000000102bb9ee9 in TObject::~TObject() (in libCore.5.so) + 73
0x00000001072e923f in TFile::~TFile() (in libRIO.so) + 15
0x0000000102c22e65 in TList::Delete(char const*) (in libCore.5.so) + 325
0x0000000102bd0b17 in TROOT::~TROOT() (in libCore.5.so) + 87
0x00007fff9279a7a1 in __cxa_finalize (in libsystem_c.dylib) + 177
0x00007fff9279aa4c in exit (in libsystem_c.dylib) + 22
0x0000000102c6b32a in TUnixSystem::Exit(int, bool) (in libCore.5.so) + 74
0x0000000102b904bc in TApplication::ProcessLine(char const*, bool, int*) (in libCore.5.so) + 268
0x0000000103a83bb4 in TRint::HandleTermInput() (in libRint.5.so) + 676
0x0000000102c68d2d in TUnixSystem::CheckDescriptors() (in libCore.5.so) + 317
0x0000000102c71c63 in TMacOSXSystem::DispatchOneEvent(bool) (in libCore.5.so) + 387
0x0000000102bee98a in TSystem::InnerLoop() (in libCore.5.so) + 26
0x0000000102bee888 in TSystem::Run() (in libCore.5.so) + 392
0x0000000102b91914 in TApplication::Run(bool) (in libCore.5.so) + 36
0x0000000103a834dc in TRint::Run(bool) (in libRint.5.so) + 1420
0x0000000102b85e1f in main (in root.exe) + 79
0x00007fff913695fd in start (in libdyld.dylib) + 1[/code]
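
One detail in the log above can be checked with plain arithmetic: in both TBufferFile::CheckByteCount errors, the "read" count and the "instead of" count differ by exactly 2^30 bytes (1 GiB). The short C++ sketch below reproduces that check using only the four numbers from the log. The names `ByteCountPair`, `kLoggedPairs`, `kAssumedWrap`, `excessBytes`, `consistentWithWrap` and `printWrapCheck` are made up for this illustration and are not part of ROOT. The reading that the stored byte count wraps at 30 bits, so that a merged TList larger than 1 GiB cannot be written as a single object, is an assumption drawn from the numbers and is not confirmed anywhere in this thread.

```cpp
#include <iostream>

// One "read too many bytes: N instead of M" pair from the error log.
struct ByteCountPair {
    unsigned long long bytesRead;      // N: bytes actually consumed when reading the TList
    unsigned long long bytesExpected;  // M: byte count stored in the temporary file
};

// The two pairs reported by TBufferFile::CheckByteCount in the log.
constexpr ByteCountPair kLoggedPairs[] = {
    {1733452366ULL, 659710542ULL},
    {1504139038ULL, 430397214ULL},
};

// ASSUMPTION (not confirmed in the thread): the stored count keeps only
// the low 30 bits of the real size, so it wraps at 2^30 bytes.
constexpr unsigned long long kAssumedWrap = 1ULL << 30;

// How many more bytes were read than the stored count announced.
constexpr unsigned long long excessBytes(const ByteCountPair& p) {
    return p.bytesRead - p.bytesExpected;
}

// True if the stored count equals the real size reduced modulo 2^30.
constexpr bool consistentWithWrap(const ByteCountPair& p) {
    return p.bytesRead % kAssumedWrap == p.bytesExpected;
}

// Print the excess and the wrap check for each logged pair.
void printWrapCheck() {
    for (const ByteCountPair& p : kLoggedPairs) {
        std::cout << p.bytesRead << " - " << p.bytesExpected << " = "
                  << excessBytes(p)
                  << (consistentWithWrap(p) ? "  (matches a 2^30 wrap)"
                                            : "  (does not match a 2^30 wrap)")
                  << '\n';
    }
}
```

Calling `printWrapCheck()` from a `main` function prints one line per logged pair. If the assumption holds, the RsnOut TList in each partially merged file is already about 1.4 to 1.6 GiB, which would explain why merging the files two at a time did not help either.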

I also got seg faults when trying to merge ROOT files that would end up as multi-gigabyte outputs. I don’t know if it’s a bug in hadd or some limitation of the filesystem, but I ended up writing my own mergeruns.C program, which didn’t have any problems. It’s obviously not as flexible as hadd, since I wrote it only for my own case. For example, it has hardcoded TTree names and doesn’t recursively follow directories in the TFile, but it works. Maybe by looking at my example you can concoct your own merging program that works. It is attached to this post.

Jean-François
mergeruns.C (2.07 KB)

You should check whether a bug report is already open on JIRA and, if not, create one so that the problem is tracked and has a chance of being fixed.