Help understanding a hadd crash with nanoAOD ROOT files

Hello

I would like to understand the crash I get when I try to hadd some ROOT files that individually look just fine. In particular, I get the long error shown below, which I cannot interpret. Could it be that I am trying to merge nanoAOD .root files and this does not work with hadd?

The ROOT files I am attempting to hadd can be found here:

/afs/cern.ch/user/a/alkaloge/public/hadd_error

Thanks in advance for your help!

There was a crash.

This is the entire stack trace of all threads:
#0 0x0000003f38eac89e in waitpid () from /lib64/libc.so.6
#1 0x0000003f38e3e4e9 in do_system () from /lib64/libc.so.6
#2 0x00007fe11dbeea89 in TUnixSystem::StackTrace() () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so
#3 0x00007fe11dbf098c in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so
#4 <signal handler called>
#5 0x00007fe11e8d5575 in TBranch::ReadLeaves1Impl(TBuffer&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#6 0x00007fe11e8d74b2 in TBranch::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#7 0x00007fe11e8bf903 in TTree::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#8 0x00007fe11e8c2d9b in TTree::CopyEntries(TTree*, long long, char const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#9 0x00007fe11e8ba0d7 in TTree::Merge(TCollection*, TFileMergeInfo*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#10 0x00007fe11dea042b in TFileMerger::MergeRecursive(TDirectory*, TList*, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so
#11 0x00007fe11de9f1fe in TFileMerger::PartialMerge(int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so
#12 0x0000000000402325 in main ()

The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 0x00007fe11e8d5575 in TBranch::ReadLeaves1Impl(TBuffer&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#6 0x00007fe11e8d74b2 in TBranch::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#7 0x00007fe11e8bf903 in TTree::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#8 0x00007fe11e8c2d9b in TTree::CopyEntries(TTree*, long long, char const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#9 0x00007fe11e8ba0d7 in TTree::Merge(TCollection*, TFileMergeInfo*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so
#10 0x00007fe11dea042b in TFileMerger::MergeRecursive(TDirectory*, TList*, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so
#11 0x00007fe11de9f1fe in TFileMerger::PartialMerge(int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so
#12 0x0000000000402325 in main ()

*** glibc detected *** hadd: double free or corruption (!prev): 0x00000000036ef4e0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3f38e75e5e]
/lib64/libc.so.6[0x3f38e78cf0]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN6TLeafFD1Ev+0x30)[0x7fe11e8eeef0]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN6TLeafFD0Ev+0x9)[0x7fe11e8eef09]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN9TObjArray6DeleteEPKc+0x100)[0x7fe11db804b0]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN7TBranchD2Ev+0x15f)[0x7fe11e8d947f]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN7TBranchD0Ev+0x9)[0x7fe11e8d9689]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN9TObjArray6DeleteEPKc+0x100)[0x7fe11db804b0]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN5TTreeD2Ev+0x130)[0x7fe11e8c2830]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN5TTreeD0Ev+0x9)[0x7fe11e8c2bd9]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN5TList6DeleteEPKc+0x205)[0x7fe11db7eb15]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so(_ZN14TDirectoryFile5CloseEPKc+0xd5)[0x7fe11de05145]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so(_ZN5TFile5CloseEPKc+0x1c2)[0x7fe11df37352]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(+0x14a5f0)[0x7fe11da6f5f0]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN5TROOT10CloseFilesEv+0x3e)[0x7fe11da6fcbe]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN5TROOT20EndOfProcessCleanupsEv+0x9)[0x7fe11da70289]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN11TUnixSystem4ExitEib+0x21)[0x7fe11dbebbc1]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libCore.so(_ZN11TUnixSystem15DispatchSignalsE8ESignals+0x1c6)[0x7fe11dbf0aa6]
/lib64/libpthread.so.0[0x3f3960f7e0]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN7TBranch15ReadLeaves1ImplER7TBuffer+0x15)[0x7fe11e8d5575]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN7TBranch8GetEntryExi+0xe2)[0x7fe11e8d74b2]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN5TTree8GetEntryExi+0xa3)[0x7fe11e8bf903]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN5TTree11CopyEntriesEPS_xPKc+0x17b)[0x7fe11e8c2d9b]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libTree.so(_ZN5TTree5MergeEP11TCollectionP14TFileMergeInfo+0x217)[0x7fe11e8ba0d7]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so(_ZN11TFileMerger14MergeRecursiveEP10TDirectoryP5TListi+0xbcb)[0x7fe11dea042b]
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_0_25/external/slc6_amd64_gcc530/lib/libRIO.so(_ZN11TFileMerger12PartialMergeEi+0xee)[0x7fe11de9f1fe]
hadd(main+0xae5)[0x402325]
/lib64/libc.so.6(__libc_start_main+0x100)[0x3f38e1ed20]

Can you run the failing example with valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp?
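For reference, here is a minimal sketch of how that valgrind run could be put together. The hadd arguments (-f -k target.root plus the input files) are the ones quoted later in this thread; the function name valgrind_hadd_command and the example input names are only illustrative, and the suppression file is assumed to sit at $ROOTSYS/etc/valgrind-root.supp as in the request above.

```python
import os


def valgrind_hadd_command(target, inputs, rootsys=None):
    """Build the argv list for running hadd under valgrind.

    rootsys defaults to the ROOTSYS environment variable; the suppression
    file location follows the path given in the thread.
    """
    if rootsys is None:
        rootsys = os.environ.get("ROOTSYS", "")
    supp = os.path.join(rootsys, "etc", "valgrind-root.supp")
    # -f and -k are the hadd options used in the original command.
    return ["valgrind", "--suppressions=" + supp,
            "hadd", "-f", "-k", target] + list(inputs)


# Hypothetical usage (file names are placeholders):
#   import glob, subprocess
#   cmd = valgrind_hadd_command("target.root", sorted(glob.glob("single*root")))
#   with open("log.txt", "w") as log:
#       subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
```

Redirecting both stdout and stderr into one file keeps the valgrind report and the hadd messages in order, which makes the log easier to attach.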

Hello

I just did and I am attaching the log.

Regards,
Alexis

log.txt (258.4 KB)

The key part is

==27295== Invalid write of size 4
==27295==    at 0x554733B: TBufferFile::ReadFastArray(float*, int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libRIO.so)
==27295==    by 0x52ECBB3: TBranch::GetEntry(long long, int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x534B7E0: TTree::GetEntry(long long, int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x534F387: TTree::CopyEntries(TTree*, long long, char const*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x5343F7E: TTree::Merge(TCollection*, TFileMergeInfo*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x5599EAE: TFileMerger::MergeRecursive(TDirectory*, TList*, int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libRIO.so)
==27295==    by 0x5598457: TFileMerger::PartialMerge(int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libRIO.so)
==27295==    by 0x407209: main::{lambda(TFileMerger&, int, int)#2}::operator()(TFileMerger&, int, int) const [clone .constprop.177] (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/bin/hadd)
==27295==    by 0x404874: main (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/bin/hadd)
==27295==  Address 0x144ee1d4 is 0 bytes after a block of size 36 alloc'd
==27295==    at 0x48098C7: operator new[](unsigned long) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/valgrind/3.13.0-omkpbe2/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27295==    by 0x5333637: TLeafF::SetAddress(void*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x53414C6: TTree::CopyAddresses(TTree*, bool) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x5343F68: TTree::Merge(TCollection*, TFileMergeInfo*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libTree.so)
==27295==    by 0x5599EAE: TFileMerger::MergeRecursive(TDirectory*, TList*, int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libRIO.so)
==27295==    by 0x5598457: TFileMerger::PartialMerge(int) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/libRIO.so)
==27295==    by 0x407209: main::{lambda(TFileMerger&, int, int)#2}::operator()(TFileMerger&, int, int) const [clone .constprop.177] (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/bin/hadd)
==27295==    by 0x404874: main (in /cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/bin/hadd)

This all appears to be internal to ROOT, but it also shows that the data is being unstreamed (and some CMS data cannot be unstreamed without the CMS libraries).

What command line arguments did you invoke hadd with? Is a CMSSW release set up in your shell?
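To spell out the numbers in the valgrind excerpt: the write is 4 bytes wide and lands 0 bytes after a 36-byte block allocated in TLeafF::SetAddress, so the buffer holds 36 / 4 = 9 floats and the read tried to store at least a tenth one. The sketch below just pulls those figures out of a report. The function name and regular expressions are my own, and it assumes the element size equals the access size (which fits ReadFastArray(float*, int) here). Why the buffer is too small is not established in this thread.

```python
import re

ACCESS_RE = re.compile(r"Invalid (?:write|read) of size (\d+)")
BLOCK_RE = re.compile(
    r"Address 0x[0-9a-fA-F]+ is (\d+) bytes after a block of size (\d+) alloc'd"
)


def summarize_overrun(report):
    """Return (access_size, bytes_past_end, block_size, elements_in_block).

    Returns None if the report does not contain both an invalid access
    line and an 'after a block of size N' line.
    """
    access = ACCESS_RE.search(report)
    block = BLOCK_RE.search(report)
    if access is None or block is None:
        return None
    access_size = int(access.group(1))
    bytes_past_end = int(block.group(1))
    block_size = int(block.group(2))
    # Assumption: each element is as wide as the faulting access.
    return access_size, bytes_past_end, block_size, block_size // access_size


report = (
    "==27295== Invalid write of size 4\n"
    "==27295==  Address 0x144ee1d4 is 0 bytes after a block of size 36 alloc'd\n"
)
print(summarize_overrun(report))  # (4, 0, 36, 9)
```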

I've tried ‘cmsenv’ under 9_4_9 and 10_2_15. The former uses

/cvmfs/cms.cern.ch/slc6_amd64_gcc630/lcg/root/6.10.08-elfike2/

while the latter points to

/cvmfs/cms.cern.ch/slc6_amd64_gcc700/lcg/root/6.12.07-gnimlf5/

The command I issue is: hadd -f -k target.root single*root

Does the problem still appear if you merge a single file? What is the minimum set of files that produces the problem?

Yes, merging one file is OK, but it fails for two or more.

Thanks,

Alexis
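One way to narrow down the minimum failing set asked about above is to try every pair of input files and record which pairs fail. This is only a sketch: the helper names and the scratch output file bisect_target.root are made up for illustration, and it assumes a crashing hadd exits with a non-zero status.

```python
import itertools
import subprocess


def hadd_succeeds(files, target="bisect_target.root"):
    """Run hadd on the given files and report whether it exited cleanly.

    Assumption: a crash shows up as a non-zero (or signal) exit status.
    """
    result = subprocess.run(
        ["hadd", "-f", "-k", target] + list(files),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failing_pairs(files, merge_ok=hadd_succeeds):
    """Return every pair of files whose merge fails.

    merge_ok is injectable so the search can be checked without ROOT.
    """
    return [
        pair
        for pair in itertools.combinations(files, 2)
        if not merge_ok(list(pair))
    ]


# Hypothetical usage:
#   import glob
#   print(failing_pairs(sorted(glob.glob("single*root"))))
```

If every pair fails, the problem is probably common to all the files. If only pairs containing one particular file fail, that file is the one to inspect first.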