I am using ROOT 6.10.04d to make a TChain of three anatuple files; call them A, B, and C. These files were produced using a relatively recent version of ROOT 5. They’re all small, with 30-some events each. If I load the files in the order “ABC”, my code segfaults when changing from file A to file B (when loading the first entry in B). I can open A, B, and C individually, and they look fine. If I build a chain from any one of just A, B, or C, everything is fine. If I order the files in the chain as “BCA”, everything is also fine. Interestingly, the order “BAC” also segfaults (in the A->C transition), so it is something about calling GetEntry in any file after finishing with file A.
My code is based on MakeClass. I’ve replaced all the fixed array sizes with numbers that are much bigger than the length of any of the arrays in files A, B, or C.
The segfault is:
#5 0x00007fd7757abcb6 in ROOT::Detail::TCollectionProxyInfo::Type<std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >::clear(void*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libMathCore.so
#6 0x00007fd777bc9801 in TGenCollectionStreamer::ReadBufferGeneric(TBuffer&, void*, TClass const*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#7 0x00007fd777b88dd4 in TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#8 0x00007fd777c18a05 in int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#9 0x00007fd777b85ce4 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#10 0x00007fd77685cb36 in TBranchElement::ReadLeavesMakeClass(TBuffer&) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#11 0x00007fd77684d761 in TBranch::GetEntry(long long, int) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#12 0x00007fd77685be3f in TBranchElement::GetEntry(long long, int) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#13 0x00007fd77688aad3 in TTree::GetEntry(long long, int) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#14 0x00007fd7748cac32 in RECOTRACKS_ANA::NuECCQE::GetEntry (this=0x5295a80, entry=33) at ../include/NuECCQE.h:835
#15 0x0000000000455fa6 in Skim (n_max_evts=-1, max_z=100000000, filebasename=..., first_event_number=0, is_data=false, ntuple_list_file=..., norm_to_max=false, do_low_w_cut=false, do_high_w_cut=false, w_cut_val_mev=1000, print_freq=1) at skimmer_root2hdf5_nueccqe.cxx:167
#16 0x00000000004586a1 in main (argc=10, argv=0x7ffc42e2b9d8) at skimmer_root2hdf5_nueccqe.cxx:417
I’ve processed many millions of events with essentially this same code before, but in that case all the ntuples were merged in advance. In this case I have thousands of files, and if I pick a random subset of 50 or 100 of them, I might be able to get my code to complete, but scattered throughout are files that cause this issue.
Inspired by the thread “Segfault on TTree::GetEntry”, I’ve run the job under valgrind with valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp ... This produces output like:
==17054== Invalid read of size 8
==17054== at 0x8A1B790: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::clear() (stl_vector.h:1210)
==17054== by 0x8A14327: ROOT::Detail::TCollectionProxyInfo::Type<std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >::clear(void*) (TCollectionProxyInfo.h:306)
==17054== by 0x5E2CBB7: TGenCollectionProxy::Method::invoke(void*) const (TGenCollectionProxy.h:200)
==17054== by 0x5E2A79E: TGenCollectionProxy::Clear(char const*) (TGenCollectionProxy.cxx:1076)
==17054== by 0x5E35026: TGenCollectionStreamer::ReadBufferGeneric(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1314)
==17054== by 0x5E34BF3: TGenCollectionStreamer::ReadBuffer(TBuffer&, void*) (TGenCollectionStreamer.cxx:1233)
==17054== by 0x5DAAD18: TCollectionClassStreamer::Stream(TBuffer&, void*, TClass const*) (TCollectionProxyFactory.h:179)
==17054== by 0x567D84B: TClass::StreamerExternal(TClass const*, void*, TBuffer&, TClass const*) (TClass.cxx:6346)
==17054== by 0x5DBED8E: TClass::Streamer(void*, TBuffer&, TClass const*) const (TClass.h:537)
==17054== by 0x5DB67C8: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1677)
==17054== by 0x5E95C80: TStreamerInfoActions::ReadSTLObjectWiseFastArray(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short, unsigned int) (TStreamerInfoActions.cxx:706)
==17054== by 0x5E9915E: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:749)
==17054== Address 0xd is not stack'd, malloc'd or (recently) free'd
==17054==
*** Break *** segmentation violation
#0 vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-linux.S:173
#1 0x0000000038092c25 in do_syscall_for_client (syscall_mask=0x803095e80, tst=0x802008450, syscallno=61) at m_syswrap/syswrap-main.c:339
#2 vgPlain_client_syscall (tid=1, trc=<optimized out>) at m_syswrap/syswrap-main.c:2007
#3 0x0000000038090eb5 in handle_syscall (trc=73, tid=1) at m_scheduler/scheduler.c:1118
#4 vgPlain_scheduler (tid=1) at m_scheduler/scheduler.c:1435
#5 0x00000000380c6f70 in thread_wrapper (tidW=1) at m_syswrap/syswrap-linux.c:103
#6 run_a_thread_NORETURN (tidW=1) at m_syswrap/syswrap-linux.c:156
#7 0x0000000000000000 in ?? ()
==17054==
==17054== HEAP SUMMARY:
==17054== in use at exit: 260,909,649 bytes in 104,715 blocks
==17054== total heap usage: 917,883 allocs, 813,168 frees, 665,269,534 bytes allocated
==17054==
==17054== LEAK SUMMARY:
==17054== definitely lost: 24 bytes in 1 blocks
==17054== indirectly lost: 80 bytes in 2 blocks
==17054== possibly lost: 4,096 bytes in 72 blocks
==17054== still reachable: 260,740,990 bytes in 102,719 blocks
==17054== of which reachable via heuristic:
==17054== newarray : 24,384 bytes in 38 blocks
==17054== multipleinheritance: 4,976 bytes in 8 blocks
==17054== suppressed: 164,459 bytes in 1,921 blocks
==17054== Rerun with --leak-check=full to see details of leaked memory
==17054==
==17054== For counts of detected and suppressed errors, rerun with: -v
==17054== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2634 from 64)
Has anyone seen anything like this before? Any ideas on a solution?
I don’t think this is a bug in my code since it works if I re-order the files in the TChain. Could it be a problem with the files? Or a bug in ROOT?
Thanks for any thoughts…
pax
Gabe