Segmentation fault when calling TTree::GetEntry with ROOT 6.10.04d

I am using ROOT 6.10.04d to build a TChain from three anatuple files; call them A, B, and C. These files were produced with a relatively recent version of ROOT 5. They’re all small, with 30-some events each. If I load the files in the order “ABC”, my code segfaults when moving from file A to file B (when loading the first entry in B). I can open A, B, and C individually, and they look fine. If I build a chain from any one of A, B, or C alone, everything is fine. If I order the files in the chain as “BCA”, everything is also fine. Interestingly, the order “BAC” also segfaults (in the A->C transition), so the problem seems to be in calling GetEntry on any file after finishing with file A.

My code is based on MakeClass. I’ve replaced all the fixed array sizes with constants that are much larger than the length of any of the arrays in files A, B, or C.

The segfault is:

#5  0x00007fd7757abcb6 in ROOT::Detail::TCollectionProxyInfo::Type<std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >::clear(void*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libMathCore.so
#6  0x00007fd777bc9801 in TGenCollectionStreamer::ReadBufferGeneric(TBuffer&, void*, TClass const*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#7  0x00007fd777b88dd4 in TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#8  0x00007fd777c18a05 in int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#9  0x00007fd777b85ce4 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libRIO.so
#10 0x00007fd77685cb36 in TBranchElement::ReadLeavesMakeClass(TBuffer&) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#11 0x00007fd77684d761 in TBranch::GetEntry(long long, int) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#12 0x00007fd77685be3f in TBranchElement::GetEntry(long long, int) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#13 0x00007fd77688aad3 in TTree::GetEntry(long long, int) () from /minerva/app/users/perdue/hep_hpc_products/root/v6_10_04d/Linux64bit+2.6-2.12-e14-prof/lib/libTree.so
#14 0x00007fd7748cac32 in RECOTRACKS_ANA::NuECCQE::GetEntry (this=0x5295a80, entry=33) at ../include/NuECCQE.h:835
#15 0x0000000000455fa6 in Skim (n_max_evts=-1, max_z=100000000, filebasename=..., first_event_number=0, is_data=false, ntuple_list_file=..., norm_to_max=false, do_low_w_cut=false, do_high_w_cut=false, w_cut_val_mev=1000, print_freq=1) at skimmer_root2hdf5_nueccqe.cxx:167
#16 0x00000000004586a1 in main (argc=10, argv=0x7ffc42e2b9d8) at skimmer_root2hdf5_nueccqe.cxx:417

I’ve processed many millions of events with essentially this same code before, but in that case all the ntuples were merged in advance. In this case I have thousands of files, and if I pick a random subset of 50 or 100 of them, I might be able to get my code to complete, but scattered throughout are files that cause this issue.

Inspired by the thread “Segfault on TTree::GetEntry”, I ran the job under valgrind with valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp ... This produces output like:

==17054== Invalid read of size 8
==17054==    at 0x8A1B790: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::clear() (stl_vector.h:1210)
==17054==    by 0x8A14327: ROOT::Detail::TCollectionProxyInfo::Type<std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >::clear(void*) (TCollectionProxyInfo.h:306)
==17054==    by 0x5E2CBB7: TGenCollectionProxy::Method::invoke(void*) const (TGenCollectionProxy.h:200)
==17054==    by 0x5E2A79E: TGenCollectionProxy::Clear(char const*) (TGenCollectionProxy.cxx:1076)
==17054==    by 0x5E35026: TGenCollectionStreamer::ReadBufferGeneric(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1314)
==17054==    by 0x5E34BF3: TGenCollectionStreamer::ReadBuffer(TBuffer&, void*) (TGenCollectionStreamer.cxx:1233)
==17054==    by 0x5DAAD18: TCollectionClassStreamer::Stream(TBuffer&, void*, TClass const*) (TCollectionProxyFactory.h:179)
==17054==    by 0x567D84B: TClass::StreamerExternal(TClass const*, void*, TBuffer&, TClass const*) (TClass.cxx:6346)
==17054==    by 0x5DBED8E: TClass::Streamer(void*, TBuffer&, TClass const*) const (TClass.h:537)
==17054==    by 0x5DB67C8: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1677)
==17054==    by 0x5E95C80: TStreamerInfoActions::ReadSTLObjectWiseFastArray(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short, unsigned int) (TStreamerInfoActions.cxx:706)
==17054==    by 0x5E9915E: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:749)
==17054==  Address 0xd is not stack'd, malloc'd or (recently) free'd
==17054==

 *** Break *** segmentation violation
#0  vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-linux.S:173
#1  0x0000000038092c25 in do_syscall_for_client (syscall_mask=0x803095e80, tst=0x802008450, syscallno=61) at m_syswrap/syswrap-main.c:339
#2  vgPlain_client_syscall (tid=1, trc=<optimized out>) at m_syswrap/syswrap-main.c:2007
#3  0x0000000038090eb5 in handle_syscall (trc=73, tid=1) at m_scheduler/scheduler.c:1118
#4  vgPlain_scheduler (tid=1) at m_scheduler/scheduler.c:1435
#5  0x00000000380c6f70 in thread_wrapper (tidW=1) at m_syswrap/syswrap-linux.c:103
#6  run_a_thread_NORETURN (tidW=1) at m_syswrap/syswrap-linux.c:156
#7  0x0000000000000000 in ?? ()
==17054==
==17054== HEAP SUMMARY:
==17054==     in use at exit: 260,909,649 bytes in 104,715 blocks
==17054==   total heap usage: 917,883 allocs, 813,168 frees, 665,269,534 bytes allocated
==17054==
==17054== LEAK SUMMARY:
==17054==    definitely lost: 24 bytes in 1 blocks
==17054==    indirectly lost: 80 bytes in 2 blocks
==17054==      possibly lost: 4,096 bytes in 72 blocks
==17054==    still reachable: 260,740,990 bytes in 102,719 blocks
==17054==                       of which reachable via heuristic:
==17054==                         newarray           : 24,384 bytes in 38 blocks
==17054==                         multipleinheritance: 4,976 bytes in 8 blocks
==17054==         suppressed: 164,459 bytes in 1,921 blocks
==17054== Rerun with --leak-check=full to see details of leaked memory
==17054==
==17054== For counts of detected and suppressed errors, rerun with: -v
==17054== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2634 from 64)

Has anyone seen anything like this before? Any ideas on a solution?

I don’t think this is a bug in my code since it works if I re-order the files in the TChain. Could it be a problem with the files? Or a bug in ROOT?

Thanks for any thoughts…

pax
Gabe

Try to run MakeClass separately on all three files (A, B and C), then compare the output (maybe they are not exactly “equivalent”).

Hello,

Thanks for the response. A difference in the class structure was something I was concerned about, but I don’t think that is the problem, because the crash appears and disappears depending on the order of the files. I have, however, created the output you suggest and pasted it into a GitHub Gist:

There are four files there in total: the output of MakeClass for files “A”, “B”, and “C”, plus the actual class I use. The raw MakeClass output differs somewhat between the three files, since the fixed array sizes vary between them. As I mentioned in my original post, the header I actually use is one I modified by hand so that the fixed array sizes are larger than those appearing in any of the individual files.

If you notice a mistake in my headers though, I’d be very interested to learn about that. I also remain interested in other hypotheses to explain this crash.

Thanks again for taking the time to respond!

pax
Gabe

All “original” files in your post are exactly the same.

They do indeed look similar (as one would hope), but if you look closely, they are different. For example,

C:

   Int_t           mc_nFSPart;
   Double_t        mc_FSPartPx[98];   //[mc_nFSPart]
   Double_t        mc_FSPartPy[98];   //[mc_nFSPart]
   Double_t        mc_FSPartPz[98];   //[mc_nFSPart]
   Double_t        mc_FSPartE[98];   //[mc_nFSPart]
   Int_t           mc_FSPartPDG[98];   //[mc_nFSPart]
   Int_t           mc_er_nPart;
   Int_t           mc_er_ID[120];   //[mc_er_nPart]
   Int_t           mc_er_status[120];   //[mc_er_nPart]
   Double_t        mc_er_posInNucX[120];   //[mc_er_nPart]
   Double_t        mc_er_posInNucY[120];   //[mc_er_nPart]
   Double_t        mc_er_posInNucZ[120];   //[mc_er_nPart]
   Double_t        mc_er_Px[120];   //[mc_er_nPart]
   Double_t        mc_er_Py[120];   //[mc_er_nPart]
   Double_t        mc_er_Pz[120];   //[mc_er_nPart]
   Double_t        mc_er_E[120];   //[mc_er_nPart]
   Int_t           mc_er_FD[120];   //[mc_er_nPart]
   Int_t           mc_er_LD[120];   //[mc_er_nPart]
   Int_t           mc_er_mother[120];   //[mc_er_nPart]

B:

   Int_t           mc_nFSPart;
   Double_t        mc_FSPartPx[70];   //[mc_nFSPart]
   Double_t        mc_FSPartPy[70];   //[mc_nFSPart]
   Double_t        mc_FSPartPz[70];   //[mc_nFSPart]
   Double_t        mc_FSPartE[70];   //[mc_nFSPart]
   Int_t           mc_FSPartPDG[70];   //[mc_nFSPart]
   Int_t           mc_er_nPart;
   Int_t           mc_er_ID[84];   //[mc_er_nPart]
   Int_t           mc_er_status[84];   //[mc_er_nPart]
   Double_t        mc_er_posInNucX[84];   //[mc_er_nPart]
   Double_t        mc_er_posInNucY[84];   //[mc_er_nPart]
   Double_t        mc_er_posInNucZ[84];   //[mc_er_nPart]
   Double_t        mc_er_Px[84];   //[mc_er_nPart]
   Double_t        mc_er_Py[84];   //[mc_er_nPart]
   Double_t        mc_er_Pz[84];   //[mc_er_nPart]
   Double_t        mc_er_E[84];   //[mc_er_nPart]
   Int_t           mc_er_FD[84];   //[mc_er_nPart]
   Int_t           mc_er_LD[84];   //[mc_er_nPart]
   Int_t           mc_er_mother[84];   //[mc_er_nPart]

A:

   Int_t           mc_nFSPart;
   Double_t        mc_FSPartPx[70];   //[mc_nFSPart]
   Double_t        mc_FSPartPy[70];   //[mc_nFSPart]
   Double_t        mc_FSPartPz[70];   //[mc_nFSPart]
   Double_t        mc_FSPartE[70];   //[mc_nFSPart]
   Int_t           mc_FSPartPDG[70];   //[mc_nFSPart]
   Int_t           mc_er_nPart;
   Int_t           mc_er_ID[92];   //[mc_er_nPart]
   Int_t           mc_er_status[92];   //[mc_er_nPart]
   Double_t        mc_er_posInNucX[92];   //[mc_er_nPart]
   Double_t        mc_er_posInNucY[92];   //[mc_er_nPart]
   Double_t        mc_er_posInNucZ[92];   //[mc_er_nPart]
   Double_t        mc_er_Px[92];   //[mc_er_nPart]
   Double_t        mc_er_Py[92];   //[mc_er_nPart]
   Double_t        mc_er_Pz[92];   //[mc_er_nPart]
   Double_t        mc_er_E[92];   //[mc_er_nPart]
   Int_t           mc_er_FD[92];   //[mc_er_nPart]
   Int_t           mc_er_LD[92];   //[mc_er_nPart]
   Int_t           mc_er_mother[92];   //[mc_er_nPart]

Of course, for the modified header I’m using, I’ve replaced the specific numbers with constants that are larger than all three of the cases appearing here.

Thanks for your continued interest!
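As an illustration of the "constants larger than all three cases" approach described above, here is a minimal plain C++ sketch. The per-file bounds (70/70/98 and 92/84/120) are taken from the listings in this post; everything else (Max3, kHeadroom, kMaxFSPart, kMaxErPart, McBuffers, FitsInBuffers) is a hypothetical name invented for the example and is not the actual header used in the thread.

```cpp
#include <cassert>  // for assert() in calling code

// Largest of three values, usable in constant expressions (C++11 and later).
constexpr int Max3(int a, int b, int c)
{
    return a > b ? (a > c ? a : c) : (b > c ? b : c);
}

// Array bounds that MakeClass generated for files A, B and C (from the listings).
constexpr int kFSPartA = 70, kFSPartB = 70, kFSPartC = 98;
constexpr int kErPartA = 92, kErPartB = 84, kErPartC = 120;

// Hypothetical safety factor; the thread only says "much bigger".
constexpr int kHeadroom = 4;
constexpr int kMaxFSPart = kHeadroom * Max3(kFSPartA, kFSPartB, kFSPartC);
constexpr int kMaxErPart = kHeadroom * Max3(kErPartA, kErPartB, kErPartC);

// Trimmed-down stand-in for the MakeClass data members.
struct McBuffers {
    int    mc_nFSPart;
    double mc_FSPartPx[kMaxFSPart];   //[mc_nFSPart]
    int    mc_er_nPart;
    int    mc_er_ID[kMaxErPart];      //[mc_er_nPart]
};

// Compile-time check that the shared bounds cover every per-file bound.
static_assert(kMaxFSPart >= 98 && kMaxErPart >= 120,
              "shared bounds must cover every per-file bound");

// Runtime check that the counts read for one entry fit the fixed buffers.
inline bool FitsInBuffers(int nFSPart, int nErPart)
{
    return nFSPart >= 0 && nFSPart <= kMaxFSPart &&
           nErPart >= 0 && nErPart <= kMaxErPart;
}
```

A check such as assert(FitsInBuffers(mc_nFSPart, mc_er_nPart)) after reading an entry would only detect an overflow after the fact, but it can help rule the variable-size arrays in or out as the source of a crash.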

The crash seems to be related to some vector<vector<double> > branch which has nothing to do with these variable size arrays (as far as I can see) so, maybe @pcanal can comment on it (you seem to use a quite old ROOT 6.10.04 version to analyse files produced by ROOT 5.34/36).

A branch containing a vector<vector<double> > cannot be read in ‘MakeClass’ mode. You would need to call SetMakeClass(kFALSE) on the branch and set the branch address to the address of a pointer to a vector<vector<double> >.

Cheers,
Philippe.
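For readers unsure what "the address of a pointer to a vector<vector<double> >" means in practice, here is a minimal plain C++ sketch. It does not use ROOT: FillThroughPointer is a hypothetical stand-in for a reader that is handed such an address, and VecVec and ReadTwice are names invented for illustration. It is only meant to show that the pointer passed by address should be null or point to a valid object before the reader dereferences it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using VecVec = std::vector<std::vector<double> >;

// Hypothetical stand-in for an I/O layer (NOT ROOT code). It receives the
// address of a pointer, allocates the object if the pointer is null, then
// clears and refills the object.
void FillThroughPointer(void* address, std::size_t nRows)
{
    VecVec** slot = static_cast<VecVec**>(address);
    if (*slot == nullptr) {
        *slot = new VecVec();  // caller is responsible for deleting it
    }
    (*slot)->clear();
    for (std::size_t i = 0; i < nRows; ++i) {
        (*slot)->push_back(std::vector<double>(3, static_cast<double>(i)));
    }
}

// Caller side: the pointer starts as nullptr. An uninitialized pointer here
// would make the clear() above operate on a garbage address.
std::size_t ReadTwice()
{
    VecVec* prong_axis_vector = nullptr;
    FillThroughPointer(&prong_axis_vector, 2);  // first read allocates
    FillThroughPointer(&prong_axis_vector, 4);  // second read reuses the object
    assert(prong_axis_vector != nullptr);
    const std::size_t nRows = prong_axis_vector->size();
    delete prong_axis_vector;
    return nRows;
}
```

In a class generated by MakeClass, the analogous step would presumably be to make sure members such as prong_axis_vector are set to a null pointer before the SetBranchAddress calls. The thread does not confirm whether that alone resolves the crash.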

Hi Philippe,

I’m Googling around a bit trying to understand your suggestion. Is this a method I call on the TChain after I have made it? Or do I run MakeClass and then make a modification to the resulting .h file? For example, in the Init method there is a block like:

    fCurrent = -1;
    fChain->SetMakeClass(1);

    fChain->SetBranchAddress("eventID", &eventID, &b_eventID);

Is the correct thing to do to GetBranch(name), then call SetMakeClass(kFALSE)? So, in other words, add a block like:

    fCurrent = -1;
    fChain->SetMakeClass(1);

    fChain->GetBranch("prong_axis_vector")->SetMakeClass(kFALSE);
    fChain->GetBranch("prong_axis_vertex")->SetMakeClass(kFALSE);
    fChain->GetBranch("prong_binned_energy_bin_contents")->SetMakeClass(kFALSE);
    fChain->GetBranch("prong_binned_energy_bin_indices")->SetMakeClass(kFALSE);
    fChain->GetBranch("prong_part_E")->SetMakeClass(kFALSE);
    fChain->GetBranch("prong_part_pos")->SetMakeClass(kFALSE);

    fChain->SetBranchAddress("eventID", &eventID, &b_eventID);

to the Init method?

I’m not sure what you mean by setting the branch address to the address of a pointer to a vector<vector<double> >. Right now there are lines like:

    fChain->SetBranchAddress("prong_axis_vector", &prong_axis_vector, &b_prong_axis_vector);
    fChain->SetBranchAddress("prong_axis_vertex", &prong_axis_vertex, &b_prong_axis_vertex);
    // etc.

How should I modify these lines? The class contains public ivars like:

    std::vector<std::vector<double> > *prong_axis_vector;
    std::vector<std::vector<double> > *prong_axis_vertex;

Isn’t the branch address already the address of a pointer to a vector<vector<double> >?

I can say that if I modify the header only by adding the block of fChain->GetBranch(...)->SetMakeClass(kFALSE) calls above, the code compiles and runs but still segfaults at the same point with the same error.

Thanks for your suggestions!

pax
Gabe
