Seg fault from RDataFrame GetBranchNames

Dear ROOT experts,

I’m experimenting with reading the ATLAS xAOD data format using RDataFrame, in order to get master’s students started quickly with their analysis. I have found this to be a highly efficient way of letting them explore the data without needing detailed knowledge of the ATLAS software. Not all containers can be read without the ATLAS libraries, but many can, including all of the “AuxDyn” branches.

I have hit a snag with one particular file: it seems fine from the ATLAS side, but it triggers a segfault in the GetBranchNames() method. Note that I’m not trying to access the contents of any particular branch, just to get the name and type of each branch. I have also run the same code on other, similar input without this problem. I attach code that reproduces the crash, a CERNBox link to the input file, and the stack trace.

ROOT Version: 6.20.06-x86_64-centos7-gcc8-opt
Platform: CentOS7 (same crash also from MacOS via Conda)
Compiler: gcc8-opt

Thanks for any help, and thanks for the very nice development with RDataFrame!

James Catmore

Input file: CERNBox
Code:
test.py
Stack trace:

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f2a281e146c in waitpid () from /lib64/libc.so.6
#1  0x00007f2a2815ef62 in do_system () from /lib64/libc.so.6
#2  0x00007f2a25c87663 in TUnixSystem::StackTrace() () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libCore.so
#3  0x00007f2a25c89eb4 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libCore.so
#4  <signal handler called>
#5  0x00007f2a25c23b5a in TClass::GetCollectionProxy() const () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libCore.so
#6  0x00007f2a270804f8 in TBranchElement::SetReadLeavesPtr() () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libTree.so
#7  0x00007f2a270812a0 in TBranchElement::SetMakeClass(bool) () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libTree.so
#8  0x00007f2a270cdb56 in TTree::SetMakeClass(int) [clone .localalias.305] () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libTree.so
#9  0x00007f2a2709993c in TChain::LoadTree(long long) () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libTree.so
#10 0x00007f2a2709708a in TChain::GetListOfBranches() () from /cvmfs/sft.cern.ch/lcg/releases/LCG_97a/ROOT/v6.20.06/x86_64-centos7-gcc8-opt/lib/libTree.so
#11 0x00007f2a0fb3b9a0 in GetBranchNamesImpl(TTree&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::set<TTree*, std::less<TTree*>, std::allocator<TTree*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool) [clone .localalias.509] () from /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.20.06-3f7fd/x86_64-centos7-gcc8-opt/lib/libROOTDataFrame.so
#12 0x00007f2a0fb3cc3b in ROOT::Internal::RDF::GetBranchNames[abi:cxx11](TTree&, bool) () from /cvmfs/sft.cern.ch/lcg/releases/ROOT/v6.20.06-3f7fd/x86_64-centos7-gcc8-opt/lib/libROOTDataFrame.so
#13 0x00007f2a2924a129 in ?? ()
#14 0x0000000000000000 in ?? ()
===========================================================



Hi @jcatmore,
sorry about that! The stack trace suggests that this is not an RDF problem: it should be enough to create a TChain with the same data and call GetListOfBranches() on it to reproduce the crash. Is that the case? @pcanal, have you ever seen this with an xAOD file?
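A minimal sketch of that check in Python, assuming a PyROOT installation. The helper name `branch_names_via_chain` and its `chain_cls` parameter (there only so the function can be exercised without ROOT) are hypothetical; the tree and file names are the ones used elsewhere in this thread. Asking the chain for its branch list makes it load its first tree, which corresponds to frames #9 and #10 of the trace above.

```python
def branch_names_via_chain(tree_name, paths, chain_cls=None):
    """Build a chain over `paths` and return the names of its top-level branches.

    Hypothetical helper. By default it uses ROOT.TChain; `chain_cls` lets a
    stand-in class be injected (e.g. for testing without ROOT).
    """
    if chain_cls is None:
        import ROOT  # imported lazily so this module loads without ROOT

        chain_cls = ROOT.TChain
    chain = chain_cls(tree_name)
    for path in paths:
        chain.Add(path)
    # GetListOfBranches() makes the chain load its first tree (TChain::LoadTree).
    return [branch.GetName() for branch in chain.GetListOfBranches()]
```

With ROOT available this would be called as `branch_names_via_chain("CollectionTree", ["DAOD_PHYSVAL.physval.pool.root"])`.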

We might need to take a look at the file to debug further (EDIT: never mind, I see the CERNBox link!)

Cheers,
Enrico

P.S.
Unrelated to the original post, but possibly of interest: at some point an xAOD data source for RDataFrame existed that could read most xAOD objects correctly; @Attila_Krasznahorkay should know what happened to it.

Hi Enrico,

thanks for the ultra-fast reply! I had actually tried that already (I should have mentioned it). This code:

import ROOT

# Open the file read-only and print every top-level branch of the xAOD tree
infile = ROOT.TFile("DAOD_PHYSVAL.physval.pool.root", "READ")
tree = infile.Get("CollectionTree")
branches = tree.GetListOfBranches()
for branch in branches:
    print(branch)

yields the following:

Name: EventInfoAux. Title: EventInfoAux.
Name: xTrigDecisionAux. Title: xTrigDecisionAux.
Name: TrigNavigationAux. Title: TrigNavigationAux.
Name: METAssoc_AntiKt4EMPFlowAux. Title: METAssoc_AntiKt4EMPFlowAux.
Name: METAssoc_AntiKt4EMTopoAux. Title: METAssoc_AntiKt4EMTopoAux.
Name: Kt4EMPFlowEventShapeAux. Title: Kt4EMPFlowEventShapeAux.
Name: Kt4EMTopoOriginEventShapeAux. Title: Kt4EMTopoOriginEventShapeAux.
Name: NeutralParticleFlowIsoCentralEventShapeAux. Title: NeutralParticleFlowIsoCentralEventShapeAux.
Name: NeutralParticleFlowIsoForwardEventShapeAux. Title: NeutralParticleFlowIsoForwardEventShapeAux.
Name: TopoClusterIsoCentralEventShapeAux. Title: TopoClusterIsoCentralEventShapeAux.
Name: TopoClusterIsoForwardEventShapeAux. Title: TopoClusterIsoForwardEventShapeAux.
Name: LVL1JetEtRoIAux. Title: LVL1JetEtRoIAux.
...
...
...

and does not crash, so calling it directly on the TTree seems to be OK.

Cheers,

James.

I do see a problem, but it is caused by GetColumnType, not GetColumnNames.
In particular, dataframe.GetColumnType("HLT_xAOD__TrigMissingETContainer_TrigEFMissingET") causes a segfault, because TLeaf::GetTypeName returns a nullptr for that column and RDF does not check for it.

If you remove the GetColumnType call in your loop (or only call it for columns that can be read correctly without dictionaries), does the crash go away?
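A minimal sketch of that workflow, assuming `df` is a `ROOT.RDataFrame` (or anything exposing `GetColumnNames()` and `GetColumnType(name)`); the helper name `column_types_verbose` and its `skip` parameter are hypothetical, not part of ROOT. Because the crash happens inside C++ and may terminate the interpreter before any Python error handling runs, the column name is printed and flushed before each `GetColumnType` call, so the last line printed identifies the offending column. Columns known to be unreadable without dictionaries can be passed in `skip` so they are never queried.

```python
def column_types_verbose(df, skip=()):
    """Map each column name of `df` to its type name, printing names as it goes.

    Hypothetical helper: `df` only needs GetColumnNames() and GetColumnType(name).
    Columns listed in `skip` are recorded as None and never passed to GetColumnType.
    """
    skipped = set(skip)
    types = {}
    for column in df.GetColumnNames():
        name = str(column)  # PyROOT returns std::string objects; use plain str
        if name in skipped:
            types[name] = None
            continue
        # Flush before the call: if GetColumnType crashes the process, the
        # last printed line names the column that was being queried.
        print(f"querying type of {name}", flush=True)
        types[name] = str(df.GetColumnType(name))
    return types
```

For the file in this thread, one might call `column_types_verbose(df, skip={"HLT_xAOD__TrigMissingETContainer_TrigEFMissingET"})` and extend `skip` if another column turns out to crash as well.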

Hi @eguiraud,

yes, that worked. Since I already know that I can only read the dynamic variables (“AuxDyn”), and that this shows up in the branch name rather than the type, I can filter on the name first and easily avoid such crashes. There seem to be a small number of types for which GetTypeName returns nullptr and thereby causes the segfault. A check that bails out gracefully with a warning in such cases would be a nice future improvement, but for my purposes this simple check is fine and I can carry on.
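A minimal sketch of that name-based filter, assuming `df` is a `ROOT.RDataFrame` built on the same file; the function name `aux_dyn_column_types` is hypothetical. The `"AuxDyn"` check on the column name runs before `GetColumnType` is called, so columns whose types would need the ATLAS dictionaries are never queried.

```python
def aux_dyn_column_types(df, marker="AuxDyn"):
    """Return (name, type) pairs for the columns whose name contains `marker`.

    Hypothetical helper: `df` only needs GetColumnNames() and GetColumnType(name).
    The name check runs first, so GetColumnType is never called on other columns.
    """
    pairs = []
    for column in df.GetColumnNames():
        name = str(column)  # convert PyROOT std::string to a Python str
        if marker not in name:
            continue  # skip before querying the type
        pairs.append((name, str(df.GetColumnType(name))))
    return pairs
```

With ROOT available this would be used as `aux_dyn_column_types(ROOT.RDataFrame("CollectionTree", "DAOD_PHYSVAL.physval.pool.root"))`.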

Thanks for the help!

James.

Good!

Absolutely! See root-project/root pull request #8286, “[DF] Avoid potential nullptr dereference”.

Thank you for reporting the problem!
Cheers,
Enrico
