Hi,
I’m continuing my stress testing of RDataFrame, and now I’m trying to understand a weird dead-lock that my test application has a high chance of falling into.
The code in question is here:
https://gitlab.cern.ch/akraszna/xAODDataSource/blob/master/xAODDataFrameTests/util/rdfToolTest.cxx
Unfortunately it’s not the absolutely simplest thing.
Since I know that many of our tools can’t handle being initialised concurrently, I taught my code to only allow initialising one analysis tool at a time. This is done with simple std::mutex objects. Now, this works well most of the time.
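Schematically, the locking pattern looks like the following minimal sketch. The names DummyTool, initTool and runDemo are hypothetical stand-ins made up for illustration; they are not the names used in rdfToolTest.cxx, and the real tools are of course much heavier.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an analysis tool whose initialize()
// must not run concurrently with another tool's initialize().
struct DummyTool {
    bool initialized = false;
    void initialize() { initialized = true; }
};

// One mutex shared by every tool initialisation.
std::mutex g_initMutex;
// Number of threads currently inside the guarded region (sanity check only).
std::atomic<int> g_inside{0};

void initTool(DummyTool& tool) {
    // Only one thread at a time gets past this line.
    std::lock_guard<std::mutex> lock(g_initMutex);
    const int now = ++g_inside;
    assert(now == 1);  // nobody else may be initialising right now
    tool.initialize();
    --g_inside;
}

// Initialise nThreads tools from nThreads threads; return how many succeeded.
int runDemo(unsigned nThreads) {
    std::vector<DummyTool> tools(nThreads);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < nThreads; ++i) {
        threads.emplace_back(initTool, std::ref(tools[i]));
    }
    for (auto& t : threads) {
        t.join();
    }
    int nInit = 0;
    for (const auto& tool : tools) {
        if (tool.initialized) ++nInit;
    }
    return nInit;
}
```

This only protects against two different threads initialising at once. It says nothing about what happens if initialize() itself ends up calling back into code that takes the same mutex.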
But sometimes the application goes into a dead-lock, with the stack trace pointing at the std::lock_guard that I use for initialising the electron calibration tool in this example. Looking more carefully at the state of the application when it hangs, I get:
deadlock.txt (38.1 KB)
It mostly looks understandable, apart from thread 4.
Thread 4 (Thread 0x7f97d9d4a700 (LWP 13753)):
#0 0x000000327a40e334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000327a4095d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x000000327a4094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000041086a in __gthread_mutex_lock(pthread_mutex_t*) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
#4 0x0000000000417ca0 in std::mutex::lock() () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/bits/std_mutex.h:103
#5 0x000000000041c6b8 in std::lock_guard<std::mutex>::lock_guard(std::mutex&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/bits/std_mutex.h:162
#6 0x000000000041a3c3 in ElectronCalib::operator()(unsigned int, xAOD::Electron_v1*) () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:179
#7 0x000000000044639d in std::_Function_handler<void ()(unsigned int, xAOD::Electron_v1*), ElectronCalib>::_M_invoke(std::_Any_data const&, unsigned int&&, xAOD::Electron_v1*&&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:1740
#8 0x000000000048c741 in std::function<void ()(unsigned int, xAOD::Electron_v1*)>::operator()(unsigned int, xAOD::Electron_v1*) const () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:2136
#9 0x0000000000486d8b in ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >::operator()(unsigned int, DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&, int) const () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:69
#10 0x0000000000482cb8 in _ZN4ROOT6Detail3RDF13RCustomColumnI13ShallowModifyI10DataVectorIN4xAOD11Electron_v1ES4_INS5_9Egamma_v1ES4_INS5_9IParticleEN16DataModel_detail6NoBaseEEEEENS1_14TCCHelperTypes5TSlotEE12UpdateHelperIJLm0ELm1EEJSD_iEEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSG_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:543
#11 0x00000000004803cb in ROOT::Detail::RDF::RCustomColumn<ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >, ROOT::Detail::RDF::TCCHelperTypes::TSlot>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#12 0x000000000048539b in DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >& ROOT::Internal::RDF::TColumnValue<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, false>::Get<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, 0>(long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:857
#13 0x0000000000416239 in _ZN4ROOT6Detail3RDF13RCustomColumnIZ4mainEUlRK10DataVectorIN4xAOD11Electron_v1ES3_INS4_9Egamma_v1ES3_INS4_9IParticleEN16DataModel_detail6NoBaseEEEEE_NS1_14TCCHelperTypes8TNothingEE12UpdateHelperIJLm0EEJSC_EEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSH_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:533
#14 0x0000000000416017 in ROOT::Detail::RDF::RCustomColumn<main::{lambda(DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&)#1}, ROOT::Detail::RDF::TCCHelperTypes::TNothing>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#15 0x00007f98163a663c in ?? ()
#16 0x00007f97c803dd48 in ?? ()
#17 0x000000001d4a5a00 in ?? ()
#18 0x000000001d4a5a00 in ?? ()
#19 0x00000000000006a4 in ?? ()
#20 0x000000001d4a5a00 in ?? ()
#21 0x000000001d4a5a00 in ?? ()
#22 0x00007f97d9d42eb0 in ?? ()
#23 0x00007f98163a7f06 in ?? ()
#24 0x00000004d9d42eb0 in ?? ()
#25 0x000000001d49ee58 in ?? ()
#26 0x00000000000006a4 in ?? ()
#27 0x0000000406e5ea70 in ?? ()
#28 0x000000001d49ee40 in ?? ()
#29 0x0000000006e5ea70 in ?? ()
#30 0x00007f97d9d42ef0 in ?? ()
#31 0x00007f98163a7e93 in ?? ()
#32 0x00007f97d9d42f50 in ?? ()
#33 0x000000001d49ee40 in ?? ()
#34 0x0000000000000004 in ?? ()
#35 0x00000000000006a4 in ?? ()
#36 0x00000004d9d42f00 in ?? ()
#37 0x000000001d49ee40 in ?? ()
#38 0x0000000000000004 in ?? ()
#39 0x00007f98285b9d5a in ROOT::Detail::RDF::RLoopManager::RunAndCheckFilters(unsigned int, long long) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libROOTDataFrame.so
#40 0x00007f98285bac9f in std::_Function_handler<void ()(unsigned int), void ROOT::TThreadExecutor::Foreach<ROOT::Detail::RDF::RLoopManager::RunDataSourceMT()::{lambda(std::pair<unsigned long long, unsigned long long> const&)#1}, std::pair<unsigned long long, unsigned long long> >(ROOT::Detail::RDF::RLoopManager::RunDataSourceMT()::{lambda(std::pair<unsigned long long, unsigned long long> const&)#1}, std::vector<std::pair<unsigned long long, unsigned long long>, std::allocator<std::vector> >&)::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libROOTDataFrame.so
#41 0x00007f9825c7e69a in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, tbb::internal::parallel_for_body<std::function<void ()(unsigned int)>, unsigned int>, tbb::auto_partitioner const>::execute() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libImt.so
#42 0x00007f98248249a3 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () at ../../src/tbb/custom_scheduler.h:501
#43 0x00007f9824821770 in tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task&, tbb::task*&) () at ../../src/tbb/scheduler.cpp:676
#44 0x00007f9825c7dff9 in ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void ()(unsigned int)> const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libImt.so
#45 0x00007f9824b41b6d in TTree::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
#46 0x00007f981613388f in MVAUtils::BDT::BDT(TTree*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libMVAUtils.so
#47 0x00007f981617e869 in egammaMVACalib::setupBDT(TString const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#48 0x00007f981617f8f9 in egammaMVACalib::getBDTs(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#49 0x00007f9816180233 in egammaMVACalib::egammaMVACalib(int, bool, TString, TString const&, int, bool, TString const&, TString const&, TString const&, TString, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#50 0x00007f981618832b in egammaMVATool::initialize() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#51 0x00007f97d91e7921 in CP::EgammaCalibrationAndSmearingTool::initialize() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libElectronPhotonFourMomentumCorrectionLib.so
#52 0x00007f98284619b3 in asg::detail::AnaToolConfig::makeToolRootCore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, asg::IAsgTool*&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#53 0x00007f9828461bd7 in asg::detail::AnaToolConfig::makeBaseTool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, INamedInterface*, ToolHandle<asg::IAsgTool>&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#54 0x00007f9828474df0 in StatusCode asg::detail::AnaToolConfig::makeTool<asg::IAsgTool>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, INamedInterface*, ToolHandle<asg::IAsgTool>&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#55 0x00007f9828462028 in asg::detail::AnaToolShareList::makeShare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, asg::detail::AnaToolConfig const&, std::shared_ptr<asg::detail::AnaToolShare>&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#56 0x0000000000426626 in asg::AnaToolHandle<CP::IEgammaCalibrationAndSmearingTool>::initialize() () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/src/Control/AthToolSupport/AsgTools/AsgTools/AnaToolHandle.icc:822
#57 0x000000000041a3d6 in ElectronCalib::operator()(unsigned int, xAOD::Electron_v1*) () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:180
#58 0x000000000044639d in std::_Function_handler<void ()(unsigned int, xAOD::Electron_v1*), ElectronCalib>::_M_invoke(std::_Any_data const&, unsigned int&&, xAOD::Electron_v1*&&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:1740
#59 0x000000000048c741 in std::function<void ()(unsigned int, xAOD::Electron_v1*)>::operator()(unsigned int, xAOD::Electron_v1*) const () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:2136
#60 0x0000000000486d8b in ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >::operator()(unsigned int, DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&, int) const () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:69
#61 0x0000000000482cb8 in _ZN4ROOT6Detail3RDF13RCustomColumnI13ShallowModifyI10DataVectorIN4xAOD11Electron_v1ES4_INS5_9Egamma_v1ES4_INS5_9IParticleEN16DataModel_detail6NoBaseEEEEENS1_14TCCHelperTypes5TSlotEE12UpdateHelperIJLm0ELm1EEJSD_iEEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSG_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:543
#62 0x00000000004803cb in ROOT::Detail::RDF::RCustomColumn<ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >, ROOT::Detail::RDF::TCCHelperTypes::TSlot>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#63 0x000000000048539b in DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >& ROOT::Internal::RDF::TColumnValue<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, false>::Get<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, 0>(long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:857
#64 0x0000000000416239 in _ZN4ROOT6Detail3RDF13RCustomColumnIZ4mainEUlRK10DataVectorIN4xAOD11Electron_v1ES3_INS4_9Egamma_v1ES3_INS4_9IParticleEN16DataModel_detail6NoBaseEEEEE_NS1_14TCCHelperTypes8TNothingEE12UpdateHelperIJLm0EEJSC_EEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSH_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:533
#65 0x0000000000416017 in ROOT::Detail::RDF::RCustomColumn<main::{lambda(DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&)#1}, ROOT::Detail::RDF::TCCHelperTypes::TNothing>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
It seems to me that the implicit MT code in TTree reading “gets confused”, since the tools themselves read a number of trees to initialise themselves. (These are all separate TTree instances from the one handled by RDataFrame, of course.)
Unfortunately I haven’t been able to reproduce the issue in a small standalone application just yet. But could it be that the multi-threaded TTree reading code could get confused in such a setup?
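For what it’s worth, frames #6 and #57 of thread 4 are both ElectronCalib::operator(), with TTree::GetEntry and TThreadExecutor::ParallelFor sitting in between (frames #44 and #45). One possible reading, which is an assumption on my side and not something I have verified, is that the same thread picks up another RDataFrame task while waiting inside GetEntry, and then tries to lock a std::mutex that it already holds. The sketch below is not ROOT code. It only shows how such a same-thread re-entry could be detected with a thread_local flag instead of hanging; guardedInit, ReentryFlag and nestedCallIsDetected are hypothetical names.

```cpp
#include <functional>
#include <mutex>

std::mutex g_initMutex;
// True while the calling thread is inside the guarded region.
thread_local bool t_insideInit = false;

// RAII helper so the flag is reset even if the guarded body throws.
struct ReentryFlag {
    ReentryFlag() { t_insideInit = true; }
    ~ReentryFlag() { t_insideInit = false; }
};

// Runs body under the mutex. Returns false, without touching the mutex,
// if this thread is already inside the guarded region, because locking a
// non-recursive std::mutex twice from one thread would never return.
bool guardedInit(const std::function<void()>& body) {
    if (t_insideInit) return false;
    std::lock_guard<std::mutex> lock(g_initMutex);
    ReentryFlag flag;
    body();
    return true;
}

// Mimics the shape of the trace: the guarded body ends up calling the
// guarded function again on the same thread.
bool nestedCallIsDetected() {
    bool innerRan = true;
    const bool outerRan = guardedInit([&innerRan] {
        innerRan = guardedInit([] {});
    });
    return outerRan && !innerRan;
}
```

Returning false here only makes the re-entry visible. What the nested call should actually do in that case (skip, defer, or fail) is a separate design decision.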
Cheers,
Attila
ROOT Version: 6.14/04
Platform: x86_64-slc6-gcc62-opt and x86_64-mac1014-clang100-opt
Compiler: GCC 6.2 and Apple Clang 10.0