Hi,
I’m continuing my stress testing of RDataFrame, and now I’m trying to understand a weird deadlock that my test application falls into quite often.
The code in question is here:
https://gitlab.cern.ch/akraszna/xAODDataSource/blob/master/xAODDataFrameTests/util/rdfToolTest.cxx
Unfortunately it’s not the simplest possible example.
Since I know that many of our tools can’t handle being initialised concurrently, I taught my code to allow only one analysis tool to be initialised at a time, using simple std::mutex objects. This works well most of the time.
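For illustration, here is a minimal, self-contained sketch of that kind of scheme, under stated assumptions: DummyTool, LazyCalib and runDemo are hypothetical stand-ins (not the code in rdfToolTest.cxx and not any ATLAS API), each processing slot lazily creates its own tool, and one static std::mutex shared by all LazyCalib instances ensures that only one DummyTool::initialize() runs at any time. The two atomic counters exist only to measure how many initialisations overlap.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counters, used only to observe overlapping initialize() calls.
std::atomic<int> g_activeInits{0};
std::atomic<int> g_maxActiveInits{0};

// Hypothetical stand-in for a tool that must not be initialised concurrently.
struct DummyTool {
  bool initialised = false;
  void initialize() {
    const int now = ++g_activeInits;
    // Record the highest number of simultaneous initialisations seen so far.
    int prev = g_maxActiveInits.load();
    while (prev < now && !g_maxActiveInits.compare_exchange_weak(prev, now)) {
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5));  // "slow" setup
    initialised = true;
    --g_activeInits;
  }
  double calibrate(double pt) const {
    assert(initialised);
    return 1.01 * pt;  // placeholder correction
  }
};

// Hypothetical functor in the spirit of ElectronCalib: one lazily initialised
// tool per slot, one mutex shared by all instances to serialise initialisation.
class LazyCalib {
public:
  explicit LazyCalib(unsigned nSlots) : m_tools(nSlots) {}
  double operator()(unsigned slot, double pt) {
    DummyTool& tool = m_tools.at(slot);
    if (!tool.initialised) {
      std::lock_guard<std::mutex> lock(s_initMutex);
      tool.initialize();
    }
    return tool.calibrate(pt);
  }

private:
  static std::mutex s_initMutex;
  std::vector<DummyTool> m_tools;
};
std::mutex LazyCalib::s_initMutex;

// Runs nThreads workers, each on its own slot, and returns the maximum number
// of overlapping initialize() calls that was observed.
int runDemo(unsigned nThreads) {
  LazyCalib calib(nThreads);
  std::vector<std::thread> workers;
  for (unsigned slot = 0; slot < nThreads; ++slot) {
    workers.emplace_back([&calib, slot] {
      for (int i = 0; i < 10; ++i) calib(slot, 100.0);
    });
  }
  for (auto& w : workers) w.join();
  return g_maxActiveInits.load();
}
```

Reading `initialised` outside the lock is only safe here because each slot is used by one thread at a time, as with RDataFrame slots. With the lock in place, runDemo(8) reports exactly one overlapping initialisation.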
But sometimes the application deadlocks, with the hang pointing at the std::lock_guard that I use for initialising the electron calibration tool in this example. Looking more carefully at the state of the application when the deadlock happens, I get:
deadlock.txt (38.1 KB)
Most of it looks understandable, apart from thread 4:
Thread 4 (Thread 0x7f97d9d4a700 (LWP 13753)):
#0 0x000000327a40e334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000327a4095d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x000000327a4094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000041086a in __gthread_mutex_lock(pthread_mutex_t*) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
#4 0x0000000000417ca0 in std::mutex::lock() () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/bits/std_mutex.h:103
#5 0x000000000041c6b8 in std::lock_guard<std::mutex>::lock_guard(std::mutex&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/bits/std_mutex.h:162
#6 0x000000000041a3c3 in ElectronCalib::operator()(unsigned int, xAOD::Electron_v1*) () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:179
#7 0x000000000044639d in std::_Function_handler<void ()(unsigned int, xAOD::Electron_v1*), ElectronCalib>::_M_invoke(std::_Any_data const&, unsigned int&&, xAOD::Electron_v1*&&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:1740
#8 0x000000000048c741 in std::function<void ()(unsigned int, xAOD::Electron_v1*)>::operator()(unsigned int, xAOD::Electron_v1*) const () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:2136
#9 0x0000000000486d8b in ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >::operator()(unsigned int, DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&, int) const () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:69
#10 0x0000000000482cb8 in _ZN4ROOT6Detail3RDF13RCustomColumnI13ShallowModifyI10DataVectorIN4xAOD11Electron_v1ES4_INS5_9Egamma_v1ES4_INS5_9IParticleEN16DataModel_detail6NoBaseEEEEENS1_14TCCHelperTypes5TSlotEE12UpdateHelperIJLm0ELm1EEJSD_iEEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSG_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:543
#11 0x00000000004803cb in ROOT::Detail::RDF::RCustomColumn<ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >, ROOT::Detail::RDF::TCCHelperTypes::TSlot>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#12 0x000000000048539b in DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >& ROOT::Internal::RDF::TColumnValue<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, false>::Get<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, 0>(long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:857
#13 0x0000000000416239 in _ZN4ROOT6Detail3RDF13RCustomColumnIZ4mainEUlRK10DataVectorIN4xAOD11Electron_v1ES3_INS4_9Egamma_v1ES3_INS4_9IParticleEN16DataModel_detail6NoBaseEEEEE_NS1_14TCCHelperTypes8TNothingEE12UpdateHelperIJLm0EEJSC_EEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSH_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:533
#14 0x0000000000416017 in ROOT::Detail::RDF::RCustomColumn<main::{lambda(DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&)#1}, ROOT::Detail::RDF::TCCHelperTypes::TNothing>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#15 0x00007f98163a663c in ?? ()
#16 0x00007f97c803dd48 in ?? ()
#17 0x000000001d4a5a00 in ?? ()
#18 0x000000001d4a5a00 in ?? ()
#19 0x00000000000006a4 in ?? ()
#20 0x000000001d4a5a00 in ?? ()
#21 0x000000001d4a5a00 in ?? ()
#22 0x00007f97d9d42eb0 in ?? ()
#23 0x00007f98163a7f06 in ?? ()
#24 0x00000004d9d42eb0 in ?? ()
#25 0x000000001d49ee58 in ?? ()
#26 0x00000000000006a4 in ?? ()
#27 0x0000000406e5ea70 in ?? ()
#28 0x000000001d49ee40 in ?? ()
#29 0x0000000006e5ea70 in ?? ()
#30 0x00007f97d9d42ef0 in ?? ()
#31 0x00007f98163a7e93 in ?? ()
#32 0x00007f97d9d42f50 in ?? ()
#33 0x000000001d49ee40 in ?? ()
#34 0x0000000000000004 in ?? ()
#35 0x00000000000006a4 in ?? ()
#36 0x00000004d9d42f00 in ?? ()
#37 0x000000001d49ee40 in ?? ()
#38 0x0000000000000004 in ?? ()
#39 0x00007f98285b9d5a in ROOT::Detail::RDF::RLoopManager::RunAndCheckFilters(unsigned int, long long) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libROOTDataFrame.so
#40 0x00007f98285bac9f in std::_Function_handler<void ()(unsigned int), void ROOT::TThreadExecutor::Foreach<ROOT::Detail::RDF::RLoopManager::RunDataSourceMT()::{lambda(std::pair<unsigned long long, unsigned long long> const&)#1}, std::pair<unsigned long long, unsigned long long> >(ROOT::Detail::RDF::RLoopManager::RunDataSourceMT()::{lambda(std::pair<unsigned long long, unsigned long long> const&)#1}, std::vector<std::pair<unsigned long long, unsigned long long>, std::allocator<std::vector> >&)::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libROOTDataFrame.so
#41 0x00007f9825c7e69a in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, tbb::internal::parallel_for_body<std::function<void ()(unsigned int)>, unsigned int>, tbb::auto_partitioner const>::execute() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libImt.so
#42 0x00007f98248249a3 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () at ../../src/tbb/custom_scheduler.h:501
#43 0x00007f9824821770 in tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task&, tbb::task*&) () at ../../src/tbb/scheduler.cpp:676
#44 0x00007f9825c7dff9 in ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void ()(unsigned int)> const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libImt.so
#45 0x00007f9824b41b6d in TTree::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
#46 0x00007f981613388f in MVAUtils::BDT::BDT(TTree*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libMVAUtils.so
#47 0x00007f981617e869 in egammaMVACalib::setupBDT(TString const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#48 0x00007f981617f8f9 in egammaMVACalib::getBDTs(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#49 0x00007f9816180233 in egammaMVACalib::egammaMVACalib(int, bool, TString, TString const&, int, bool, TString const&, TString const&, TString const&, TString, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#50 0x00007f981618832b in egammaMVATool::initialize() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#51 0x00007f97d91e7921 in CP::EgammaCalibrationAndSmearingTool::initialize() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libElectronPhotonFourMomentumCorrectionLib.so
#52 0x00007f98284619b3 in asg::detail::AnaToolConfig::makeToolRootCore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, asg::IAsgTool*&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#53 0x00007f9828461bd7 in asg::detail::AnaToolConfig::makeBaseTool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, INamedInterface*, ToolHandle<asg::IAsgTool>&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#54 0x00007f9828474df0 in StatusCode asg::detail::AnaToolConfig::makeTool<asg::IAsgTool>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, INamedInterface*, ToolHandle<asg::IAsgTool>&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#55 0x00007f9828462028 in asg::detail::AnaToolShareList::makeShare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, asg::detail::AnaToolConfig const&, std::shared_ptr<asg::detail::AnaToolShare>&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#56 0x0000000000426626 in asg::AnaToolHandle<CP::IEgammaCalibrationAndSmearingTool>::initialize() () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/src/Control/AthToolSupport/AsgTools/AsgTools/AnaToolHandle.icc:822
#57 0x000000000041a3d6 in ElectronCalib::operator()(unsigned int, xAOD::Electron_v1*) () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:180
#58 0x000000000044639d in std::_Function_handler<void ()(unsigned int, xAOD::Electron_v1*), ElectronCalib>::_M_invoke(std::_Any_data const&, unsigned int&&, xAOD::Electron_v1*&&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:1740
#59 0x000000000048c741 in std::function<void ()(unsigned int, xAOD::Electron_v1*)>::operator()(unsigned int, xAOD::Electron_v1*) const () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:2136
#60 0x0000000000486d8b in ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >::operator()(unsigned int, DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&, int) const () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:69
#61 0x0000000000482cb8 in _ZN4ROOT6Detail3RDF13RCustomColumnI13ShallowModifyI10DataVectorIN4xAOD11Electron_v1ES4_INS5_9Egamma_v1ES4_INS5_9IParticleEN16DataModel_detail6NoBaseEEEEENS1_14TCCHelperTypes5TSlotEE12UpdateHelperIJLm0ELm1EEJSD_iEEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSG_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:543
#62 0x00000000004803cb in ROOT::Detail::RDF::RCustomColumn<ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >, ROOT::Detail::RDF::TCCHelperTypes::TSlot>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#63 0x000000000048539b in DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >& ROOT::Internal::RDF::TColumnValue<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, false>::Get<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, 0>(long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:857
#64 0x0000000000416239 in _ZN4ROOT6Detail3RDF13RCustomColumnIZ4mainEUlRK10DataVectorIN4xAOD11Electron_v1ES3_INS4_9Egamma_v1ES3_INS4_9IParticleEN16DataModel_detail6NoBaseEEEEE_NS1_14TCCHelperTypes8TNothingEE12UpdateHelperIJLm0EEJSC_EEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSH_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:533
#65 0x0000000000416017 in ROOT::Detail::RDF::RCustomColumn<main::{lambda(DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&)#1}, ROOT::Detail::RDF::TCCHelperTypes::TNothing>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
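One possible reading of thread 4, from the bottom up: frame #57 is ElectronCalib::operator() at rdfToolTest.cxx:180, inside AnaToolHandle::initialize(). Further up, TTree::GetEntry (#45) calls TThreadExecutor::ParallelFor (#44), and while waiting in TBB (#42) this same thread executes a task of the RDataFrame loop (#41 to #39), which ends up back in ElectronCalib::operator() at line 179 (#6), i.e. in the std::lock_guard. If that reading is right, the thread is blocked on a std::mutex that it already holds. Locking a std::mutex the calling thread already owns is undefined behaviour; with a default glibc mutex it simply blocks, which would match __lll_lock_wait in frame #0. The sketch below is a diagnostic, not a fix: CheckedInitLock, g_initMutex, t_holdsInitMutex, calibrate and reentryIsDetected are all hypothetical names. It turns such a same-thread re-entry into an exception instead of a silent hang.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <stdexcept>

std::mutex g_initMutex;                      // hypothetical shared init lock
thread_local bool t_holdsInitMutex = false;  // does *this* thread hold it?

// Hypothetical RAII lock that refuses to re-lock g_initMutex from a thread
// that already owns it, instead of blocking forever.
class CheckedInitLock {
public:
  CheckedInitLock() {
    if (t_holdsInitMutex) {
      throw std::logic_error("init mutex re-entered on the same thread");
    }
    g_initMutex.lock();
    t_holdsInitMutex = true;
  }
  ~CheckedInitLock() {
    assert(t_holdsInitMutex);
    t_holdsInitMutex = false;
    g_initMutex.unlock();
  }
  CheckedInitLock(const CheckedInitLock&) = delete;
  CheckedInitLock& operator=(const CheckedInitLock&) = delete;
};

// Hypothetical analogue of ElectronCalib::operator(): initTool plays the role
// of the tool's initialize(), which may end up running more event processing
// on this same thread.
void calibrate(const std::function<void()>& initTool) {
  CheckedInitLock lock;
  initTool();
}

// Simulates a re-entrant call on one thread; returns true if it was detected.
bool reentryIsDetected() {
  try {
    calibrate([] { calibrate([] {}); });
  } catch (const std::logic_error&) {
    return true;  // the outer lock was released by its destructor
  }
  return false;
}
```

The flag is per thread and tied to a single mutex; with one mutex per tool type, each mutex would need its own flag.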
It seems to me that the implicit MT code in TTree reading “gets confused”, since the tools themselves read a number of trees while initialising. (These are of course all separate TTree instances from the one handled by RDataFrame.)
Unfortunately I haven’t yet been able to reproduce the issue in a small standalone application. But could the multi-threaded TTree reading code get confused in such a setup?
Cheers,
Attila
ROOT Version: 6.14/04
Platform: x86_64-slc6-gcc62-opt and x86_64-mac1014-clang100-opt
Compiler: GCC 6.2 and Apple Clang 10.0

All analysis tools have access to the current event “behind the scenes”. So if, say, you ask a tool to calibrate an electron, the tool can go and ask the “event info” object which run we are currently in. We did this because we didn’t want to spell out such data requirements in our tool interfaces.
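The pattern can be pictured roughly as follows. This is a hypothetical illustration, not the actual ATLAS mechanism: EventInfo, t_currentEvent, CalibTool and CurrentEventScope are invented names, and a thread_local pointer is just one assumed way of holding "the current event" per thread. The tool's interface only takes a pT, yet its result depends on the run number it looks up implicitly.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical minimal "event info" object.
struct EventInfo {
  unsigned runNumber;
};

// Hypothetical implicit context: the event loop points this at the event being
// processed on this thread before calling any tool.
thread_local const EventInfo* t_currentEvent = nullptr;

// Hypothetical calibration tool; the run number is looked up behind the scenes.
class CalibTool {
public:
  double calibrate(double pt) const {
    if (!t_currentEvent) throw std::runtime_error("no current event set");
    // Placeholder run-dependent correction, purely for illustration.
    const double scale = t_currentEvent->runNumber < 300000 ? 1.01 : 1.02;
    return scale * pt;
  }
};

// Hypothetical RAII helper the event loop could use to set the context.
class CurrentEventScope {
public:
  explicit CurrentEventScope(const EventInfo& ev)
      : m_prev(t_currentEvent), m_this(&ev) {
    t_currentEvent = m_this;
  }
  ~CurrentEventScope() {
    assert(t_currentEvent == m_this);  // scopes must be properly nested
    t_currentEvent = m_prev;
  }
  CurrentEventScope(const CurrentEventScope&) = delete;
  CurrentEventScope& operator=(const CurrentEventScope&) = delete;

private:
  const EventInfo* m_prev;
  const EventInfo* m_this;
};
```

Nothing in calibrate()'s signature reveals the dependency on the run number; whoever calls the tool has to set up the context first.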
Shallow copies in our EDM are objects/containers that only work “on top of” an original object/container: they hold only the variables that have been modified, and for any unmodified variable they go back to the original object/container.
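A rough picture of the shallow-copy idea, using hypothetical types Original and ShallowCopy that are not the actual xAOD classes (which store their variables differently): the copy keeps only the overridden values and reads through to the original for everything else.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical original object: a bag of named float variables.
struct Original {
  std::map<std::string, float> vars;
};

// Hypothetical shallow copy: stores only modified variables and falls back to
// the original object for everything else.
class ShallowCopy {
public:
  explicit ShallowCopy(const Original& orig) : m_orig(&orig) {}

  void set(const std::string& name, float value) { m_modified[name] = value; }

  float get(const std::string& name) const {
    auto it = m_modified.find(name);
    if (it != m_modified.end()) return it->second;  // modified: own value
    return m_orig->vars.at(name);                   // unmodified: read through
  }

  std::size_t nStored() const { return m_modified.size(); }

private:
  const Original* m_orig;  // must outlive this copy
  std::map<std::string, float> m_modified;
};
```

The pointer to the original is what makes a shallow copy only functional on top of it: the original has to stay alive, and unchanged, for as long as the copy is used.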
This is unfortunate. I would really like to make RDF work for applying systematic variations…