Implicit multi-threading issue

Hi,

I’m continuing my stress testing of RDataFrame, and now I’m trying to understand a weird dead-lock that my test application has a high chance of falling into.

The code in question is here:

https://gitlab.cern.ch/akraszna/xAODDataSource/blob/master/xAODDataFrameTests/util/rdfToolTest.cxx

Unfortunately it’s not the absolutely simplest thing. :frowning:

Since I know that many of our tools can’t handle being initialised at once, I just taught my code to only allow initialising one analysis tool at a time. This is done by simple std::mutex objects. Now, this works well most of the time.

But sometimes the application goes into a dead-lock. Pointing at the std::lock_guard that I use for initialising the electron calibration tool in this example. Looking more carefully at the state of the application when the lock happens, I get:

deadlock.txt (38.1 KB)

It mostly looks understandable, apart from thread 4.

Thread 4 (Thread 0x7f97d9d4a700 (LWP 13753)):
#0  0x000000327a40e334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000327a4095d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x000000327a4094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000041086a in __gthread_mutex_lock(pthread_mutex_t*) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
#4  0x0000000000417ca0 in std::mutex::lock() () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/bits/std_mutex.h:103
#5  0x000000000041c6b8 in std::lock_guard<std::mutex>::lock_guard(std::mutex&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/bits/std_mutex.h:162
#6  0x000000000041a3c3 in ElectronCalib::operator()(unsigned int, xAOD::Electron_v1*) () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:179
#7  0x000000000044639d in std::_Function_handler<void ()(unsigned int, xAOD::Electron_v1*), ElectronCalib>::_M_invoke(std::_Any_data const&, unsigned int&&, xAOD::Electron_v1*&&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:1740
#8  0x000000000048c741 in std::function<void ()(unsigned int, xAOD::Electron_v1*)>::operator()(unsigned int, xAOD::Electron_v1*) const () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:2136
#9  0x0000000000486d8b in ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >::operator()(unsigned int, DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&, int) const () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:69
#10 0x0000000000482cb8 in _ZN4ROOT6Detail3RDF13RCustomColumnI13ShallowModifyI10DataVectorIN4xAOD11Electron_v1ES4_INS5_9Egamma_v1ES4_INS5_9IParticleEN16DataModel_detail6NoBaseEEEEENS1_14TCCHelperTypes5TSlotEE12UpdateHelperIJLm0ELm1EEJSD_iEEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSG_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:543
#11 0x00000000004803cb in ROOT::Detail::RDF::RCustomColumn<ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >, ROOT::Detail::RDF::TCCHelperTypes::TSlot>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#12 0x000000000048539b in DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >& ROOT::Internal::RDF::TColumnValue<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, false>::Get<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, 0>(long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:857
#13 0x0000000000416239 in _ZN4ROOT6Detail3RDF13RCustomColumnIZ4mainEUlRK10DataVectorIN4xAOD11Electron_v1ES3_INS4_9Egamma_v1ES3_INS4_9IParticleEN16DataModel_detail6NoBaseEEEEE_NS1_14TCCHelperTypes8TNothingEE12UpdateHelperIJLm0EEJSC_EEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSH_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:533
#14 0x0000000000416017 in ROOT::Detail::RDF::RCustomColumn<main::{lambda(DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&)#1}, ROOT::Detail::RDF::TCCHelperTypes::TNothing>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#15 0x00007f98163a663c in ?? ()
#16 0x00007f97c803dd48 in ?? ()
#17 0x000000001d4a5a00 in ?? ()
#18 0x000000001d4a5a00 in ?? ()
#19 0x00000000000006a4 in ?? ()
#20 0x000000001d4a5a00 in ?? ()
#21 0x000000001d4a5a00 in ?? ()
#22 0x00007f97d9d42eb0 in ?? ()
#23 0x00007f98163a7f06 in ?? ()
#24 0x00000004d9d42eb0 in ?? ()
#25 0x000000001d49ee58 in ?? ()
#26 0x00000000000006a4 in ?? ()
#27 0x0000000406e5ea70 in ?? ()
#28 0x000000001d49ee40 in ?? ()
#29 0x0000000006e5ea70 in ?? ()
#30 0x00007f97d9d42ef0 in ?? ()
#31 0x00007f98163a7e93 in ?? ()
#32 0x00007f97d9d42f50 in ?? ()
#33 0x000000001d49ee40 in ?? ()
#34 0x0000000000000004 in ?? ()
#35 0x00000000000006a4 in ?? ()
#36 0x00000004d9d42f00 in ?? ()
#37 0x000000001d49ee40 in ?? ()
#38 0x0000000000000004 in ?? ()
#39 0x00007f98285b9d5a in ROOT::Detail::RDF::RLoopManager::RunAndCheckFilters(unsigned int, long long) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libROOTDataFrame.so
#40 0x00007f98285bac9f in std::_Function_handler<void ()(unsigned int), void ROOT::TThreadExecutor::Foreach<ROOT::Detail::RDF::RLoopManager::RunDataSourceMT()::{lambda(std::pair<unsigned long long, unsigned long long> const&)#1}, std::pair<unsigned long long, unsigned long long> >(ROOT::Detail::RDF::RLoopManager::RunDataSourceMT()::{lambda(std::pair<unsigned long long, unsigned long long> const&)#1}, std::vector<std::pair<unsigned long long, unsigned long long>, std::allocator<std::vector> >&)::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libROOTDataFrame.so
#41 0x00007f9825c7e69a in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, tbb::internal::parallel_for_body<std::function<void ()(unsigned int)>, unsigned int>, tbb::auto_partitioner const>::execute() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libImt.so
#42 0x00007f98248249a3 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () at ../../src/tbb/custom_scheduler.h:501
#43 0x00007f9824821770 in tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task&, tbb::task*&) () at ../../src/tbb/scheduler.cpp:676
#44 0x00007f9825c7dff9 in ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void ()(unsigned int)> const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libImt.so
#45 0x00007f9824b41b6d in TTree::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
#46 0x00007f981613388f in MVAUtils::BDT::BDT(TTree*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libMVAUtils.so
#47 0x00007f981617e869 in egammaMVACalib::setupBDT(TString const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#48 0x00007f981617f8f9 in egammaMVACalib::getBDTs(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#49 0x00007f9816180233 in egammaMVACalib::egammaMVACalib(int, bool, TString, TString const&, int, bool, TString const&, TString const&, TString const&, TString, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#50 0x00007f981618832b in egammaMVATool::initialize() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libegammaMVACalibLib.so
#51 0x00007f97d91e7921 in CP::EgammaCalibrationAndSmearingTool::initialize() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libElectronPhotonFourMomentumCorrectionLib.so
#52 0x00007f98284619b3 in asg::detail::AnaToolConfig::makeToolRootCore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, asg::IAsgTool*&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#53 0x00007f9828461bd7 in asg::detail::AnaToolConfig::makeBaseTool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, INamedInterface*, ToolHandle<asg::IAsgTool>&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#54 0x00007f9828474df0 in StatusCode asg::detail::AnaToolConfig::makeTool<asg::IAsgTool>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, INamedInterface*, ToolHandle<asg::IAsgTool>&, asg::detail::AnaToolCleanup&) const () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#55 0x00007f9828462028 in asg::detail::AnaToolShareList::makeShare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, asg::detail::AnaToolConfig const&, std::shared_ptr<asg::detail::AnaToolShare>&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAsgTools.so
#56 0x0000000000426626 in asg::AnaToolHandle<CP::IEgammaCalibrationAndSmearingTool>::initialize() () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/src/Control/AthToolSupport/AsgTools/AsgTools/AnaToolHandle.icc:822
#57 0x000000000041a3d6 in ElectronCalib::operator()(unsigned int, xAOD::Electron_v1*) () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:180
#58 0x000000000044639d in std::_Function_handler<void ()(unsigned int, xAOD::Electron_v1*), ElectronCalib>::_M_invoke(std::_Any_data const&, unsigned int&&, xAOD::Electron_v1*&&) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:1740
#59 0x000000000048c741 in std::function<void ()(unsigned int, xAOD::Electron_v1*)>::operator()(unsigned int, xAOD::Electron_v1*) const () at /cvmfs/sft.cern.ch/lcg/releases/gcc/6.2.0-2bc78/x86_64-slc6-gcc62-opt/include/c++/6.2.0/functional:2136
#60 0x0000000000486d8b in ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >::operator()(unsigned int, DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&, int) const () at /home/krasznaa/projects/xaodds/xAODDataSource/xAODDataFrameTests/util/rdfToolTest.cxx:69
#61 0x0000000000482cb8 in _ZN4ROOT6Detail3RDF13RCustomColumnI13ShallowModifyI10DataVectorIN4xAOD11Electron_v1ES4_INS5_9Egamma_v1ES4_INS5_9IParticleEN16DataModel_detail6NoBaseEEEEENS1_14TCCHelperTypes5TSlotEE12UpdateHelperIJLm0ELm1EEJSD_iEEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSG_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:543
#62 0x00000000004803cb in ROOT::Detail::RDF::RCustomColumn<ShallowModify<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > >, ROOT::Detail::RDF::TCCHelperTypes::TSlot>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519
#63 0x000000000048539b in DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >& ROOT::Internal::RDF::TColumnValue<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, false>::Get<DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >, 0>(long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:857
#64 0x0000000000416239 in _ZN4ROOT6Detail3RDF13RCustomColumnIZ4mainEUlRK10DataVectorIN4xAOD11Electron_v1ES3_INS4_9Egamma_v1ES3_INS4_9IParticleEN16DataModel_detail6NoBaseEEEEE_NS1_14TCCHelperTypes8TNothingEE12UpdateHelperIJLm0EEJSC_EEEvjxSt16integer_sequenceImJXspT_EEENS_10TypeTraits8TypeListIJDpT0_EEEPSH_ () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:533
#65 0x0000000000416017 in ROOT::Detail::RDF::RCustomColumn<main::{lambda(DataVector<xAOD::Electron_v1, DataVector<xAOD::Egamma_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > > const&)#1}, ROOT::Detail::RDF::TCCHelperTypes::TNothing>::Update(unsigned int, long long) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/include/ROOT/RDFNodes.hxx:519

It seems to me that the implicit MT code in TTree reading “gets confused”. Since the tools themselves interact with a number of trees to initialise themselves. (These are all separate TTree instances from the one handled by RDataFrame of course.)

Unfortunately I haven’t been able to reproduce the issue in a small standalone application just yet. But could it be that the multi-threaded TTree reading code could get confused in such a setup? :confused:

Cheers,
Attila


ROOT Version: 6.14/04
Platform: x86_64-slc6-gcc62-opt and x86_64-mac1014-clang100-opt
Compiler: GCC 6.2 and Apple Clang 10.0


Hi Attila,
what’s happening in that stacktrace is “nested” TBB task execution:
the lambda that is called at frame #65 (some lambda that you passed to a Define) eventually invokes TTree::GetEntry (through MVAUtils::BDT::BDT(TTree*)); that is an implicitly multi-thread GetEntry, since implicit multi-threading is active.

Now, that spawns subtasks, and while thread 4 waits for all of these tasks to complete it picks up another dataframe task, which eventually tries to take the lock.

If this thread already had the lock, and your mutexes are not recursive, thread 4 is deadlocking itself.

Initialization of these ATLAS tools does not play well with threads, let alone tasks :smile:
Is it not possible to do all initialization sequentially before starting the event loop?

Cheers,
Enrico

EDIT:
in other words, if your lambda is executed in a TBB task and spawns TBB tasks itself, TBB task-stealing mechanisms and scheduling priority require that your lambda is reentrant. I realize that in this case it’s ROOT that is spawning TBB tasks under your feet so it’s not “your fault”.

Hi Enrico,

Very interesting…

The reason that I’m delaying the initialisation like this is that our tools are really not MT friendly in our production branch. :stuck_out_tongue: All analysis tools have access to the current event “behind the scenes”. So that if let’s say you ask for an electron to be calibrated, the tool could go and ask the “event info” object which run we are currently in. Because we didn’t want to spell out such data requirements in our tool interfaces.

This is done by the tool’s constructor looking for the “active” event object when the tool is created. In all current analyses this is of course very simple, we just ask our users to set up their file reading infrastructure before they would start instantiating tools.

In an MT setup I simply just made the “active” event a thread-local definition.

https://gitlab.cern.ch/atlas/athena/blob/21.2/Control/xAODRootAccessInterfaces/Root/TActiveEvent.cxx

But since tools attach themselves upon creation to whichever event object was “active” at that moment, I have to make sure that tools get created/initialised in the thread context in which they will be used from there on out. This is why my code is creating one tool per slot in this job.

Okay, let me try to make this test code a little smarter then…

Cheers,
Attila

1 Like

This is something that we found works well in this cases in which you are writing task-based parallel code that uses thread-unaware objects:

  1. make it thread-safe by having “one thing per thread”
  2. make it task-safe (i.e. reentrant) by having one stack of things per thread

Whenever a new task comes along it pushes an instance of that object to a thread-local stack, and when it exits it pops it from the stack. This works well with nested TBB task execution as that also forms a stack.
Not sure whether this transposes well to your usecase.

I’ll make sure that we clarify the reentrancy requirement in the docs.

Cheers,
Enrico

Okay, let me try to make this test code a little smarter then…

Another pattern that helped when we had objects that relied on global state like this was to change their constructors so that the dependency could be injected there: e.g. TTree now by default attaches itself to the currently active TFile, but its constructor also takes an optional TFile* that overrides this behavior, so we can tell each TTree what TFile it belongs to.

I guess I’m just becoming more and more confused by all of this. :confused:

Let’s assume that TBB, while “slot X” is waiting, schedules the processing of another event into “slot X”.

As you wrote, what we do in xAOD::RDataSource is to create N xAOD::TEvent instances for the N threads that we are running with. So in this example what would happen is that:

  • Slot X is told to load event Y;
  • The processing of event Y starts, but then halts for “something”;
  • Another event gets scheduled into slot X. So it is told to load event Z.
  • The processing of event Z starts, and even hopefully completes;
  • Now the thread returns to processing event Y.

But in this last step the data source is not told to re-load event Y. At least I’m pretty sure that it is not. The reason I’m saying this is that I started seeing some crashes that would be explained by event objects “going out of sync”. (When an in-memory shallow copy of the electron container starts pointing at an “origin container” that is for a different event than for which the copy was created.)

So… As you can see, this all makes me quite confused. I just don’t see how reading files can work correctly with multiple TChain / xAOD::TEvent objects if tasks are allowed to “steal” threads from each other. When this happens, do you call ROOT::RDF::RDataSource::SetEntry(...) to switch the thread “there and back”?

Cheers,
Attila

To spell out my problem a bit better:

I guess “shallow copies” from our EDM are a no-go with RDataFrame. :frowning: Shallow copies in our EDM are objects/containers that are only functional “on top of” an original object/container. As they only hold variables that have been modified. For any unmodified variable they go back to the original object/container.

In my small example I defined a “CalibratedElectrons” container, created from the “Electrons” container that comes from our input file. This “CalibratedElectrons” is a shallow copy.

I guess RDF assumes that all of its objects are self-sufficient. So it could switch the data source to reading a different input event in slot X, while it still didn’t finish fully processing that event. (Since it is only missing processing derived variables a bit further.) And voila, my test crashes.

So… We’d need to use “deep copies” all the time. And even that would not solve all our issues. (But let’s not even get into that here…) :thinking: This is unfortunate. I would really like to make RDF work for applying systematic variations…

I guess we’ll talk more about it on Friday…

Cheers,
Attila

Hi Attila,
I think I missed a few steps and I’m not sure how we got here from the original issue.
I’ll try to clarify things as best as possible.

I just don’t see how reading files can work correctly with multiple TChain / xAOD::TEvent objects if tasks are allowed to “steal” threads from each other

There is an assumption we make that user-defined lambdas/functions passed to Filters and Defines are pure (in the functional programming sense) or at least well-behaved. The logic you execute within the event loop has a lot of dependence on the “global” state (or at least the state of the datasource), and this naturally complicates things.

But in this last step the data source is not told to re-load event Y.

Correct. At this time the values of event Y have been loaded and passed to all consumers, so we don’t load them another time.

I guess RDF assumes that all of its objects are self-sufficient.

Certainly pure functions and stateless or self-contained objects work better within the framework.

it could switch the data source to reading a different input event in slot X, while it still didn’t finish fully processing that event. (Since it is only missing processing derived variables a bit further.)

Yes, that is true for logic that spawns subtasks like yours. I agree this is not nice, I’ll have to think whether we can do something about it.

Soooo I’ve been thinking, and I guess until you have more task-friendly/task-aware classes (if ever) you can take advantage of RDataSource's InitSlot and FinaliseSlot hooks to deal with these kind of issues.
For example, you could re-set the thread-local TEvent to the correct entry in the FinaliseSlot of a nested TBB task.

P.S. this is also to say: I don’t think we can “fix” this on our side, nested task execution is a TBB feature, it cannot be deactivated as far as I know, and that’s the same with task stealing. Combined, they give rise to the (relatively rare) situation of a dataframe task being executed on a thread in which a dataframe task is already in flight – user code should be “pure” enough that it’s not affected by it, and RDataSources should support this scenario if they support multi-thread runs. Sorry I don’t have a better solution.

Hi Enrico,

I’ll need to explain my current understanding of my issues in the meeting that we’ll have shortly. Unfortunately by now I’m very afraid that TBB in general may cause us some very difficult problems because of the issue that you described.

https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/work_isolation.html

By now I’m not only worried for our usage of RDataFrame, but multi-threading with xAOD files in general. :frowning:

Cheers,
Attila

as pointed out by Attila’s link, it can be disabled with ‘work isolation’.

it can be disabled with ‘work isolation’.

Yes, I guess it depends how much we want to marry ROOT interfaces like TPoolManager to TBB task_arena methods.

(also if I understand correctly, if users end up spawning their own subtasks onen way or another, there might be nested TBB tasks within the arena for those subtasks)

Shouldn’t it be consider an implementation detail of the implementation using TBB?

(also if I understand correctly, if users end up spawning their own subtasks onen way or another, there might be nested TBB tasks within the arena for those subtasks)

Yes likely.

Actually, if I understood correctly, the problem is that the outer/user code is taking a lock and then calling the ROOT code that happens to then trigger very indirectly a recursive call to the user code (via the TBB task mechanism). [If I understood correctly then the following comment apply] Per se this kind of long live locked taken encompass calling some code that is also multiple-thread should be avoided. I.e. a solution might be to move around the lock and assignment in the user code to avoid holding the lock when calling the ROOT code …

It’s worse, it’s not just locks. When using TBB, in presence of nested task execution in a given thread, any reliance on that thread-local state poses problems similar to reliance on global state: another task might come and pollute the state while in-between a task execution.
In the context of RDataFrame, for example, if you have implicit multi-threading activated and a Snapshot, you are not guaranteed that, in a given thread, one entry finishes processing before another batch of entries is processed.
This causes several issues to ATLAS code, which relies on thread-local per-entry state to be valid and stable throughout all of a single entry processing (and apparently Gaudi makes the same assumption).

In short: processing of an entry in a given thread should be reentrant to work correctly with TBB in all cases; taking a non-recursive lock is one way to break reentrancy, reliance on read-write thread-local storage is another, there might be more. xAODs are especially heavy on the reliance on thread-local storage.

any reliance on that thread-local state poses problems similar to reliance on global state: another task might come and pollute the state while in-between a task execution.

Yes, this is true and is not related to TBB’s implementation details. This is a clash between the thread based development and task based development. eg. Geant4 has/had the same problem and they had to replace their thread-local storage with a task-arena that is passed to/along the tasks.

Cheers,
Philippe.

Yes it’s not strictly TBB, but any task scheduler that supports work stealing.

In RDF, since we can’t use TBB classes directly, when we could not do without some sort of thread-local state we put it in thread-local std::stack that tasks push and pop when they start and finish.

In RDF, since we can’t use TBB classes directly

I am not sure why that matters.

when we could not do without some sort of thread-local state we put it in thread-local std::stack that tasks push and pop when they start and finish.

Instead you could have ‘given’ each task a ‘state’ object that is passed to the task upon calling it (along side the argument) and then that state object is pass along (as needed) down the function call chain.

I’m not sure this is possible, there is no direct chain of task-spawning calls since multi-threading can be implicit

The ‘implicit’ multi-threading is scheduling a function/task that could be passed a state and it would need to be passed down to all the (necessary) callees. It might be awkward and require either 2 signature or a test on existence or the use of the state even in single thread mode. It is ‘possible’, the question is whether the bit of additional code and parameters is justified by the additional performance (no thread-local address lookup) and sturdiness (no stack that may or may not get out-of-sync, no global side-effects, etc … ).

Cheers,

Philippe.

Yes I didn’t mean to say it is not at all possible, just that the current design does not allow something like that and afaict it could not be easily added – @dpiparo maybe this topic deserves a PPP meeting at some point?