TThread deadlock problem

Hi,

I have problems using TThreads. I have condensed the problematic part into the attached code. When run twice in a ROOT session like

root> .L threads.cxx++g
root> runthreads()
root> runthreads()

it hangs the second time for me. Inspecting the threads with gdb gives

(gdb) c
Continuing.
[New Thread 0x7fe1bc291700 (LWP 18844)]
[New Thread 0x7fe1bba90700 (LWP 18845)]
[New Thread 0x7fe1bb28f700 (LWP 18846)]
[New Thread 0x7fe1baa8e700 (LWP 18847)]
[New Thread 0x7fe1ba28d700 (LWP 18848)]
[New Thread 0x7fe1b9a8c700 (LWP 18849)]
[New Thread 0x7fe1b928b700 (LWP 18850)]
[Thread 0x7fe1bc291700 (LWP 18844) exited]
[Thread 0x7fe1b928b700 (LWP 18850) exited]
[New Thread 0x7fe1b928b700 (LWP 18851)]
[Thread 0x7fe1bb28f700 (LWP 18846) exited]
[Thread 0x7fe1ba28d700 (LWP 18848) exited]
[Thread 0x7fe1bba90700 (LWP 18845) exited]
[Thread 0x7fe1b928b700 (LWP 18851) exited]
[New Thread 0x7fe1b928b700 (LWP 18852)]
[Thread 0x7fe1b928b700 (LWP 18852) exited]
[New Thread 0x7fe1b928b700 (LWP 18853)]
[Thread 0x7fe1baa8e700 (LWP 18847) exited]
[Thread 0x7fe1b928b700 (LWP 18853) exited]
[New Thread 0x7fe1b928b700 (LWP 18854)]
[Thread 0x7fe1b928b700 (LWP 18854) exited]
[New Thread 0x7fe1b928b700 (LWP 18855)]
[Thread 0x7fe1b9a8c700 (LWP 18849) exited]
[Thread 0x7fe1b928b700 (LWP 18855) exited]
[New Thread 0x7fe1b928b700 (LWP 18856)]
[New Thread 0x7fe1b9a8c700 (LWP 18857)]
[New Thread 0x7fe1ba28d700 (LWP 18858)]
[New Thread 0x7fe1baa8e700 (LWP 18859)]
[New Thread 0x7fe1bc291700 (LWP 18860)]
[New Thread 0x7fe1bba90700 (LWP 18861)]
[New Thread 0x7fe1bb28f700 (LWP 18862)]
^C
Program received signal SIGINT, Interrupt.
0x00007fe1c8f19049 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
(gdb) thread apply all bt

Thread 20 (Thread 0x7fe1bb28f700 (LWP 18862)):
#0 0x00007fe1c8f16237 in pthread_join () from /lib64/libpthread.so.0
#1 0x00007fe1c42e91dd in TPosixThread::Join (this=0x24c08f0, th=0x45fba70,
ret=0x0) at /home/joa/root_debug/core/thread/src/TPosixThread.cxx:68
#2 0x00007fe1c42e6120 in TThread::Join (this=0x45fba70, ret=0x0)
at /home/joa/root_debug/core/thread/src/TThread.cxx:512
#3 0x00007fe1c42e5390 in TJoinHelper::JoinFunc (p=0x7ffd508cc950)
at /home/joa/root_debug/core/thread/src/TThread.cxx:131
#4 0x00007fe1c42e6b2f in TThread::Function (ptr=0x45e35b0)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#5 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7fe1bba90700 (LWP 18861)):
#0 0x00007fe1c8f1b51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe1c8f17136 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007fe1c8f1702f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe1c42e8f38 in TPosixMutex::Lock (this=0x3285640)
at /home/joa/root_debug/core/thread/src/TPosixMutex.cxx:75
#4 0x00007fe1c42e44e1 in TMutex::Lock (this=0x44328a0)
at /home/joa/root_debug/core/thread/src/TMutex.cxx:48
#5 0x00007fe1c42e69c4 in TThread::Lock ()
at /home/joa/root_debug/core/thread/src/TThread.cxx:760
#6 0x00007fe1bccaeffc in testfkn (targs=0x3eb33c0)
at /home/joa/test/./threads.cxx:12
#7 0x00007fe1c42e6b2f in TThread::Function (ptr=0x45db940)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#8 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7fe1bc291700 (LWP 18860)):
#0 0x00007fe1c8f1b51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe1c8f17136 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007fe1c8f1702f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe1c42e8f38 in TPosixMutex::Lock (this=0x3285640)
at /home/joa/root_debug/core/thread/src/TPosixMutex.cxx:75
#4 0x00007fe1c42e44e1 in TMutex::Lock (this=0x44328a0)
at /home/joa/root_debug/core/thread/src/TMutex.cxx:48
#5 0x00007fe1c42e69c4 in TThread::Lock ()
at /home/joa/root_debug/core/thread/src/TThread.cxx:760
#6 0x00007fe1bccaeffc in testfkn (targs=0x3884730)
at /home/joa/test/./threads.cxx:12
#7 0x00007fe1c42e6b2f in TThread::Function (ptr=0x34663b0)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#8 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7fe1baa8e700 (LWP 18859)):
#0 0x00007fe1c8f1b51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe1c8f17136 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007fe1c8f1702f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe1c42e8f38 in TPosixMutex::Lock (this=0x3285640)
at /home/joa/root_debug/core/thread/src/TPosixMutex.cxx:75
#4 0x00007fe1c42e44e1 in TMutex::Lock (this=0x44328a0)
at /home/joa/root_debug/core/thread/src/TMutex.cxx:48
#5 0x00007fe1c42e69c4 in TThread::Lock ()
at /home/joa/root_debug/core/thread/src/TThread.cxx:760
#6 0x00007fe1bccaeffc in testfkn (targs=0x37a7c00)
at /home/joa/test/./threads.cxx:12
#7 0x00007fe1c42e6b2f in TThread::Function (ptr=0x34661d0)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#8 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7fe1ba28d700 (LWP 18858)):
#0 0x00007fe1c8f1b51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe1c8f17136 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007fe1c8f1702f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe1c42e8f38 in TPosixMutex::Lock (this=0x3285640)
at /home/joa/root_debug/core/thread/src/TPosixMutex.cxx:75
#4 0x00007fe1c42e44e1 in TMutex::Lock (this=0x44328a0)
at /home/joa/root_debug/core/thread/src/TMutex.cxx:48
#5 0x00007fe1c42e69c4 in TThread::Lock ()
at /home/joa/root_debug/core/thread/src/TThread.cxx:760
#6 0x00007fe1bccaeffc in testfkn (targs=0x45e3d20)
at /home/joa/test/./threads.cxx:12
#7 0x00007fe1c42e6b2f in TThread::Function (ptr=0x45f8090)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#8 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7fe1b9a8c700 (LWP 18857)):
#0 0x00007fe1c8f1b51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe1c8f17136 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007fe1c8f1702f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe1c42e8f38 in TPosixMutex::Lock (this=0x3285640)
at /home/joa/root_debug/core/thread/src/TPosixMutex.cxx:75
#4 0x00007fe1c42e44e1 in TMutex::Lock (this=0x44328a0)
at /home/joa/root_debug/core/thread/src/TMutex.cxx:48
#5 0x00007fe1c42e69c4 in TThread::Lock ()
at /home/joa/root_debug/core/thread/src/TThread.cxx:760
#6 0x00007fe1bccaeffc in testfkn (targs=0x4556d10)
at /home/joa/test/./threads.cxx:12
#7 0x00007fe1c42e6b2f in TThread::Function (ptr=0x45f7eb0)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#8 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7fe1b928b700 (LWP 18856)):
#0 0x00007fe1c8f1b51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe1c8f17136 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007fe1c8f1702f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe1c42e8f38 in TPosixMutex::Lock (this=0x45f97a0)
at /home/joa/root_debug/core/thread/src/TPosixMutex.cxx:75
#4 0x00007fe1c42e44e1 in TMutex::Lock (this=0x32b9140)
at /home/joa/root_debug/core/thread/src/TMutex.cxx:48
#5 0x00007fe1c9af1357 in TLockGuard::TLockGuard (this=0x7fe1b9288bd0,
mutex=0x32b9140) at include/TVirtualMutex.h:79
#6 0x00007fe1c9b37d7d in TROOT::FindObject (
this=0x7fe1c9f9b5e0 ROOT::GetROOT1()::alloc,
name=0x7fe1b9289d38 "Hist_6")
at /home/joa/root_debug/core/base/src/TROOT.cxx:967
#7 0x00007fe1bccaf083 in testfkn (targs=0x24c1950)
at /home/joa/test/./threads.cxx:15
#8 0x00007fe1c42e6b2f in TThread::Function (ptr=0x45fba70)
at /home/joa/root_debug/core/thread/src/TThread.cxx:812
#9 0x00007fe1c8f14ee5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe1c8c43d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fe1ca191980 (LWP 18812)):
#0 0x00007fe1c8f19049 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007fe1c42e8b95 in TPosixCondition::TimedWait (this=0x3465ad0,
secs=1435662873, nanoSecs=398082000)
at /home/joa/root_debug/core/thread/src/TPosixCondition.cxx:78
#2 0x00007fe1c42e419b in TCondition::TimedWait (this=0x45dc130,
secs=1435662873, nanoSec=398082000)
at /home/joa/root_debug/core/thread/src/TCondition.cxx:100
#3 0x00007fe1c42e429d in TCondition::TimedWaitRelative (this=0x45dc130,
ms=100) at /home/joa/root_debug/core/thread/src/TCondition.cxx:128
#4 0x00007fe1c42e545b in TJoinHelper::Join (this=0x7ffd508cc950)
at /home/joa/root_debug/core/thread/src/TThread.cxx:154
#5 0x00007fe1c42e6148 in TThread::Join (this=0x45fba70, ret=0x0)
at /home/joa/root_debug/core/thread/src/TThread.cxx:517
#6 0x00007fe1bccaf337 in runthreads () at /home/joa/test/./threads.cxx:46
#7 0x00007fe1c9fd8042 in ?? ()
#8 0x00007fe1c6e2d55d in ?? () from /home/joa/root_debug/lib/libCling.so
#9 0x00007ffd508ccd20 in ?? ()
#10 0x00007ffd508cd260 in ?? ()
#11 0x00007fe1c57c7414 in cling::IncrementalExecutor::executeWrapper (
this=0x2361830, function=…, returnValue=0x7ffd508ccd20)
at /home/joa/root_debug/interpreter/cling/lib/Interpreter/IncrementalExecutor.h:172
#12 0x00007fe1c57c4da9 in cling::Interpreter::RunFunction (this=0x235d870,
FD=0x45bc1a0, res=0x7ffd508ccd20)
at /home/joa/root_debug/interpreter/cling/lib/Interpreter/Interpreter.cpp:759
#13 0x00007fe1c57c5732 in cling::Interpreter::EvaluateInternal (
this=0x235d870, input="#line 1 \"ROOT_prompt_2\"\nrunthreads()", CO=…,
V=0x7ffd508ccd20, T=0x0)
at /home/joa/root_debug/interpreter/cling/lib/Interpreter/Interpreter.cpp:992
#14 0x00007fe1c57c3e13 in cling::Interpreter::process (this=0x235d870,
input="#line 1 \"ROOT_prompt_2\"\nrunthreads()", V=0x7ffd508ccd20, T=0x0)
at /home/joa/root_debug/interpreter/cling/lib/Interpreter/Interpreter.cpp:502
#15 0x00007fe1c59717c3 in cling::MetaProcessor::process (this=0x23c9700,
input_text=0x328c0a0 "#line 1 \"ROOT_prompt_2\"\nrunthreads()",
compRes=@0x7ffd508cd02c: cling::Interpreter::kSuccess,
result=0x7ffd508ccd20)
at /home/joa/root_debug/interpreter/cling/lib/MetaProcessor/MetaProcessor.cpp:162
#16 0x00007fe1c5675b06 in TCling::ProcessLine (this=0x235d0e0,
line=0x32b91c0 "#line 1 \"ROOT_prompt_2\"\nrunthreads()",
error=0x7ffd508cd2ac) at /home/joa/root_debug/core/meta/src/TCling.cxx:1906
#17 0x00007fe1c9ba164f in TApplication::ProcessLine (this=0x234b6a0,
line=0x32b91c0 "#line 1 \"ROOT_prompt_2\"\nrunthreads()", sync=false,
err=0x7ffd508cd2ac)
at /home/joa/root_debug/core/base/src/TApplication.cxx:982
#18 0x00007fe1c966b6e2 in TRint::ProcessLineNr (this=0x234b6a0,
filestem=0x7fe1c966fc57 "ROOT_prompt_",
line=0x7ffd508cd309 "runthreads()", error=0x7ffd508cd2ac)
at /home/joa/root_debug/core/rint/src/TRint.cxx:729
#19 0x00007fe1c966b186 in TRint::HandleTermInput (this=0x234b6a0)
at /home/joa/root_debug/core/rint/src/TRint.cxx:601
#20 0x00007fe1c9668de5 in TTermInputHandler::Notify (this=0x3412fe0)
at /home/joa/root_debug/core/rint/src/TRint.cxx:124
#21 0x00007fe1c966c44f in TTermInputHandler::ReadNotify (this=0x3412fe0)
at /home/joa/root_debug/core/rint/src/TRint.cxx:116
#22 0x00007fe1c9c25155 in TUnixSystem::CheckDescriptors (this=0x232b7a0)
at /home/joa/root_debug/core/unix/src/TUnixSystem.cxx:1297
#23 0x00007fe1c9c245d2 in TUnixSystem::DispatchOneEvent (this=0x232b7a0,
pendingOnly=false)
at /home/joa/root_debug/core/unix/src/TUnixSystem.cxx:1052
#24 0x00007fe1c9b7418d in TSystem::InnerLoop (this=0x232b7a0)
at /home/joa/root_debug/core/base/src/TSystem.cxx:409
#25 0x00007fe1c9b73f28 in TSystem::Run (this=0x232b7a0)
at /home/joa/root_debug/core/base/src/TSystem.cxx:359
#26 0x00007fe1c9ba1fe2 in TApplication::Run (this=0x234b6a0, retrn=false)
at /home/joa/root_debug/core/base/src/TApplication.cxx:1130
#27 0x00007fe1c966a593 in TRint::Run (this=0x234b6a0, retrn=false)
at /home/joa/root_debug/core/rint/src/TRint.cxx:455
#28 0x0000000000401301 in main (argc=1, argv=0x7ffd508cf768)
at /home/joa/root_debug/main/src/rmain.cxx:29

root-config --version -> 6.03/03

any ideas?

cheers

Joa
threads.cxx (1.22 KB)

Hi,

It seems the deadlock comes from ROOT itself, between gROOT->FindObject() and the interpreter/console. So we’ll have to debug this. What you can do is to try to create a standalone application to see if it works…

Cheers, Bertrand.

Hi again,

OK, I solved the problem by cutting the Gordian knot: I got around it by changing "recreate" to "update" in how I save the results, and by simply quitting ROOT between the two runs…

cheers

Joa

Hi,

any progress?

kind regards

Joa

Hi,

Please use a local TMutex for synchronization. Otherwise your locking will interact with the internal ROOT locks that protect certain interfaces such as TROOT::FindObject().
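As a rough illustration of the "local mutex" idea, here is a minimal sketch in plain standard C++. It is not ROOT code: std::mutex and std::lock_guard stand in for TMutex and TLockGuard, and the names gLocalMutex, gCounter, worker and run_workers are hypothetical. The point is only that the lock belongs to the user code, so it cannot collide with locks taken elsewhere.

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a file-local TMutex: only this translation
// unit locks it, so it is independent of any lock a framework takes.
static std::mutex gLocalMutex;
static long gCounter = 0;  // shared state protected by gLocalMutex

// Each worker increments the shared counter under the local lock.
void worker(int id, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(gLocalMutex);  // released at end of scope
        ++gCounter;
    }
    std::lock_guard<std::mutex> guard(gLocalMutex);  // also serializes the output
    std::cout << "thread " << id << " done\n";
}

// Starts nthreads workers, waits for all of them, and returns the counter.
// The caller holds no lock while joining.
long run_workers(int nthreads, int iterations) {
    gCounter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back(worker, t, iterations);
    }
    for (auto &th : threads) {
        th.join();
    }
    return gCounter;
}
```

Calling run_workers(4, 10000) can be repeated any number of times, because nothing outside this file ever holds gLocalMutex.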

Cheers, Axel.

Hi Axel,

I’m not sure I follow… What am I supposed to understand by “use a local TMutex”? I tried just declaring a TMutex: no luck. A static TMutex: no luck… Removing all locking: no luck…

Is there a way to make my small example run several times consecutively?

cheers

Joa

Hi Joa,

Indeed, this is a known limitation that will be resolved soon but is not fixed yet. It is due to a lock taken by ‘ProcessLine’ that is currently held while the user code executes (but should not be).

In the meantime, you can work around the problem with:

root [0] gInterpreter->SetProcessLineLock(false);
root [1] .L threads.cxx+
root [2] runthreads()
....
root [3] runthreads()
....
root [4]

While the ProcessLine lock is turned off via SetProcessLineLock(false), you need to make sure that you do not execute a ProcessLine call or a command line while threads are running (i.e. if the threads were still running at the end of runthreads, it would be ‘dangerous’, meaning it might or might not work, to execute anything on the command line).
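The lock interaction described above can be modelled in a few lines of plain standard C++. This is a simplified sketch and not ROOT's implementation: gProcessLineLock is a hypothetical stand-in for the lock held by ProcessLine, and a timed mutex is used only so that the blocked case times out instead of hanging.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the lock that is held while user code runs.
static std::timed_mutex gProcessLineLock;

// Worker that needs the same lock. Returns true if it obtained the lock
// within the timeout, false if it gave up.
bool worker_needs_lock(std::chrono::milliseconds timeout) {
    std::unique_lock<std::timed_mutex> lk(gProcessLineLock, timeout);
    return lk.owns_lock();
}

// The caller keeps the lock while waiting for the worker, so the worker
// cannot make progress. With a plain blocking lock this would hang forever;
// the timeout only lets the sketch terminate.
bool join_while_holding_lock() {
    std::unique_lock<std::timed_mutex> held(gProcessLineLock);
    auto result = std::async(std::launch::async, worker_needs_lock,
                             std::chrono::milliseconds(50));
    return result.get();  // the worker times out while the lock is held
}

// The caller releases the lock before waiting, so the worker can take it.
bool join_after_releasing_lock() {
    std::unique_lock<std::timed_mutex> held(gProcessLineLock);
    held.unlock();
    auto result = std::async(std::launch::async, worker_needs_lock,
                             std::chrono::milliseconds(50));
    return result.get();  // the worker acquires the free lock
}
```

In this model join_while_holding_lock() returns false and join_after_releasing_lock() returns true, which mirrors why switching the lock off lets runthreads() finish.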

Also, to avoid other types of deadlock when locking access to the global ROOT objects, we recommend using gROOTMutex rather than TThread::Lock, i.e.[code]
#include <iostream>
#include "TH1F.h"
#include "TROOT.h"
#include "TRandom3.h"
#include "TString.h"
#include "TVirtualMutex.h"

void *testfkn(void *targs){
   // targs is assumed to point to an int holding the thread index.
   const int id = *((int*)targs);
   TRandom3 *re = new TRandom3;
   TH1F *hist = nullptr;

   {
      // Hold the global ROOT lock only while touching gROOT.
      R__LOCKGUARD(gROOTMutex);
      std::cout << "Starting thread " << id << "\n";
      gROOT->cd();
      if(TObject *old = gROOT->FindObject(Form("Hist_%d",id))){
         old->Delete();
      }
      hist = new TH1F(Form("Hist_%d",id),
                      Form("Hist_%d",id),
                      100,-5,5);
   }
   // The fill loop runs outside the lock.
   for(int i=0; i<1000000; i++){
      hist->Fill(re->Gaus());
   }
   {
      R__LOCKGUARD(gROOTMutex);
      std::cout << "Stopping thread " << id << "\n";
   }
   delete re;
   return 0;
}
[/code]

Cheers,
Philippe.

Hi Philippe,

just to confirm, what you suggest works of course, also in my original code.

cheers

Joa