R__LOCKGUARD deadlock in TStreamerInfo::BuildCheck()

Hello
I am using 5.34.01 on Windows. My application is compiled with Visual Studio 2010 as a DLL. I have a number of worker threads doing mostly thread-safe things. Occasionally, they need to access information in a ROOT file. I protect such accesses (e.g. creating a TFile object and reading from it) with fMutex->Lock()…fMutex->UnLock(), where fMutex is a TMutex pointer shared by all the threads. Each thread has its own TFile object.
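For reference, the locking pattern described above can be sketched with the C++11 standard library instead of ROOT. This is a minimal sketch under assumptions: std::mutex stands in for the shared TMutex, std::lock_guard for ROOT's TLockGuard (the class behind R__LOCKGUARD), and runWorkers and the file names are hypothetical. A scoped guard releases the mutex even on early return or exception, which a manual Lock()/UnLock() pair does not.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Runs nThreads workers; each performs readsPerThread "reads" of its own file,
// serialized by one shared mutex (the role the shared TMutex* plays above).
int runWorkers(int nThreads, int readsPerThread) {
    std::mutex ioMutex;   // shared by all workers
    int totalReads = 0;   // shared counter, only touched while ioMutex is held
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&ioMutex, &totalReads, t, readsPerThread] {
            // Per-thread resource, analogous to each thread's own TFile (hypothetical name).
            std::string fileName = "worker" + std::to_string(t) + ".root";
            for (int i = 0; i < readsPerThread; ++i) {
                // Thread-safe work would run here, outside the lock.
                std::lock_guard<std::mutex> guard(ioMutex);  // replaces Lock()...UnLock()
                ++totalReads;  // stands in for reading from fileName
            }
            (void)fileName;
        });
    }
    for (auto &w : workers) w.join();
    return totalReads;
}
```

With 10 workers doing 100 reads each, runWorkers(10, 100) returns 1000: every increment happens under the shared lock.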

From within a single session of my program, I am able to create 10 threads (I have 12 cores) to do my work, including occasional reads of TFiles, and collect the results for subsequent analysis. The last thing the threads do is delete their TFile objects. The first pass works fine, no problems…

However, the second time in the same session that I create these worker threads, the very first time one of them attempts to create a read-only TFile object (within TMutex protection), the thread vanishes without a trace. Using the debugger, I traced the problem to the R__LOCKGUARD(gCINTMutex); call in TStreamerInfo::BuildCheck(): executing this line causes the thread to vanish, with no exception, nothing.

Does anyone have any ideas as to what I might be doing wrong? It would be very difficult to make a standalone program that exhibits this behavior - I am hoping that somebody could provide hints, caveats, etc… Thanks

Ed

Hi Ed,

Did you create one TThread object per thread (this is a requirement in v5.34), either to start the thread or at least as an object that lives as long as the thread lives?

Cheers,
Philippe.

Hi Philippe,
Yes, I create one TThread object per worker thread and they never go away… I pass a pointer to a TMutex object to each of the threads; that's what they use to protect thread-unsafe operations…

Ed

Hi Ed,

As a side note, if each thread is using its own TFile, you should not need to protect its access… but

[quote]I am using 5.34.01 on windows.[/quote]

Humm … this is very old. A lot of improvements have been made in the patch branch regarding thread safety. Can you reproduce the problem with v5.34/34?

Cheers,
Philippe.

Regarding the side note: I assumed that since ROOT uses globals, e.g. gFile, I would have problems with multiple threads accessing them… Perhaps these were removed in 5.34/34… I will try this later version…
Thanks so much!
Ed

Hi,

When thread support is initialized, gFile, gDirectory and gCanvas become ‘thread local’ and thus cause no problem.
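To illustrate what ‘thread local’ means here, a minimal sketch using C++11 thread_local. This is an analogy only, not how ROOT 5 implements gDirectory; currentDir and cdInWorker are hypothetical names. A worker that "cd"s into its own file changes only its own copy, so the main thread's current directory is untouched.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical stand-in for gDirectory: every thread gets its own copy.
thread_local std::string currentDir = "main:/";

// Starts a worker that makes fileName its current directory (as opening a
// TFile does) and returns what the worker saw.
std::string cdInWorker(const std::string &fileName) {
    std::string seen;
    std::thread worker([&] {
        currentDir = fileName + ":/";  // modifies the worker's copy only
        seen = currentDir;
    });
    worker.join();
    return seen;
}
```

Calling cdInWorker("a.root") from the main thread returns "a.root:/", while currentDir as seen by the main thread is still "main:/".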

Also, one of the features added during the life of the v5.34 patch branch is proper internal locking, to make sure that the globals (like the list of files) used during I/O are accessed safely.
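The kind of internal lock meant here can be sketched as a shared registry, loosely modeled on a global list of open files, where every operation takes the registry's own mutex. This is a sketch under assumptions: FileRegistry and openAndCloseFromThreads are hypothetical, not ROOT classes. Threads that each open and close their own file need no extra locking; only the shared list is serialized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical global list of open files with an internal lock.
class FileRegistry {
public:
    void add(const std::string &name) {
        std::lock_guard<std::mutex> guard(mutex_);
        files_.push_back(name);
    }
    void remove(const std::string &name) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = std::find(files_.begin(), files_.end(), name);
        if (it != files_.end()) files_.erase(it);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return files_.size();
    }
private:
    mutable std::mutex mutex_;  // internal: callers never see it
    std::vector<std::string> files_;
};

// Each worker "opens" and "closes" its own file; only the registry is shared.
std::size_t openAndCloseFromThreads(FileRegistry &reg, int nThreads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t)
        workers.emplace_back([&reg, t] {
            std::string name = "file" + std::to_string(t) + ".root";
            reg.add(name);     // registered on open
            reg.remove(name);  // unregistered on close
        });
    for (auto &w : workers) w.join();
    return reg.size();
}
```

After all workers have closed their files, the registry is empty again, with no lock taken by the calling code.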

Cheers,
Philippe.

So, you are saying that 5.34/34 has “thread support initialized”, or is there something special I need to do? Can I just use the 5.34/34 binaries?
Ed

Hi Ed,

In v5.34, “thread support is initialized” by the creation of the first TThread object (or a call to TThread::Initialize) and is properly set up by having one TThread per thread (i.e. you are already doing it correctly, and in your case gDirectory etc… are thread local).

Cheers,
Philippe.

Hi Philippe,
OK, I switched from 5.34/0 to 5.34/34 using the vs2010 binaries from the download page. I still have the problem.

Now I crash in TUUID::TUUID() at R__LOCKGUARD2(gROOTMutex) when the TFile constructor is called from within a worker thread. Again, my threads work fine the first time; it's the second time in the same ROOT session that they crash.

Any hints or suggestions of what to try would be greatly appreciated…

thanks
Ed

Hi Ed,

Unfortunately, I have no good clue… Do you have a reproducer I can use to reproduce the problem?

Thanks,
Philippe.

I am still struggling with this threading problem. As I said, I am using 5.34.34 with VS2010 on Windows 7.
Also, the project I build is a DLL. I launch my application from a script that does the following:

    TApplication::NeedGraphicsLibs();      // These lines have been here for years...
    gApplication->InitializeGraphics();    // perhaps they are no longer needed...
    gSystem->Load("myApp.dll");
    new CMyGuiApp(gClient->GetRoot(), 800, 700);

where CMyGuiApp inherits from TGMainFrame. I have an analysis class that is instantiated in CMyGuiApp’s constructor; this analysis object persists throughout the lifetime of CMyGuiApp. The CMyGuiApp object interacts with the analysis object: opening TFiles, issuing commands, displaying data in the canvas, etc. Am I supposed to run CMyGuiApp from within a TThread to properly initialize thread support? I do call TThread::Initialize() from within my analysis class’s ctor, before I interact with ROOT’s I/O system.

So, is there something I should be doing differently? Thank you
Ed

I made a standalone program that reproduces the deadlock (again, 5.34.34 with VS2010 on Windows 7).
I started with root/tutorials/thread/threadsh1.C and changed it from generating TH1 data with Rannor() to reading a TTree from a TFile; each thread opens the TFile, as my project does.

To use the script:
.L threadsh1.cpp+
initialize() // only need this once - make the TFile that the demonstration works with

Then, run the script as usual:
threadsh1()

Running it once works fine, but a second time deadlocks. Any help would be greatly appreciated… (attached: threadsh1.C)

Hi,

[quote]but a second time deadlocks. Any help would be greatly appreciated…[/quote]
I (finally) see your use case. Unfortunately, this deadlock is inherent to CINT. CINT interprets the code and can (and sometimes does) modify the type database during the execution of a line of code or a script. This means that the (main) thread must take the lock while executing the line of code or script. It also means that if any other thread needs access to the type database (for I/O or to execute a CINT command), that other thread will attempt to take the lock and wait until the main thread releases it … which, in the case of threadsh1.C, is never, because the main thread is waiting for the other threads to finish.

The first round goes through because the lock becomes enabled only after the main thread has started executing the CINT line of code/script (i.e. the main thread does not take the lock, because the lock was not yet enabled at the point where it should have taken it).
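The two rounds can be simulated with the C++ standard library. A minimal sketch, assuming a hypothetical global lock gInterpLock that (like the lock described above) only exists once thread support is enabled, and that the main thread takes for the whole interpreted line only if it already exists. A std::timed_mutex with a timeout replaces the infinite wait, so the deadlock shows up as a failed lock attempt instead of a hang. This models the explanation; it is not ROOT's or CINT's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical interpreter lock: null until thread support is enabled.
std::unique_ptr<std::timed_mutex> gInterpLock;

// Stand-in for the first TThread creation / TThread::Initialize().
void enableThreadSupport() {
    if (!gInterpLock) gInterpLock.reset(new std::timed_mutex);
}

// Simulates executing one interpreted line such as "threadsh1()".
// Returns how many workers obtained the lock.
int runRound(int nThreads) {
    // The main thread takes the lock for the whole line, if it exists yet.
    std::unique_lock<std::timed_mutex> lineLock;
    if (gInterpLock) lineLock = std::unique_lock<std::timed_mutex>(*gInterpLock);

    enableThreadSupport();  // the script starts threads, which enables the lock

    std::atomic<int> acquired{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t)
        workers.emplace_back([&acquired] {
            // The worker needs the type database (e.g. for I/O). Real code would
            // block forever here; the timeout makes the deadlock observable.
            std::unique_lock<std::timed_mutex> io(*gInterpLock, std::defer_lock);
            if (io.try_lock_for(std::chrono::milliseconds(200))) ++acquired;
        });
    for (auto &w : workers) w.join();  // main waits, still holding lineLock if taken
    return acquired.load();
}
```

Calling runRound(4) twice in a row should return 4 the first time (the lock was created in the middle of the line, so the main thread never held it) and 0 the second time (the main thread holds the lock while it waits for the workers).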

Cheers,
Philippe

Hi Philippe,

So, does this mean that the only way to have thread support (i.e. avoid the deadlock) is to avoid CINT? Is this a Windows limitation, a ROOT 5 limitation, or does it apply to all versions of ROOT?

If the program runs in batch mode, will I be able to avoid the deadlock?

Thank you for your help
Ed

Hi Ed,

[quote]Is this a windows limitation? or a root5 limitation?[/quote]

It is per se a limitation of v5 (although we have not yet completely lifted it in v6, at least there it is possible to remove the limitation).

[quote]So, does this mean that the only way to have thread support (e.g. avoid dead lock) is to avoid cint?
If the program runs in batch mode, will I be able to avoid dead lock?[/quote]

The way to avoid the deadlock is either to issue a single command from the prompt (for example root.exe -b -q -l threadedScript.C) or to write your own main. The essential point is that no thread (including the main thread) should be in always-executing-an-interpreted-command mode [for example, it is also okay to have an interpreted command that spawns, but does not wait for, a bunch of threads].
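The "spawn but do not wait" case can be sketched the same way, in standard C++ with hypothetical names (interpMutex stands in for the interpreter lock; this is not ROOT code). The shared lock is held only while the threads are started, as when an interpreted command launches them and returns, and it is released before the main thread waits. The workers simply block until the command has finished, so no deadlock occurs.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Spawns nThreads workers while holding the shared lock, releases it, then joins.
// Returns how many workers completed their locked section.
int spawnThenReleaseThenJoin(int nThreads) {
    std::mutex interpMutex;  // hypothetical stand-in for the interpreter lock
    int done = 0;            // protected by interpMutex
    std::vector<std::thread> workers;
    {
        std::lock_guard<std::mutex> commandScope(interpMutex);  // "executing a command"
        for (int t = 0; t < nThreads; ++t)
            workers.emplace_back([&interpMutex, &done] {
                // Blocks until the command below has returned and released the lock.
                std::lock_guard<std::mutex> guard(interpMutex);
                ++done;
            });
    }  // command finished: lock released, workers can proceed
    for (auto &w : workers) w.join();  // waiting WITHOUT holding the lock
    return done;
}
```

A compiled main() that never takes the interpreter lock corresponds to dropping the commandScope block entirely; either way, nobody holds the lock while waiting for the workers.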

Cheers,
Philippe.