Crash when exiting python after loading ROOT modules on win

Hi there,
I’m using Python 2.4.3 and ROOT 5.12/00. I am building applications and other code with MSVC8, so I’ve built ROOT against MSVC 8.0 (in optimized mode). I can run ROOT and load the shared libraries I use just fine.

Python is another case, however. For example:

D:\users\gwatts\D0\Physics\Top\SingleTop\Code\cafe>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from ROOT import TFile
>>> ^Z

==========================================
=============== STACKTRACE ===============

================ Thread 0 ================

==========================================
============= END STACKTRACE =============

And I get a dialog box claiming python was trying to access an invalid memory address.

Debugging help much appreciated!

Cheers, Gordon.

Gordon,

Hmm, and here I was hoping I’d nailed it: PyROOT has an elaborate cleanup sequence at shutdown to make sure the GUI thread is removed and any possible circular references are broken. However, I keep getting reports that on Windows it still isn’t working (it has never been a problem on any other platform).

Use this piece of code instead:
[code]from ROOT import gROOT
gROOT.SetBatch(1)
from ROOT import TFile
[/code]
and the issue shouldn’t be there, which would show that it is a problem with the GUI thread (which on Windows tickles the MemoryRegulator class after its static object map has been unloaded from memory).

You can find the shutdown sequence in ROOT.py; look for atexit. If there’s anything you can find that localizes the problem on Windows, I’d appreciate it (I don’t have access to a Win box right now, and don’t have MSVC8).

Cheers,
Wim

Hi,
Thanks! Sadly, I still get the crash when I follow your instructions. I’m re-building root in debug mode now and will run it under a debugger to get a more detailed stack dump to see if I can figure out what is going on a little better.

-Gordon.

Hi Wim,
I got lost, and fast! :)

When I build in debug mode, the access violation occurs when ROOT attempts to access memory location 0xcdcdcdcd, the value the debug runtime libraries use to fill uninitialized memory. In short, ROOT is trying to access some uninitialized memory (it could also be memory that has previously been deleted).
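A pointer made of one repeated byte is usually a debug fill pattern rather than a real address. The sketch below is not from the debugging session; it decodes a few fill bytes that the MSVC debug runtime and the /RTC run-time checks are known to use. The table is a small illustrative subset, and `describe_pointer` is a hypothetical helper name. It also shows that a signed value such as -858993460 is simply 0xCCCCCCCC read as a signed 32-bit integer.

```python
# Fill bytes commonly seen in MSVC debug builds (illustrative subset only).
FILL_PATTERNS = {
    0xCD: "heap memory allocated by the debug CRT but never written",
    0xCC: "stack memory never written (filled by /RTC run-time checks)",
    0xDD: "heap memory already freed by the debug CRT",
    0xFD: "guard bytes around a debug-heap allocation",
}


def describe_pointer(value):
    """Return a description if a 32-bit value is one fill byte repeated, else None."""
    value &= 0xFFFFFFFF          # fold signed 32-bit values into 0..2**32-1
    first = value & 0xFF
    if value == first * 0x01010101:
        return FILL_PATTERNS.get(first)
    return None


for raw in (0xCDCDCDCD, 0xCCCCCCCC, -858993460, 0x02EAC058):
    print("%#010x -> %s" % (raw & 0xFFFFFFFF, describe_pointer(raw)))
```

Read this way, 0xcdcdcdcd suggests heap memory that was allocated but never written; memory freed through the debug heap would more typically show 0xdd bytes, so the uninitialized-memory explanation looks like the more likely of the two.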

Here is the stack dump when it happens:

[code] msvcr80d.dll!strlen(unsigned char * buf=0xcccccccc) Line 69 Asm
libCore.dll!TString::TString(const char * cs=0xcdcdcdcd) Line 284 + 0xf bytes C++
libCore.dll!TCint::AutoLoadCallback(const char * cls=0x0021f268, const char * lib=0x044635a0) Line 1154 + 0x30 bytes C++
libCore.dll!TCint_AutoLoadCallback(char * c=0x00e4c2e0, char * l=0x044635a0) Line 79 + 0x1c bytes C++
libCint.dll!G__class_autoloading(int tagnum=861) Line 427 + 0x1b bytes C++
libCint.dll!G__defined_tagname(const char * tagname=0x02eac058, int noerror=2) Line 578 + 0x9 bytes C++
libCore.dll!TCint::CheckClassInfo(const char * name=0x02eac058) Line 647 + 0xb bytes C++
libCore.dll!TCint::SetClassInfo(TClass * cl=0x02eabf48, bool reload=true) Line 570 + 0x29 bytes C++
libWin32gdk.dll!TGWin32InterpreterProxy::SetClassInfo(TClass * cl=0x02eabf48, bool reload=true) Line 65 + 0x1bd bytes C++
libCore.dll!TClass::SetUnloaded() Line 3303 + 0x22 bytes C++
libCore.dll!ROOT::RemoveClass(const char * cname=0x10065d34) Line 532 C++
libCore.dll!ROOT::TDefaultInitBehavior::Unregister(const char * classname=0x10065d34) Line 216 + 0x9 bytes C++
libCore.dll!ROOT::TGenericClassInfo::~TGenericClassInfo() Line 176 + 0x2b bytes C++
libPyROOT.dll!`PyROOT::ROOT::GenerateInitInstance'::`2'::`dynamic atexit destructor for 'instance''() + 0xd bytes C++
libPyROOT.dll!_CRT_INIT(void * hDllHandle=0x10000000, unsigned long dwReason=0, void * lpreserved=0x00000001) Line 417 C
libPyROOT.dll!__DllMainCRTStartup(void * hDllHandle=0x10000000, unsigned long dwReason=0, void * lpreserved=0x00000001) Line 509 + 0x11 bytes C
libPyROOT.dll!_DllMainCRTStartup(void * hDllHandle=0x10000000, unsigned long dwReason=0, void * lpreserved=0x00000001) Line 459 + 0x11 bytes C[/code]

I stepped around with the debugger. I think the line that causes the problem is line 1154 in TCint.cxx:

TString deplibs = gInterpreter->GetClassSharedLibs(cls);

When the crash occurs, cls is “PyROOT”. Also, gInterpreter has been replaced by a TGWin32InterpreterProxy, and that is what returns the uninitialized memory value, which causes TString to barf and ROOT to crash.

Now, moving into the realm of sheer speculation, I tried to follow the rabbit a bit further. I’m pretty sure the following is what actually happens:

  1. TGWin32InterpreterProxy uses the RETURN_METHOD_ARG1 code from the file TGWin32ProxyDefs to execute the GetClassSharedLibs call back on the (possibly) main thread.

  2. It creates its tmp object (which I see), and “ret” holds cdcdcdcd because it is never initialized (tsk tsk).

  3. It uses ForwardCallBack to send a message to the main thread through the Windows message pump (this can be found in TGWin32ProxyBase).

  4. PostThreadMessage attempts to post the message 5 times. It fails all five times and returns false.

  5. The RETURN_METHOD_ARG1 code returns “ret”, which wasn’t changed because the call never completed, and so cdcdcdcd is returned.

I didn’t look at GetLastError for PostThreadMessage, but I’m going to guess that it will return ERROR_INVALID_THREAD_ID – in short, the thread that the message was getting sent to was already dead.
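The suspected failure mode in the steps above can be imitated with a toy model. This is only a sketch in modern Python, not ROOT’s code: `ToyGuiThread`, `forward_call` and the `"libExample.dll"` return value are hypothetical names, and a Python queue stands in for the Windows message pump. It shows how a result slot that is never initialized comes back untouched once every post attempt to a dead thread fails.

```python
import queue
import threading

UNINITIALIZED = 0xCDCDCDCD  # stand-in for the debug fill value seen in the dump


class ToyGuiThread:
    """Hypothetical worker that executes calls posted to it, like a message pump."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._pump, daemon=True)
        self._thread.start()

    def _pump(self):
        while True:
            item = self._inbox.get()
            if item is None:          # shutdown request
                return
            func, slot, done = item
            slot["ret"] = func()      # the only place "ret" is ever written
            done.set()

    def stop(self):
        self._inbox.put(None)
        self._thread.join()

    def post(self, func, slot, done):
        """Mimic PostThreadMessage: refuse if the target thread no longer exists."""
        if not self._thread.is_alive():
            return False
        self._inbox.put((func, slot, done))
        return True


def forward_call(gui, func, attempts=5):
    """Forward func to the GUI thread and return its result, as the proxy would."""
    slot = {"ret": UNINITIALIZED}     # never given a meaningful default
    done = threading.Event()
    for _ in range(attempts):
        if gui.post(func, slot, done):
            done.wait(timeout=5)
            break
    return slot["ret"]                # garbage if every post attempt failed


gui = ToyGuiThread()
print(forward_call(gui, lambda: "libExample.dll"))        # thread alive
gui.stop()
print(hex(forward_call(gui, lambda: "libExample.dll")))   # thread already gone
```

In this toy model, either initializing the result slot or checking the return value of the post before using the result would avoid handing garbage to the caller. Whether that maps onto the real proxy code is a guess.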

ROOT, btw, starts up and shuts down just fine from the command line. No problems at all. I’ve also linked C++ programs against the thread libraries (and done UI stuff in those C++ programs) and not had any trouble. So, if it is as I guess, there must be some funny ordering thing going on as the shutdown occurs and, somehow, the main thread is gone.

I hope this is enough to either spot the problem or tell me what I should look at next!

-Gordon.

BTW, it seems like the GUI thread is started even if SetBatch(1) is called. So there may also be something going wrong in how the Python interface is starting up ROOT…

Gordon,

SetBatch(1) per se doesn’t switch off the thread; the thread is simply never started if the first call after “import ROOT” is “ROOT.gROOT.SetBatch(1)”. In all other cases, it is started. You can still set up a GUI, since TApplication follows a different logic (I only found that out recently, and I’ll need to fix that, if possible), but it will not be responsive.

TGWin32ProxyDefs.h purely exists to be able to share the data across DLLs, so I think it’s a red herring (a Win expert can correct me).

Just to rule this out: did you run ‘make map’ after installing?

From the traceback, what I’d like to know, is what is in the name parameter of: libCore.dll!TCint::CheckClassInfo(const char * name=0x02eac058)
My initial guess is, that that is the name of a class from PyROOT (which isn’t tickled in a normal ROOT session, and hence doesn’t cause problems), and that furthermore, that class’ rootmap entry is missing or messed up …

Cheers,
Wim

Hi Wim,
I did a “make install” after “make”, and “make install” should run “make map”. I ran make map specifically just to check – made no difference.

The argument to CheckClassInfo is “PyROOT”.

Even if import ROOT doesn’t set up the thread, it is pretty clear that during shutdown it is attempting to communicate with a second thread. So even if the thread was never actually started, ROOT seems to think that it was.

I went ahead and caused a GUI window to come up in python – I created a TH1F and filled it with a few entries and then told it to draw itself. The same crash occurs.

-Gordon.

I think I found the main GUI thread creation – it happens in TGWin32MainThread (or the secondary thread).

When I type “import ROOT”, TGWin32MainThread::TGWin32MainThread is called. This is because the TGWin32 ctor is called while gMainThread is null and gROOT->IsBatch() is false. Here is the important bit of the stack trace from my debug build:

[code] libWin32gdk.dll!TGWin32MainThread::TGWin32MainThread() Line 762 C++
libWin32gdk.dll!TGWin32::TGWin32(const char * name=0x02ee19a8, const char * title=0x02ee1a08) Line 812 + 0x1f bytes C++
libWin32gdk.dll!_G__cpp_setup_tagtableG__Win32gdk() + 0x419a bytes C++
libCint.dll!G__CallFunc::Execute(void * pobject=0x00000000) Line 413 + 0x20 bytes C++
libCore.dll!G__CallFunc::ExecInt(void * pobject=0x00000000) Line 99 + 0x3b bytes C++
libCore.dll!TMethodCall::Execute(void * object=0x00000000, long & retLong=-858993460) Line 351 + 0xf bytes C++
libCore.dll!TMethodCall::Execute(long & retLong=-858993460) Line 115 + 0x1c bytes C++
libCore.dll!TPluginHandler::ExecPlugin(int nargs=2, …) Line 326 C++
libCore.dll!TApplication::LoadGraphicsLibs() Line 609 + 0x1d bytes C++
libCore.dll!TApplication::TApplication(const char * appClassName=0x100639e8, int * argc=0x0021f508, char * * argv=0x02ed3b30, void * options=0x00000000, int numOptions=0) Line 154 C++
libPyROOT.dll!PyROOT::TPyROOTApplication::TPyROOTApplication(const char * acn=0x100639e8, int * argc=0x0021f508, char * * argv=0x02ed3b30, bool bLoadLibs=true) Line 44 + 0x56 bytes C++
libPyROOT.dll!PyROOT::TPyROOTApplication::CreatePyROOTApplication(bool bLoadLibs=true) Line 95 + 0x33 bytes C++
libPyROOT.dll!_G__cpp_setup_tagtableG__PyROOT() + 0x1566 bytes C++
libCint.dll!G__CallFunc::Execute(void * pobject=0x00000000) Line 413 + 0x20 bytes C++
libPyROOT.dll!G__CallFunc::ExecInt(void * pobject=0x00000000) Line 99 + 0x3b bytes C++
libPyROOT.dll!PyROOT::TIntExecutor::Execute(G__CallFunc * func=0x02ee11b8, void * self=0x00000000) Line 47 + 0xc bytes C++
libPyROOT.dll!PyROOT::TMethodHolder::CallFast(void * self=0x00000000) Line 92 + 0x26 bytes C++
libPyROOT.dll!PyROOT::TMethodHolder::CallSafe(void * self=0x00000000) Line 115 + 0xc bytes C++
libPyROOT.dll!PyROOT::TMethodHolder::Execute(void * self=0x00000000) Line 396 + 0xc bytes C++
libPyROOT.dll!PyROOT::TClassMethodHolder::operator()(PyROOT::ObjectProxy * __formal=0x00000000, _object * args=0x00961030, PyROOT::ObjectProxy * __formal=0x00000000) Line 28 + 0x11 bytes C++
libPyROOT.dll!PyROOT::`anonymous namespace'::mp_call(PyROOT::MethodProxy * meth=0x009fd4d0, _object * args=0x00961030, _object * kwds=0x00000000) Line 105 + 0x36 bytes C++
[/code]
The argument to the mp_call thing on the bottom looks like CreatePyROOTApplication.

Perhaps the main gui thread is somewhere else?

-Gordon.

Gordon,

PyROOT is a namespace, not a class, so this would explain both why RINT isn’t affected by it and why the shared libs are an empty list. If that’s the problem, I’d consider it a bug in the auto-loading mechanism. Could you add a line like:

Library.PyROOT: libPyROOT.so

to your etc/system.rootmap file? If the crash goes away, that tells us the auto-loading is missing a check that should be there.

PyROOT::TPyROOTApplication is built in ROOT.py and isn’t controlled by SetBatch(1). To prevent GUI stuff from appearing at all, one can run:
[code]$ python - -b
>>> import ROOT
[/code]
which should then prevent that thread from firing up (it is a different thread from the one I was thinking/talking about in my earlier posts). TPyROOTApplication doesn’t do much, but its base class TApplication does, which is what you’re looking at.

Now, it could be that the name it is looking for isn’t the PyROOT namespace per se, but rather the name as passed to TPyROOTApplication (and hence to TApplication). That would seem to make more sense. In pyroot/src/TPyROOTApplication.cxx, you’ll find:

gApplication = new TPyROOTApplication( "PyROOT", &argc, argv, bLoadLibs );

Just to see the effect, can you change it to:

gApplication = new TPyROOTApplication( "Rint", &argc, argv, bLoadLibs );

That shouldn’t hurt, and it may help (or at least we can check whether the name parameter changes to “Rint”, so we know where it’s coming from).

Cheers,
Wim

P.S. I’ll be out of e-mail contact all of next week …

Hi,

I found an issue in the shutdown sequence, which had an easy fix. That fix may help here as well. There were still lookups into ROOT during shutdown of the ROOT.py module. By definition, new lookups are unnecessary at that point, and old ones don’t enter the C++ code. Now, the method that allows lookups is removed at the start of shutdown, so any attempt at a new lookup will simply fail, hence not touch C++ code, and (hopefully) prevent this crash.
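The idea can be illustrated with a small sketch. This is not the code in ROOT.py: `LazyModule`, `begin_shutdown` and the dictionary standing in for the C++ side are all hypothetical. It only shows the general technique of removing a lookup hook so that cached names keep working while new names fail without reaching the backend.

```python
class LazyModule(object):
    """Hypothetical facade that resolves unknown names on demand."""

    def __init__(self, backend):
        self._backend = backend      # dict standing in for the C++ side
        self.lookups = 0             # how often the backend was consulted

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails, i.e. for new names.
        self.lookups += 1
        try:
            value = self._backend[name]
        except KeyError:
            raise AttributeError(name)
        setattr(self, name, value)   # cache: later access bypasses this hook
        return value


def begin_shutdown(module):
    """Remove the lookup hook so new names can no longer reach the backend."""
    del type(module).__getattr__


root = LazyModule({"TFile": "<class TFile>", "TH1F": "<class TH1F>"})
root.TFile                           # first access goes through the hook
begin_shutdown(root)
print(root.TFile)                    # cached earlier, still available
try:
    root.TH1F                        # new lookup, now refused
except AttributeError:
    print("new lookup refused during shutdown")
print(root.lookups)                  # backend was consulted only once
```

Deleting the hook from the class, rather than setting a flag that the hook checks, means a late lookup fails through the ordinary attribute machinery and never runs any facade code at all.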

Cheers,
Wim