Lion, exceptions and ROOT

Hi,

After a bit of struggle, I managed to get to a point on Mac OS X Lion, using the newest version of v5-30-00-patches, where my analysis code “almost works”.

But the issue I’m seeing now might be a bit above my level of knowledge. Let me give some context. I use the SFrame framework (sframe.sourceforge.net) for writing ROOT analysis code. The user code in SFrame can signal a number of things using C++ exceptions. One such thing is when the user doesn’t want to process a particular event. To skip writing out information about such events, SFrame does this (lines 214-232):

sframe.svn.sourceforge.net/viewv … iew=markup

So if the exception only asks for the event to be thrown away, execution should happily continue. I opted for this solution rather than introducing return codes because I wanted to make it simple for a function, possibly deep down the execution chain, to signal that it found irregularities in the input data, configuration, etc.

And in any case, this way of coding has been quite successful so far. But now I see a very peculiar effect. The analysis code that I’m trying to bring to life on Lion is basically just an event selector (a TSelector). Inside its ExecuteEvent(…) function (which is called from Process(…), as can be seen from the previous link) I do something like this:

if( ! m_eventSelector.PassesEventSelection(...) ) { throw SError( SError::SkipEvent ); }

I’ve found that my code always exits without any particular message once it has opened the second input file and an exception is thrown. When I investigate what happens in GDB, I get a tremendously long stack trace. The innermost function calls are these:

[quote]Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5f3ffff8
0x00007fff8ccdd307 in tiny_malloc_from_free_list ()
(gdb) bt 20
#0 0x00007fff8ccdd307 in tiny_malloc_from_free_list ()
#1 0x00007fff8ccde00e in szone_malloc_should_clear ()
#2 0x00007fff8cd13346 in malloc_zone_calloc ()
#3 0x00007fff8cd1415d in calloc ()
#4 0x00007fff8138d62d in __cxa_get_globals ()
#5 0x00007fff8138e1f2 in __cxa_current_exception_type ()
#6 0x00007fff87654183 in _objc_terminate ()
#7 0x00007fff8138d001 in safe_handler_caller ()
#8 0x00007fff8138d05c in std::terminate ()
#9 0x00007fff8138d650 in __cxa_get_globals ()
#10 0x00007fff8138e1f2 in __cxa_current_exception_type ()
#11 0x00007fff87654183 in _objc_terminate ()
#12 0x00007fff8138d001 in safe_handler_caller ()
#13 0x00007fff8138d05c in std::terminate ()
#14 0x00007fff8138d650 in __cxa_get_globals ()
#15 0x00007fff8138e1f2 in __cxa_current_exception_type ()
#16 0x00007fff87654183 in _objc_terminate ()
#17 0x00007fff8138d001 in safe_handler_caller ()
#18 0x00007fff8138d05c in std::terminate ()
#19 0x00007fff8138d650 in __cxa_get_globals ()
(More stack frames follow…)[/quote]

The outermost function calls look like this:

[quote](gdb) bt -20
#327362 0x00007fff8138d001 in safe_handler_caller ()
#327363 0x00007fff8138d05c in std::terminate ()
#327364 0x00007fff8138d650 in __cxa_get_globals ()
#327365 0x00007fff8138e1f2 in __cxa_current_exception_type ()
#327366 0x00007fff87654183 in _objc_terminate ()
#327367 0x00007fff8138d001 in safe_handler_caller ()
#327368 0x00007fff8138d05c in std::terminate ()
#327369 0x00007fff8138d650 in __cxa_get_globals ()
#327370 0x00007fff8138e1f2 in __cxa_current_exception_type ()
#327371 0x00007fff87654183 in _objc_terminate ()
#327372 0x00007fff8138d001 in safe_handler_caller ()
#327373 0x00007fff8138d05c in std::terminate ()
#327374 0x00007fff8138d650 in __cxa_get_globals ()
#327375 0x00007fff8138e136 in __cxa_throw ()
#327376 0x00000001048465f8 in PreSelection::ExecuteEvent ()
#327377 0x0000000100014206 in SCycleBaseExec::Process ()
#327378 0x000000010272c961 in TTreePlayer::Process ()
#327379 0x000000010002226f in SCycleController::ExecuteNextCycle ()
#327380 0x000000010001fb7c in SCycleController::ExecuteAllCycles ()
#327381 0x00000001000014e5 in main ()[/quote]

Since the ExecuteEvent(…) function is extremely simple, I just tried commenting out the exception throwing line. (While keeping the event evaluation as it was.) After this all the problems disappeared. I didn’t apply any event selection anymore, but the code was running fine.

I checked that I get the same result using PROOF-Lite. If I run on only a few events, so that all of them are taken from the first file, everything goes fine. But once the workers open a second file, they all die. I should underline that in my test the code throws away all the events of the first file, so the exception is thrown and caught successfully plenty of times. Only once the second file is opened does something go wrong.

SFrame relies completely on the event loop of PROOF(-Lite) and TChain. So file opening/closing and similar things are completely up to the ROOT code.

Any advice on what I should try in order to understand the issue would be very much appreciated. I’ll try to put together a standalone example that demonstrates the issue, if anyone is willing to try it out.

Cheers,
Attila

Hi,

I know that this is quite a detailed and very specific question, but doesn’t anyone have any ideas about what could be going wrong? I mostly mean the ROOT developers at this point…

I used the Xcode 4.1 version of Lion’s g++ compiler for this test. (Apparently v5-30-00-patches doesn’t use clang by default.) But since I now also have gcc 4.6 installed through Fink, I’ll try that as well. (I don’t have a terrible amount of faith in the built-in compilers, since I managed to crash clang++ with a typo the other day. The compiler just crashed without giving me a useful error message, and I had to compile the code with g++ to figure out what the problem was…)

But other than the compiler doing something it isn’t supposed to, I still have no idea what could change in memory after opening the second file that would mess up the exception handling. This is very strange.

Cheers,
Attila

Hi,

Turns out, my idea with using Fink’s version of GCC was a good one.

After I compiled v5-30-00-patches with the GCC 4.6 that Fink installed on my machine (using --with-cc and friends to override all the compilers), my analysis code now runs very nicely. (The SFrame compilation automatically picks up the same compiler that ROOT was compiled with.) No more problems with the exceptions.

Just to be sure, I also tried compiling the same version of ROOT with clang. That produced exactly the same result as Xcode’s GCC.

So it seems I’ll just use GCC 4.6 on Lion for now. (I’m actually really surprised by this failure of Apple’s compiler. :( )

Best,
Attila