ROOT Crashes on TBrowser and TCanvas

Hi,

I am having the weirdest problem with X forwarding. I have a dedicated server with Arch Linux x64, dual Xeon E5504 and an MGA G200eW WPCM450 VGA. I am trying to set it up as a small compute node. I have installed and configured mesa, mga-dri, ssh, Xorg, X Forwarding, etc. and confirmed it works with xeyes. I am the package maintainer for ROOT and you can find the compilation options here:
PKGBUILD
settings.cmake

My problem is the following - when I launch ROOT, I am able to see the splash screen correctly forwarded and all works. However, when I try to load a more complex GUI element, like a TBrowser or TCanvas, ROOT crashes with:

Generating stack trace...
 0x00007fbadca4255d in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) + 0x18d from /usr/lib/root/libCling.so
 0x00007fbadca4262c in cling::Interpreter::evaluate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value&) + 0x1c from /usr/lib/root/libCling.so
 0x00007fbadc9b8ffe in TCling::Calc(char const*, TInterpreter::EErrorCode*) + 0xce from /usr/lib/root/libCling.so
 0x00007fbae0fd60d6 in TROOT::ProcessLineFast(char const*, int*) + 0x86 from /usr/lib/root/libCore.so
 0x00007fbad1ea3683 in <unknown> from /usr/lib/root/libGX11.so
 0x00007fbad1b986fd in _XError + 0x11d from /usr/lib/libX11.so.6
 0x00007fbad1b95627 in <unknown> from /usr/lib/libX11.so.6
 0x00007fbad1b956e5 in <unknown> from /usr/lib/libX11.so.6
 0x00007fbad1b965f8 in _XReply + 0x238 from /usr/lib/libX11.so.6
 0x00007fbad1b812ff in XInternAtom + 0xcf from /usr/lib/libX11.so.6
 0x00007fbad0d07cfa in TGClient::TGClient(char const*) + 0x26a from /usr/lib/root/libGui.so
 0x00007fbad0dee793 in TRootApplication::TRootApplication(char const*, int*, char**) + 0x83 from /usr/lib/root/libGui.so
 0x00007fbad0e11ab1 in TRootGuiFactory::CreateApplicationImp(char const*, int*, char**) + 0x31 from /usr/lib/root/libGui.so
 0x00007fbae104934a in TApplication::InitializeGraphics() + 0x17a from /usr/lib/root/libCore.so
 0x00007fbae10533bd in TBrowser::TBrowser(char const*, char const*, TBrowserImp*, char const*) + 0x8d from /usr/lib/root/libCore.so
 0x00007fbae18ab0be in <unknown function>
 0x00007fbadca912d8 in cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction const&) + 0x328 from /usr/lib/root/libCling.so
 0x00007fbadca43593 in cling::Interpreter::executeTransaction(cling::Transaction&) + 0x73 from /usr/lib/root/libCling.so
 0x00007fbadca997a3 in cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&) + 0x4a3 from /usr/lib/root/libCling.so
 0x00007fbadca9cd26 in cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) + 0x66 from /usr/lib/root/libCling.so
 0x00007fbadca4245e in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) + 0x8e from /usr/lib/root/libCling.so
 ...

Seems like an X API mismatch, but I’m not sure. I have tried all I know to correct this. I installed Xorg devel, utils, etc., made sure everything is linking correctly at compilation and recompiled several times. ldd does not complain and all the relevant lib files are there. I have previously set up the same X Forwarding from my desk machine at home, which has an NVIDIA VGA, and all this works fine. The main difference I see here is that the desk machine has GDM and GNOME and the server does not have a display manager on top of Xorg. I do not think that is necessary. Am I wrong?

Is this an X Forwarding issue? Is it a driver issue? Is it that ROOT wants an API the MGA driver doesn’t have? I am really out of my depth here. Does anyone have any ideas on how to debug this?

Thanks.

Hi,

Looks like a corrupted build; x11 should not call libgx11 nor the interpreter. How did you build root?

Cheers, Axel

I included the full details on how ROOT is built in my original post - PKGBUILD script and settings.cmake. You can see all the CMake settings and variables that are specified (settings.cmake) and the PKGBUILD gives you the dependencies and commands used to compile the package. This is the relevant bit:

build() {
    [ -d ${srcdir}/build ] || mkdir ${srcdir}/build
    cd ${srcdir}/build

    CFLAGS="${CFLAGS} -pthread" \
    CXXFLAGS="${CXXFLAGS} -pthread" \
    LDFLAGS="${LDFLAGS} -pthread -Wl,--no-undefined" \
    cmake -C ${srcdir}/settings.cmake ${srcdir}/${_pkgname}-${pkgver}

    make ${MAKEFLAGS}
}

I build it the same way on all my machines and, as I mentioned, I am the package maintainer for Arch Linux - meaning I would have expected a lot of complaints if it wasn’t working for other people.

If the build is corrupted and linking against the wrong lib, then I would think this is a bug somewhere - either a platform specific misread in CMake or wrong CMake options for this particular platform. Please check my CMake config.

Thanks.

Hi,

Apologies! And I understand now - more or less - what’s happening; the missing link going from X11 to libGX11 to the interpreter is RootX11ErrorHandler() from GX11Gui.cxx. So indeed - we wrote that, it’s not a build issue.

Now - there are at least two issues here:

  1. something X11-ish goes wrong, brings us into the RootX11ErrorHandler(). We can debug that by doing
gDebug = gVirtualX;
new TBrowser()

Before crashing, this should show the X11 error, resource id and request code of the XErrorEvent. That should help Olivier determine what’s happening. (Do you run ssh -Y or -X? ROOT needs -Y…)

  2. the interpreter seems to crash (at an unlikely stack frame…) while evaluating an expression in the error handler. I’m trying to see where this could happen; the fact that cling::Interpreter::EvaluateInternal() is the stack frame where the crash happens makes this really special - I’ve never seen this before and I cannot reproduce it with a call similar to what’s happening in RootX11ErrorHandler(). I’d love to have you attach gdb to root.exe (“root” spawns “root.exe”) and show the disassembly of

cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) + 0x18d

so I can see where it happens.

Thanks!

Cheers, Axel.
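
For anyone unsure how to produce the disassembly requested above, here is a minimal shell sketch, not an official ROOT procedure. It assumes gdb and pgrep are installed on the node and that you are allowed to attach to your own processes. The function name attach_and_disassemble and the DRY_RUN switch are made up for this illustration; the gdb options used (-p, -batch, -ex) are standard ones.

```shell
#!/bin/sh
# Sketch: attach gdb to a running root.exe, wait for it to stop on a signal,
# then print a backtrace and the disassembly of the function it stopped in.
# attach_and_disassemble and DRY_RUN are illustrative names, not part of ROOT or gdb.
attach_and_disassemble() {
    pid="$1"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only show the command that would be run.
        echo "gdb -p $pid -batch -ex continue -ex bt -ex disassemble"
    else
        # continue:    let ROOT run until a signal stops it
        # bt:          print the call stack at that point
        # disassemble: dump the function containing the current frame
        gdb -p "$pid" -batch -ex continue -ex bt -ex disassemble
    fi
}

# Possible use from a second terminal on the node ("root" spawns "root.exe"):
#   attach_and_disassemble "$(pgrep -n -x root.exe)"
# then type `new TBrowser()` at the ROOT prompt in the first terminal.
```

If ROOT exits without raising a signal, gdb will report that there is no stack instead of printing a disassembly. Some systems also restrict attaching to running processes, in which case starting ROOT under gdb directly may be the easier route.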

I tried it and got this:

root [8] gDebug = gVirtualX;
ROOT_prompt_8:1:8: error: assigning to 'Int_t' (aka 'int') from incompatible type
      'TVirtualX *'
gDebug = gVirtualX;
       ^ ~~~~~~~~~
root [9] new TBrowser()
Error in <TGClient::TGClient>: can't open display "localhost:10.0", switching to batch mode...
 In case you run from a remote ssh session, reconnect with ssh -Y
(TBrowser *) 0x2eb7eb0
root [10] .q

When I reconnected with -Y, forwarding worked fine. However, I still have no idea why it happened in the first place. Does ROOT require trusted forwarding? That wouldn’t be ideal. I’m also not sure how to disassemble that function, to be honest. However, I am willing to give you a login account on the node, if you are interested. Let me know. You can email me for details.

Hi,

Argh right sorry, less sloppy coding this time around:

gDebug = (long)gVirtualX; new TBrowser();

(I.e. put both into the same line, ideally)

And yes, ROOT requires trusted forwarding - I believe because of the X extensions we use. If you care for details I can ask the people who know! I tried with just “-X” and new TBrowser and that works for me…

Given that we now know what caused it, the remaining part is in principle the crash in the interpreter. That’s also what I care about most - so yes, I’ll contact you, thanks!

Cheers, Axel.

The log I got from the trusted forward is very large (46MB), but it works in that case anyway. Here’s the untrusted log - pastebin.com/X7wn0M0g

Hi.

Thanks - but I don’t see any crash in there :-/ I expected a mention of RootX11ErrorHandler but that’s not there. If I veto the unrelated lines starting with “Key:” I end up with

i.e. no error? Is this maybe the wrong file?

Cheers, Axel.

Sorry, forgot to pipe stderr - pastebin.com/rfxvk9Kr
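
The missing error lines in the first log come down to shell redirection: a plain `>` or `|` carries stdout only, so anything written to stderr never reaches the log unless it is redirected as well. A small sketch of the difference, where `noisy` is a hypothetical stand-in for any program that writes to both streams:

```shell
#!/bin/sh
# noisy is a made-up stand-in for a program that writes to both streams.
noisy() {
    echo "regular output"          # goes to stdout
    echo "diagnostic output" >&2   # goes to stderr
}

# A plain pipe carries stdout only; stderr is discarded here to keep the demo quiet.
noisy 2>/dev/null | grep -c "output"

# Adding 2>&1 before the pipe merges stderr into stdout, so both lines arrive.
noisy 2>&1 | grep -c "output"
```

The first count is 1 and the second is 2. For a ROOT session the same idea would look something like `root -l 2>&1 | tee root.log`, with root.log being an arbitrary file name.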

Great! Yes, that was broken - now fixed. Thanks for helping me track this down - now I don’t need a login anymore.

Cheers, Axel.