Build errors for root 5.20-00 on Solaris 10 x86_64

I was surprised to find that my root installations on Solaris 10 were all built for 32 bits. I have therefore attempted to rebuild root on Solaris 10 for 64-bit Opteron Sun hardware using gcc. I’ve gotten to a point where root-5.20-0 builds without errors (please let me know how I can submit this for inclusion in the root distribution once the remaining problems are cleaned up), but as soon as I try to run root it segfaults in TClass::Init(). Since I have gotten root to work on the same hardware with 32 bits, and with 64 bits using the same compiler and processor but a different OS, I assume that the problem is just some wrong conditional code being compiled somewhere. I have set R__B64, R__BYTESWAP, R__SOLARIS and R__SUNGCC3 correctly. Does anyone have any idea why the resulting build doesn’t run correctly?

Could you repeat this exercise with version 5.22 ?

Rene

I get the same results with version 5.22.

FWIW, the call stack contents are listed below. I am particularly confused by the TUnixSystem::Exit() at the bottom of the stack. Does this mean it is failing very early and then trying to load other classes to handle the error? If so, is it possible to turn off the error handling so that I can see the original error?

#0 TClass::Init (this=0x678a60, name=0xfffffd7ffeff520b "TObjString",
cversion=1, typeinfo=0xfffffd7fff2e1ca0, isa=0x698ce0, showmembers=0,
dfil=, ifil=0xfffffd7ffefcb4cc "", dl=32, il=0,
silent=false) at core/meta/src/TClass.cxx:839
#1 0xfffffd7ffec2a0e1 in TClass (this=0x678a60,
name=0xfffffd7ffeff520b "TObjString", cversion=286,
info=@0xfffffd7fff2e1ca0, isa=0x698ce0, showmembers=0,
dfil=0xfffffd7ffeff4f3f "include/TObjString.h", ifil=0x0, dl=32, il=0,
silent=) at core/meta/src/TClass.cxx:769
#2 0xfffffd7ffec2a27c in ROOT::CreateClass (
cname=0xfffffd7ffeff520b "TObjString", id=286, info=@0xfffffd7fff2e1ca0,
isa=0x698ce0, show=0, dfil=0xfffffd7ffeff4f3f "include/TObjString.h",
ifil=0x0, dl=32, il=0)
#3 0xfffffd7ffec3774f in ROOT::TDefaultInitBehavior::CreateClass ()
#4 0xfffffd7ffec35f16 in ROOT::TGenericClassInfo::GetClass ()
#5 0xfffffd7ffee1344d in TObjString::Class ()
from /opt/CERN/root_5.22-00/lib/libCore.so
#6 0xfffffd7ffebb1bb3 in TObjString::IsEqual ()
#7 0xfffffd7ffec12dc3 in TPair::IsEqual ()
#8 0xfffffd7ffec10281 in TList::FindObject ()
#9 0xfffffd7ffec0e942 in THashTable::FindObject ()
#10 0xfffffd7ffec12108 in TMap::Remove ()
#11 0xfffffd7ffec092e6 in TClassTable::Remove ()
#12 0xfffffd7ffec09432 in ROOT::RemoveClass ()
#13 0xfffffd7ffec3775e in ROOT::TDefaultInitBehavior::Unregister ()
#14 0xfffffd7ffec3689b in ROOT::TGenericClassInfo::~TGenericClassInfo ()
#15 0xfffffd7ffec5df8a in __tcf_0 () from /opt/CERN/root_5.22-00/lib/libCore.so
#16 0xfffffd7ffe19d57b in _exithandle () from /lib/64/libc.so.1
#17 0xfffffd7ffe18ee01 in exit () from /lib/64/libc.so.1
#18 0xfffffd7ffec61938 in TUnixSystem::Exit ()
from /opt/CERN/root_5.22-00/lib/libCore.so
#19 0xfffffd7ffec63959 in TUnixSystem::DispatchSignals ()
from /opt/CERN/root_5.22-00/lib/libCore.so
#20 0xfffffd7ffec63a2e in SigHandler ()
from /opt/CERN/root_5.22-00/lib/libCore.so
#21 0xfffffd7ffec5e3ee in sighandler ()
from /opt/CERN/root_5.22-00/lib/libCore.so
#22 0xfffffd7ffe210f36 in __sighndlr () from /lib/64/libc.so.1
#23 0xfffffd7ffe205a72 in call_user_handler () from /lib/64/libc.so.1
#24 0xfffffd7ffe205c58 in sigacthandler () from /lib/64/libc.so.1
#25 0xffffffffffffffff in ?? ()
#26 0x000000000000000b in ?? ()
#27 0x0000000000000000 in ?? ()

Hi,

Yes, exactly. The only way to find out what’s wrong (assuming no error is printed at the prompt) is to run root inside a debugger and see where the signal is raised. The backtrace you sent is from a second crash that happened while the first signal was being handled.

Once we have found the cause, we would be very happy to see your patch!

Cheers, Axel.

Hi Axel,

I did as you suggested and ran under gdb after recompiling with --build=debug. It appears to me that there is a problem in CINT, for which I found a fix, but I suspect that someone with a better understanding of CINT can do a better job of it.

Basically, root was blowing up in cint trying to execute a command like
(INT)(-2748828353280)
It turns out that the reason it is failing is that the high-order 4 bytes of the address are being lost. You can actually see this from the root prompt by printing large negative numbers, e.g.

root [16] cout << -47735987543264 << endl;
-1721015520

root [23] cout << 47735987543264 << endl;
47735987543264

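Since the heap addresses on Solaris are apparently large negative numbers when read as signed 64-bit integers, they lose their high-order bytes in the same way, and dereferencing the truncated address causes the crash.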

One fix that I found is to change cint/cint/src/value.h line 18 to:

if (buftype == 'i') return (T) buf->obj.i;

As I said above, I expect that someone who knows CINT can find a much better fix, but for now this seems to allow me to proceed.

Since it seems that my Solaris-64 build changes work reasonably well, it would be nice to get them included in the root release so that I don’t have to find and update the patches whenever I want to build a new version of root. Others may also be interested in running 64-bit Solaris executables. Is there an official submission method for such things? Unfortunately I don’t think that I can spend very much effort maintaining these changes, but I can keep them up to date on the few platforms that I deal with regularly.

Hi,

Thanks for investigating the issue with large negative numbers! It’s now fixed in the trunk (slightly differently than what you suggested).

Please send the Solaris-64 patches to rootdev@root.cern.ch.

Cheers, Axel.