Linking against CERNLIB on 64-bit machines

marc1uk · April 27, 2021, 4:40pm

OS: CentOS-7
ROOT: 5.28
CERNLIB: 2005 with all latest patches from CERNLIB build instructions for Linux (x86-64)

Hi folks,
This one’s a bit of a long shot as it’s not really ROOT related, but I’ve seen previous threads that indicate there may be expertise here, and I’m out of other options, so here goes:
I’m trying to compile a C++ application that will call Fortran routines dependent on CERNLIB. (I believe to provide routines to interface with ZEBRA files and hbooks and such.)
The application as a whole is a framework that compiles ‘user tools’ into objects, builds a shared library from those objects, and then dynamically links the resulting library of tools into the framework. The Fortran routines are generally called by the user tools.
Currently the application compiles and links fine, but when I try to run the resulting application I get the following error:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
LOCB/LOCF: address 0x7fb881fd3200 exceeds the 32 bit address space
or is not in the data segments
This may result in program crash or incorrect results
Therefore we will stop here
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
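For readers unfamiliar with this message, here is a minimal C++ sketch of the kind of range check it describes. This is not CERNLIB's actual code: the helper names are hypothetical, and it assumes that the library stores addresses in 32-bit Fortran INTEGERs, so any address above 2^32 - 1 cannot be represented.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if the address can be held in 32 bits without loss.
bool fits_in_32_bits(std::uint64_t address) {
    return address <= std::numeric_limits<std::uint32_t>::max();
}

// What would be left of the address if it were forced into 32 bits anyway:
// the upper 32 bits are silently discarded.
std::uint32_t truncate_to_32_bits(std::uint64_t address) {
    return static_cast<std::uint32_t>(address);
}
```

Applied to the address in the message above, fits_in_32_bits(0x7fb881fd3200) is false, and truncating would leave 0x81fd3200, which points somewhere else entirely. That would explain why the library stops rather than continuing.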

I’ve managed to compile a minimal standalone application that does essentially the same thing by defining a function that calls fortran/cernlib, building it into a shared library, then building a small caller that pulls that function in by linking against the shared library. That works just fine, so something is clearly different when things scale up to the larger framework.
With a bit of help from some experts I’ve been told that the issue is our framework depends on boost, which results in its shared libraries being built in GNU/Linux mode, whereas the working standalone builds in SYSV mode. To be honest, that information is rather above my head, so I’m not sure how to proceed with it.

Has anyone in the ROOT team had experience linking against libraries with CERNLIB dependencies? Is this something that can be worked around? Am I over-complicating the issue?
Any help much appreciated.

Marcus

Wile_E_Coyote · April 27, 2021, 5:23pm

Attached you can find something “minimal” that should work (except for the missing input HBOOK file, of course):

trial.cxx (3.4 KB)

marc1uk · April 27, 2021, 5:43pm

Thanks for the demo, I was able to compile it, but of course can’t run it without the corresponding ntuple (13a_nd7_posi250_numu_h2o_1.nt).
I must admit though, I’m not sure what the salient points are here. Currently my Makefile includes cernlib graflib grafX11 packlib mathlib kernlib lapack3 blas within my LDLIBS, which seems to pull in the necessary requirements, at least for compiling and linking. Are there particular flags in the compilation line below that I should be paying attention to?

Note: the "--no-pie" flag is needed for gcc 5+, e.g.:
  g++ --no-pie -Wno-write-strings -DgFortran -I${CERN}/${CERN_LEVEL}/include/cfortran -o trial trial.cxx `cernlib packlib` -lgfortran

I tried adding --no-pie -Wno-write-strings -DgFortran -I${CERN}/${CERN_LEVEL}/include/cfortran -lgfortran, but ended up with numerous errors of the form:

/usr/bin/ld: UserTools/template/MyToolThread.o: relocation R_X86_64_32S against symbol `_ZTV11Thread_args' can not be used when making a shared object; recompile with -fPIC

despite my tool compilation line including -fPIC already…

The other point I noticed is of course

/*
  Note: the PAWC and all ntuple variables and structures MUST be here,
  64 bit architectures require this to be loaded in the 32 bit address space.
*/

// define the PAWC common
#define PAWC_SIZE 50000
typedef struct { float PAW[PAWC_SIZE]; } PAWC_DEF;
#define PAWC COMMON_BLOCK(PAWC,pawc)
COMMON_BLOCK_DEF(PAWC_DEF,PAWC);

but I’m not sure how this is translatable to a general application - at the current level I’m not working with PAW at all. Naively, my simple standalone version would seem to indicate that such definitions are either being done where relevant, or aren’t necessary (probably PAW is not being used?).
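As a rough illustration of what the cfortran.h macros in that excerpt amount to, here is a plain C++ sketch of the same common block without the macros. It assumes gfortran-style name mangling, where COMMON /PAWC/ is exported as the lowercase symbol pawc_ with a trailing underscore; the names PawcBlock, kPawcSize and pawc_words are made up for this example.

```cpp
#include <cstddef>

// Same value as PAWC_SIZE in the excerpt above.
constexpr std::size_t kPawcSize = 50000;

// Layout of the Fortran COMMON /PAWC/ block: one flat array of REAL*4 words.
struct PawcBlock {
    float paw[kPawcSize];
};

// Assumption: gfortran exports COMMON /PAWC/ as the C symbol "pawc_".
// Defining it here means the storage belongs to whatever binary this file is
// linked into (the executable, or a shared library).
extern "C" {
PawcBlock pawc_;
}

// Number of 4-byte words available in the block.
std::size_t pawc_words() {
    return sizeof(pawc_.paw) / sizeof(pawc_.paw[0]);
}
```

The relevance to the 32-bit question is where this storage ends up: static data of a non-PIE executable normally sits at low addresses, whereas the same definition inside a shared library is mapped wherever the loader places that library.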

Wile_E_Coyote · April 27, 2021, 8:34pm

The “--no-pie” flag (needed when using gcc 5+) should be used when linking executables (not when just compiling individual files nor for shared libraries).

The “PAWC” Fortran COMMON block is used by HBOOK (ZEBRA, KUIP, and HIGZ).

Note that CERNLIB usually needs to be used in the form of archive libraries (i.e., not shared libraries).

BTW. ROOT provides a standalone “h2root” utility, which can automatically convert HBOOK files (with histograms and ntuples) into ROOT files. Maybe that would be a better approach for you.

couet · April 28, 2021, 6:26am

As mentioned before, the h2root application is also an example of a C++ application calling CERNLIB. In fact, it uses the mini-cernlib package provided with ROOT. This package is the minimal set of Fortran subroutines needed to build h2root.

marc1uk · April 28, 2021, 9:18pm

Thanks again for the feedback.
Even with --no-pie only placed for the final executable compilation, the issue remained.
I also tried to force static linking of the archive libraries into main, but hit compilation errors and didn’t get very far. Perhaps it’s a route for further investigation.

I think my difficulty in appreciating the other suggestions here is because the functions I’m calling abstract away many of the underlying details. I expect the PAWC block is already declared in a Fortran header pulled in several levels deep, so is not something I should be declaring in my own code. Similarly, while there are zbs files generated by some Monte Carlo routines, detector data is converted to ROOT very early on as part of online processing, so I would not have any HBOOKs to convert.

Where then is CERNLIB even being used? What precisely is happening that generates this error? Good questions, but deceptively hard to answer.
What I do know is that somehow, my standalone reproducer is able to call these functions without modification, so I am hoping to keep it that way.

Perhaps h2root and the mini-cernlib package would be useful points of reference.
In the end, however, I did manage to build a functioning application via an extremely janky hack.
Since my standalone application worked, but our framework didn’t, I gradually pulled in parts of the framework until my standalone failed. Eventually I had pulled over all the source files, effectively turning my standalone entirely into the framework, and it continued to work. The only remaining difference was in the Makefile. Specifically, in the compilation line for the executable I was still linking in the standalone shared library containing a function to call my Fortran routines. That function wasn’t being called any more (it had since been replaced by a user tool that did the same thing) - it was being linked in purely as a Makefile artifact.
Removing that library from the compilation line, though, reintroduced the error.

So something along the lines of (simplified for brevity):

/* mylib.h */
void myfunc();

/* mylib.cc */
#include "mylib.h"
#include "some_headers.h"
extern "C" void some_fortran_routine_();

void myfunc(){
   ...
   some_fortran_routine_();
   ...
}

# Makefile
libmylib.so: mylib.cc
    g++ -shared $^ -o $@

main: main.cc
    g++ -lmylib main.cc -o $@ `cernlib lapack3 blas`

Notably, the only thing in libmylib.so is the function myfunc(), and main.cc does not invoke it. It would seem that -lmylib could be dropped, but doing so reintroduces the “LOCB/LOCF: address ...” errors. As a further detail, the order of arguments to g++ is also important: -lmylib has to appear sufficiently early on.

As a naive guess, perhaps linking this library in early on affects the location of cernlib routines in memory, causing them to reside within the 32-bit range where they’re happy. Who knows. For now, this has been two days of frustration and confusion, but we have a working application so it’ll have to do.
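For anyone following the mylib.cc sketch above, the extern "C" declaration relies on the usual gfortran calling convention: the Fortran routine name in lowercase with a trailing underscore, and every argument passed by address. A self-contained illustration is below. The routine scale_array_ and the wrapper scaled() are hypothetical, and the routine is implemented in C++ here purely as a stand-in, so that the example builds without a Fortran compiler or CERNLIB.

```cpp
#include <vector>

// How a C++ caller declares a Fortran routine such as
//   SUBROUTINE SCALE_ARRAY(VALUES, COUNT, FACTOR)
// under the assumed gfortran convention: lowercase name, trailing underscore,
// all arguments passed by pointer.
extern "C" void scale_array_(float* values, const int* count, const float* factor);

// Stand-in definition in C++; in a real build this symbol would come from
// the compiled Fortran object or archive library instead.
extern "C" void scale_array_(float* values, const int* count, const float* factor) {
    for (int i = 0; i < *count; ++i) {
        values[i] *= *factor;
    }
}

// C++-friendly wrapper that hides the pass-by-address details from callers.
std::vector<float> scaled(std::vector<float> values, float factor) {
    int count = static_cast<int>(values.size());
    scale_array_(values.data(), &count, &factor);
    return values;
}
```

Keeping the raw declaration behind a small wrapper like this means the user tools never touch the Fortran symbols directly, which makes it easier to control which library ends up providing them.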