Seg Fault when passing arrays to a fortran routine

samhamil · February 5, 2011, 6:36pm

I am trying to revive some code that was used for a CDF analysis. The code consists of a macro called testing.C, which loads a Fortran library that I compiled and also compiles a macro called TopFort.cc. The Fortran routine is called from TopFort.cc, which passes the addresses of five arrays to it: the information in four of the arrays is used to make a calculation, and the fifth stores the results.

The machine I’d like to run my analysis on uses the ATLAS software rather than the CDF software package. However, I seg fault there when trying to access information from the arrays. I will describe how the code was compiled on both machines to try to sniff out where my problem is.

The CDF version of the code was compiled using g77. Here are the specs:

[shamil01@tuhept]~ > g77 -v
Reading specs from /home/cdfsoft/products/gcc/v3_4_3/Linux+2.6/bin/…/lib/gcc/i686-pc-linux-gnu/3.4.3/specs
Configured with: …/…/gcc-3.4.3/configure --prefix=/tmp/build-gcc-v3_4_3 --disable-shared --with-gnu-ld --with-ld=/tmp/build-gcc-v3_4_3/bin/ld --with-gnu-as --disable-libgcj --with-as=/tmp/build-gcc-v3_4_3/bin/as --enable-threads=posix --enable-languages=c,c++,f77,objc
Thread model: posix
gcc version 3.4.3

and the code was compiled as such:

g77 -Wall -I. -fPIC -c ellipse_pdflib.F
g++ -Wl,-soname,libTopFortran.so -shared -o libTopFortran.so ellipse_pdflib.o -L$CERN_DIR/lib -lpdflib -lkernlib -lmathlib -lpacklib -lg2c

and it uses root version: Version 4.00/08 1 December 2004

This program runs just fine. Now for the seg fault: the machine I’m running the ATLAS software on does not have g77, but rather gfortran. Here are the specs:

Using built-in specs.
Target: x86_64-redhat-linux
Configured with: …/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)

And the code is compiled as such:

gfortran -c -m32 -I. -fPIC ellipse_pdflib.for
gfortran -m32 -Wall,-soname,libTopFortran.so -shared -o libTopFortran.so ellipse_pdflib.o -L/cluster/ATLAS/osg/app/atlas_app/atlas_rel/16.0.3/sw/lcg/external/cernlib/2006a/i686-slc5-gcc43-opt/lib -lpdflib804 -lkernlib -lmathlib -lpacklib

using root version:

Version 5.26/00e 13 October 2010

The seg fault is shown below. Through my own debugging I’ve narrowed it down to a statement that accesses an element of one of the arrays passed to the Fortran routine (called dilep):

*** Break *** segmentation violation
Attaching to program: /proc/26988/exe, process 26988
[Thread debugging using libthread_db enabled]
0xffffe410 in __kernel_vsyscall ()
#1 0x00593713 in _waitpid_nocancel () from /lib/libc.so.6
#2 0x0053807b in do_system () from /lib/libc.so.6
#3 0x0066aead in system () from /lib/libpthread.so.0
#4 0xf7aa41dd in TUnixSystem::Exec(char const*) () from /cluster/ATLAS/osg/app/atlas_app/atlas_rel/15.6.14/sw/lcg/app/releases/ROOT/5.22.00j/i686-slc5-gcc43-opt/root/lib/libCore.so
#5 0xf7aa96eb in TUnixSystem::StackTrace() () from /cluster/ATLAS/osg/app/atlas_app/atlas_rel/15.6.14/sw/lcg/app/releases/ROOT/5.22.00j/i686-slc5-gcc43-opt/root/lib/libCore.so
#6 0xf7aaa48d in TUnixSystem::DispatchSignals(ESignals) () from /cluster/ATLAS/osg/app/atlas_app/atlas_rel/15.6.14/sw/lcg/app/releases/ROOT/5.22.00j/i686-slc5-gcc43-opt/root/lib/libCore.so
#7 0xf7aaa58d in SigHandler(ESignals) () from /cluster/ATLAS/osg/app/atlas_app/atlas_rel/15.6.14/sw/lcg/app/releases/ROOT/5.22.00j/i686-slc5-gcc43-opt/root/lib/libCore.so
#8 0xf7aa0e42 in sighandler(int) () from /cluster/ATLAS/osg/app/atlas_app/atlas_rel/15.6.14/sw/lcg/app/releases/ROOT/5.22.00j/i686-slc5-gcc43-opt/root/lib/libCore.so
#9
#10 0xf5ce7cc5 in dilep () from /cluster/tufts/physicshe/shamil01/TopRootWork/KrzFolder/FailingCode/./libTopFortran.so
#11 0x20202020 in ?? ()

One thing that I’ve observed is that the shared library for TopFort.cc created on the atlas machine is much smaller than on the cdf machine…here are the sizes:

atlas: 30146 (Size)

cdf: 2481021 (Size)

Other than this, I have no idea what is wrong. I’ll attach the code that is used in case it is helpful.
TopFort.cc.C (1.03 KB)
ellipse_pdflib.F.C (38.5 KB)
testing.C (1.52 KB)

Pepe_Le_Pew · February 6, 2011, 10:43am

VIVE L’AMOUR!
I have no solution for you, but you might try to make sure that “integer” is “integer*4” and “real” is “real*4” on both of your systems / compilers.
Compile and run the attached “test_digits.f”.

You should get this output:

          31             4             0
          31             4             0
          63             8             0
          24             4             0
          24             4             0
          53             8             0

“The extension f is not allowed.”

I had to change the name of the file into “test_digits.f.txt”.
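For comparison on the C++ side, here is a minimal sketch of the same kind of check. It is not a translation of the attached “test_digits.f”, whose source is not shown in this thread. It assumes the first two columns of the output above are the number of binary digits and the size in bytes. The names `TypeInfo` and `describe` are made up for illustration.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper type: binary digits and byte size of one C++ type.
struct TypeInfo {
    int digits;         // value bits for integers, mantissa bits for floating point
    std::size_t bytes;  // result of sizeof for the type
};

// Collect the two numbers for any arithmetic type T.
template <typename T>
TypeInfo describe() {
    return TypeInfo{std::numeric_limits<T>::digits, sizeof(T)};
}
```

If `describe<int>()`, `describe<float>()` and `describe<double>()` report 31/4, 24/4 and 53/8, then C++ `int`, `float` and `double` have the same layout as the Fortran `integer*4`, `real*4` and `real*8` rows above, under the stated assumption about the columns.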

A pitiful case, am I not?
Pepe Le Pew.
test_digits.f.txt (931 Bytes)

Russell_Leslie · February 7, 2011, 8:39am

Greetings,

The problem is probably caused by the differing underlying array formats of Fortran and C++.

Fortran arrays are “column-first” while C++ arrays are “row-first”.

I have never figured out a good way to directly pass a non-trivial array between Fortran and C++. I get around the problem by outputting the array to an intermediate text file and then reading it back into the other program that way.

Hope this helps

Russell

Russell_Leslie · February 7, 2011, 9:13am

There is a guide here that might help

http://arnholm.org/software/cppf77/cppf77.htm#Section3.5.2

Pepe_Le_Pew · February 7, 2011, 8:15pm

VIVE L’AMOUR!
the original source code does NOT use multidimensional arrays, so there can be NO problem related to “ordering”.
The only problem that may be there is related to default “sizes of types” in C/C++ versus FORTRAN (that’s what I tried to address in my previous post here).
The whole “ordering” thing is pretty much simple, actually.
A C/C++ “array[i][j][k][l][m][n]” will be seen in FORTRAN as an “array(n,m,l,k,j,i)”, and vice versa. Note also that in C/C++ you (usually) access array elements using indexes from “0” to “i-1”, “j-1”, “k-1”, “l-1”, “m-1”, “n-1”, while in FORTRAN they would (usually) be from “1” to “n”, “m”, “l”, “k”, “j”, “i”. And that was it.
See also here: http://en.wikipedia.org/wiki/Row-major_order
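A small sketch of the index rule for the two-dimensional case, assuming a C/C++ array declared as a[NI][NJ] and the same memory viewed from FORTRAN as a(NJ,NI). The function names are hypothetical; they only compute the offset of an element from the start of the array.

```cpp
#include <cstddef>

// Row-major (C/C++) offset of a[i][j] in an array declared a[NI][NJ]; 0-based.
std::size_t c_offset(std::size_t i, std::size_t j, std::size_t NJ) {
    return i * NJ + j;
}

// Column-major (FORTRAN) offset of a(p,q) in an array declared a(NP,NQ); 1-based.
std::size_t fortran_offset(std::size_t p, std::size_t q, std::size_t NP) {
    return (p - 1) + (q - 1) * NP;
}
```

With NP equal to NJ, `c_offset(i, j, NJ)` and `fortran_offset(j + 1, i + 1, NJ)` give the same offset for every valid i and j, which is the reversed-index rule stated above.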
I am stupid. No?
Pepe Le Pew.

samhamil · February 8, 2011, 9:46am

Pepe,

I ran your code and received the same results.

Pepe_Le_Pew · February 8, 2011, 10:57am

VIVE L’AMOUR!
try one more thing … compile all your fortran source code using “gfortran -ff2c” … maybe it helps a bit.
I am stupid. No?
Pepe Le Pew.

tpochep · February 8, 2011, 11:30am

[quote=“samhamil”]Pepe,

I ran your code and received the same results.[/quote]

Can you, please, give the declaration of your dilep? You forgot to attach TopFort.h.

P.S.

Neither g77, nor gfortran can compile your fortran code.

g77 -v
Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: …/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,f77 --disable-libgcj --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-4.1)

gfortran -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: …/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --disable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)

Pepe_Le_Pew · February 8, 2011, 12:46pm

VIVE L’AMOUR!
sorry, I have probably misled you in my previous post.
On second thought, I don’t believe “-ff2c” will really help (but you might try it anyhow).
I believe your CERNLIB binaries may be following this convention (especially if they were compiled with g77), but your gfortran’s intrinsic functions do not (they use the “-fno-f2c” calling conventions). So, if this is the case then, when you link your shared library (with gfortran’s runtime libraries) you’ll get problems.

The possible solution for you would be to compile CERNLIB on your x86_64 machine with gfortran (afterwards, use the same fortran compiler flags for your own source code).
This can be done in 9 simple steps.

There is a web page: http://www-zeuthen.desy.de/linear_collider/cernlib/new/cernlib_2005.html

1. Take the “original file”: http://www-zeuthen.desy.de/linear_collider/cernlib/new/cernlib-2005-all-new.tgz
2. Unpack it into a subdirectory like “/cern/”.
3. Take the newest “patch file”: http://www-zeuthen.desy.de/linear_collider/cernlib/new/cernlib.2005.corr.2010.08.01.tgz
4. Rename it into “cernlib.2005.corr.tgz”.
5. Copy it into the same subdirectory where you unpacked the “original file” (do NOT unpack it, it will replace an older version of this file which is already there).
6. Take: http://www-zeuthen.desy.de/linear_collider/cernlib/new/cernlib.2005.install.2010.08.01.tgz
7. Unpack it into the same subdirectory where you unpacked the “original file” (it will replace older versions of the relevant shell scripts and the “README_cernlib” file).
8. See the “Installation:” and “Notes:” sections in the file “README_cernlib”, which comes from the last unpacked file, for more explanations.
9. Have fun.

A pitiful case, am I not?
Pepe Le Pew.

P.S. As already mentioned, you do not give us the “TopFort.h” which probably keeps the C/C++ declaration of “dilep_”. I assume it is ‘extern “C” { extern void dilep_(float p_ele[], float b0ele[], float p_muo[], float b0muo[], float output[]); }’.

samhamil · February 8, 2011, 3:34pm

Thanks for all the help. I still haven’t nailed down my problem yet, but will try your suggestions in the coming days. At the moment I’m working on something else more pressing.

samhamil · February 8, 2011, 3:38pm

I have included the header here. I do not declare it as you mention; I declare it as:

void dilep_( float *, float *, float *, float *, float * );
TopFort.h.txt (432 Bytes)
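For readers without the attachment, a minimal sketch of how such a declaration is typically used from C++. The `extern "C"` wrapper and the parameter names are taken from Pepe’s guess earlier in the thread. The array length of 4 and the body of `dilep_` are invented here: the body is only a C++ stand-in so the sketch links without libTopFortran.so, and it does not reproduce the real Fortran routine. `call_dilep_example` is a hypothetical name.

```cpp
// A Fortran subroutine is reached from C++ through an unmangled symbol, by
// default the lowercase name plus a trailing underscore for g77 and gfortran.
// Every argument is passed by address.
extern "C" void dilep_(float*, float*, float*, float*, float*);

// STAND-IN for the Fortran routine (assumption: the real one is in
// ellipse_pdflib.F). It only sums the first element of each input array.
extern "C" void dilep_(float* p_ele, float* b0ele, float* p_muo,
                       float* b0muo, float* output) {
    output[0] = p_ele[0] + b0ele[0] + p_muo[0] + b0muo[0];
}

// Caller side: plain float arrays decay to float*, matching the declaration.
// The length 4 is an assumption; the real sizes are fixed by the Fortran code.
float call_dilep_example() {
    float p_ele[4]  = {1.0f, 0.0f, 0.0f, 0.0f};
    float b0ele[4]  = {2.0f, 0.0f, 0.0f, 0.0f};
    float p_muo[4]  = {3.0f, 0.0f, 0.0f, 0.0f};
    float b0muo[4]  = {4.0f, 0.0f, 0.0f, 0.0f};
    float output[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    dilep_(p_ele, b0ele, p_muo, b0muo, output);
    return output[0];
}
```

In the real setup the stand-in definition would be dropped and the symbol would come from the Fortran shared library. The C++ arrays must be at least as large as the dimensions the Fortran routine declares, otherwise the routine reads or writes past their end.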

tpochep · February 8, 2011, 4:09pm

What about fortran code you posted here? It’s not compilable

samhamil · February 9, 2011, 8:44am

There are a ton of include files which I haven’t posted.

tpochep · February 9, 2011, 8:49am

That’s bad

daid · February 9, 2011, 9:17am

Since you are dealing with passing an array, I’ll tell you how I do it.

Let’s say you have a 32 character array in Fortran. For example:

For your extern declare in C++, tell it it’s a 33 sized array.

now when you declare it, make it a 33 character array, let’s say with something in it:

Now you need a loop something like this:

//The matter name must be 32 characters plus the termination character \0 or Fortran will not read it correctly
//The for loop takes the input matter name and appends the appropriate number of spaces.
//The bound is 32 (not 33) so that matter1[i+1] never writes past the 33-element array.
for(int i=0;i<32;i++)
        if(matter1[i]=='\0')
        {
                matter1[i]=' ';
                matter1[i+1]='\0';
        }

Of course it depends what kind of array. Sorry, I think I deleted most of my own log files for this. But anyhow, I have Fortran subroutines working with C++ that calls ROOT compiled by g++. Anyway, maybe it’s useful for you.
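The same blank-padding idea can be written as a small helper. This is only a sketch, not the code from the post: `pad_for_fortran` is a hypothetical name, and it assumes the destination buffer holds width + 1 bytes (33 for a 32-character Fortran string).

```cpp
#include <cstddef>
#include <cstring>

// Copy src into dst, pad with blanks up to width characters, then terminate.
// dst must have room for width + 1 bytes. Longer input is truncated.
void pad_for_fortran(char* dst, const char* src, std::size_t width) {
    std::size_t len = std::strlen(src);
    if (len > width) {
        len = width;
    }
    std::memcpy(dst, src, len);                // the visible characters
    std::memset(dst + len, ' ', width - len);  // trailing blanks
    dst[width] = '\0';                         // terminator for the C++ side
}
```

For example, with `char matter1[33];` the call `pad_for_fortran(matter1, "iron", 32);` leaves “iron” followed by 28 blanks and a terminating \0.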

Pepe_Le_Pew · February 9, 2011, 10:41am

VIVE L’AMOUR!
there is one more thing that nobody mentioned up to now …
From your original post, I can see that you use the “-m32” flag when compiling your fortran code on your x86_64 machine. This generates x86-32 compatible object code.
Also the subdirectory name “…/i686-slc5-gcc43-opt/…” suggests (i.e. “i686” in this name) that the CERNLIB that you use on your x86_64 machine was compiled for x86-32 (but you should make sure about it, of course).
Do you also use a ROOT version which has been compiled with “-m32”?
That is really crucial (i.e. all gcc / g++ calls should have contained the “-m32” flag, too).
Please try to type “sizeof(float*)” on your ROOT prompt. If you get “4”, then it’s x86-32, but if you get “8”, then it’s an x86-64 version. In the latter case, pointers are 64-bit wide, while your fortran code expects them to be 32-bit wide and then the fortran code accesses totally improper places in RAM.
Note here that, all fortran parameters are “passed by reference”, i.e. this is a general problem in this case, which will appear for any kind of a parameter, not only for “arrays”.
If your current ROOT is x86-64, then you either need to switch to a x86-32 ROOT version, or you need to compile CERNLIB “from scratch” according to the prescription given in my previous post here (which will be done without “-m32”, as far as I’m aware).
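The same pointer-size check can also be put into compiled code, for instance in the macro that loads the library. A minimal sketch with hypothetical function names:

```cpp
#include <climits>
#include <cstddef>

// Width of a data pointer in bits for the current build.
std::size_t pointer_bits() {
    return sizeof(float*) * CHAR_BIT;
}

// True when the code was built for a 32-bit target such as x86-32 with -m32.
bool is_32bit_build() {
    return pointer_bits() == 32;
}
```

If `is_32bit_build()` returns false in the C++ code while the Fortran objects were built with “-m32”, the two sides do not agree on the target architecture.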
I am stupid. No?
Pepe Le Pew.

samhamil · February 14, 2011, 1:34pm

I just typed in sizeof in root:

root [0] sizeof(float*)
(const int)4

My advisor has been able to bypass this problem for now, but I will need to return to it eventually.

wbell · February 14, 2011, 8:19pm

Hi,

There is a simple “anything you can think of” FORTRAN77 to C++ example documented at:
wbell.web.cern.ch/wbell/HepCppIntro/

Look at example 21 and section 5.1. You can find character arrays, multiple dimension arrays etc. simply implemented in FORTRAN77 and C++.

Regards,

Will