Error calling Acts through Dictionary

Dear ROOT experts,

we have seen something very weird when trying to build a ROOT dictionary with calls to Acts inside. We are only able to make it work properly when Acts is compiled for x86_64; the broadwell spack install gives us very convoluted errors. As an example, here is the last line in the stack trace before it goes into ROOT itself:

#6 0x00007f5e193657e5 in _mm256_store_pd (__A=…, __P=0x53967f0) at /cvmfs/sw.hsf.org/spackages/linux-centos7-haswell/gcc-4.8.5/gcc-8.3.0-avsmzt7bekq7ispf6zlarx6vwdretbae/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/avxintrin.h:867

which is called by an Eigen constructor. Here it appears to be calling an AVX intrinsic. We would have expected this to result in an illegal-instruction error rather than a stack trace like this one, so: are you maybe compiling different pieces differently? Or could it be that the statically compiled code uses AVX while the code called through ROOT does not (or the other way around)? Maybe this causes a memory inconsistency that sometimes occurs and sometimes doesn’t, depending on memory location and/or alignment?
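One way to check whether different pieces are compiled differently could be a small helper like the one below. This is only a sketch: `simdLevel` is a hypothetical name, not part of ROOT, Acts or FCCAnalyses, and it relies on the predefined macros that GCC and Clang set according to the `-m...`/`-march` flags in effect.

```cpp
#include <string>

// Hypothetical helper (not a ROOT/Acts/FCCAnalyses function): reports the
// widest x86 SIMD extension enabled for the translation unit in which this
// function is compiled, based on GCC/Clang predefined macros.
std::string simdLevel() {
#if defined(__AVX512F__)
  return "AVX512F";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE2__)
  return "SSE2";
#else
  return "none";
#endif
}
```

Calling `simdLevel()` once from a macro loaded with `.L` and once from a function built into the local library, and comparing the two strings, should show whether the two sides were built with different settings.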

We would appreciate some help in understanding this issue.

To reproduce the problem:

git clone https://github.com/HEP-FCC/FCCAnalyses.git
cd FCCAnalyses
source /cvmfs/fcc.cern.ch/sw/latest/setup.sh
export PYTHONPATH=$PWD:$PYTHONPATH
export LD_LIBRARY_PATH=$PWD/install/lib:$LD_LIBRARY_PATH
export ROOT_INCLUDE_PATH=$PWD/install/include/FCCAnalyses:$ROOT_INCLUDE_PATH

spack load acts@5.0.0

mkdir build install
cd build/
cmake .. -DCMAKE_INSTALL_PREFIX=../install
make install
cd ..

root -l
.L examples/FCCee/vertex/reproducer.cc
reproducer()

If you want to see it running, change to

spack load acts@5.0.0 arch=linux-centos7-x86_64

and recompile.
Cheers and thanks,
Clement


ROOT Version: root-6.22.06
Platform: centos7
Compiler: gcc-8.3.0


And of course, please let us know if we can provide any other information.

Hi @clementhelsens, let me try to reproduce the problem this evening. I will get back to you after that.

Cheers.

Hello @jalopezg ,

did you get a chance to reproduce the problem?

Cheers,
Clement

Hi @clementhelsens,

Sorry for the delay, I had this in the TODO list. No, I couldn’t reproduce it with the acts@5.00.0%gcc@8.3.0 arch=linux-centos7-broadwell spack package on an Intel® Xeon® Platinum 8160 CPU, i.e.

$ spack load --first acts@5.00.0%gcc@8.3.0 arch=linux-centos7-broadwell
$ # ...
$ root -l
root [0] .L examples/FCCee/vertex/reproducer.cc
root [1]

Are you running on an LXPLUS node? Could you please attach the output of cat /proc/cpuinfo and the backtrace that you obtained?

Cheers,
J.

No problem, but the crash occurs at runtime. Have you also executed reproducer()?
To answer your question, I am running from lxplus centos 7.
Tomorrow I will attach what you are asking.

thanks,
Clement

Yes, I also ran the macro. Sorry, I didn’t copy the reproducer() line here.

Cheers.

Hi @jalopezg,

I think with spack load --first acts@5.00.0%gcc@8.3.0 arch=linux-centos7-broadwell you might have selected the Debug build. We saw that this works fine too, as I think the offending avx features are also disabled when the compiler optimisations are disabled. Sorry about that; I only added the Debug build after Clement posted his instructions. Could you try with

spack load --first acts@5.00.0%gcc@8.3.0 build_type=Release arch=linux-centos7-broadwell

?
Cheers!
Valentin

Just to note that the code in the reproducer is also run in an Acts unit test, which runs fine with all our installations. It seems that only the combination of ROOT dictionary + avx instructions leads to the segfault.

Thanks, @vavolkl! I managed to reproduce the failure with the Release build. I will be investigating it in the next few days. I took this Friday off, so expect a response next week. :)

Thanks for reporting!
Cheers.

Thanks @jalopezg, and sorry that I did not re-check the instructions I provided after @vavolkl updated the default build.
Cheers and thanks for helping!
Clement

Dear @jalopezg, a ping so that the topic does not get closed.
Cheers,
Clement

Hi @clementhelsens,

Yes, sorry for the delay. I had this on the TODO list. The bad news is that my backtrace appears truncated using the Release build of acts-5.0.0. Could you please attach the original backtrace that you got?

I will be off next week and the rest of this week, but the good news is that I promise to take a look into this as soon as I am back. Again, sorry for the delay in handling this issue!

Cheers,
J.

Hi @clementhelsens @jalopezg,

So I haven’t completely understood this issue yet, but at least I have a workaround that doesn’t force us to disable avx everywhere.

So I was coming back to this after stumbling on a discussion of EXTRA_CLING_ARGS and avx, but disappointingly adding -O3 -mavx there has no effect. Then I noticed that when taking FCCAnalyses from the usual arch=broadwell stack, the reproducer runs fine; the segfault appears only when building FCCAnalyses locally. So apparently the reproducer needs -mavx either nowhere (as in the debug builds) or everywhere (as in the release builds in the stack), whereas taking dependencies compiled with avx and then compiling FCCAnalyses without it gives the error.

The simple workaround is thus enabling avx in the local build, by adding something like

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")

in the top-level CMakeLists.txt. Clement, I think if we add an option for this it should be ok, no? Alternatively, installing fccanalyses locally using spack

# in the repository
spack dev-build fccanalyses@master.01
spack load fccanalyses@master.01 

should add the flags for broadwell and thus also work. I’d still be quite interested to know why this results in a segfault, which is not typical for instruction errors like this…
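A possible explanation for the segfault, offered as an assumption rather than something confirmed in this thread: the aligned store `_mm256_store_pd` seen in the backtrace requires a 32-byte-aligned address, and a misaligned address typically shows up as a segfault rather than as an illegal-instruction error. If code built without AVX lays out an object with 16-byte alignment and hands it to code built with AVX that assumes 32 bytes, the store can fault depending on where the object happens to land. The sketch below only illustrates that alignment arithmetic; `Vec4dNoAvx`, `Vec4dAvx` and the helper functions are hypothetical stand-ins, not Eigen or Acts types.

```cpp
#include <cstddef>
#include <cstdint>

// True when the address p is a multiple of `bytes`.
bool isAligned(const void* p, std::size_t bytes) {
  return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Hypothetical stand-ins for a fixed-size vector of four doubles:
// one as a translation unit built without AVX might align it (16 bytes),
// one as a translation unit built with AVX might align it (32 bytes).
struct alignas(16) Vec4dNoAvx { double v[4]; };
struct alignas(32) Vec4dAvx   { double v[4]; };

// Shows that an address can satisfy the 16-byte requirement while
// violating the 32-byte one that an aligned 256-bit store needs.
bool misalignedCaseExists() {
  alignas(32) unsigned char storage[64] = {};
  const void* slot = storage + 16;  // 16-byte aligned, not 32-byte aligned
  return isAligned(slot, alignof(Vec4dNoAvx)) &&
         !isAligned(slot, alignof(Vec4dAvx));
}
```

Under this assumption, whether the crash appears would depend on where each object is placed in memory, which would fit the "sometimes occurs and sometimes doesn’t" behaviour asked about in the first post.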

Hello @vavolkl,

It seems you found the issue, this is great! I do not see any reason not to add avx locally as suggested. I will give it a try locally as well, using the broadwell build rather than spack load acts@5.0.0 arch=linux-centos7-x86_64.

thanks a lot!
