I have been observing some interesting crashes for a typical pyROOT analysis code based on RDataFrame.
The jobs are being run on a local condor system, and I consistently observe a “*** Break *** illegal instruction” error (see attachment below) at the beginning of job execution on a set of somewhat older nodes with Intel Xeon E5620 @ 2.40GHz CPUs (or similar). There seems to be no issue when running on nodes with newer CPUs.
The environment is set via “source /cvmfs/sft.cern.ch/lcg/views/LCG_97python3/x86_64-centos7-gcc8-opt/setup.sh”, i.e. ROOT 6.20/02.
Upon investigation, the local IT expert suggested this might have something to do with AVX in ROOT and the lack of AVX support in these older CPUs. Could you please comment on this? I haven't been able to find out much about this from the ROOT end, nor how to avoid such failures in the future.
Thanks in advance!
condor_job_error.txt (73.1 KB)
The last interesting stack frame is #5 of the backtrace (the register_blosc function at hdf5-blosc/src/blosc_filter.c:61). This line happens to be a strdup() call. This function relies on a memory copy, which on glibc may be accelerated by means of vector instructions.
Therefore, your local IT expert is probably right. IIRC, the implementation of memcpy() that is called depends on the -march= compiler flag.
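One way to confirm the AVX hypothesis on a given node is to look at the feature flags the kernel reports for the CPU. The sketch below is only an illustration: it assumes a Linux x86 node where /proc/cpuinfo has a "flags" line, and the helper names and the list of features to check are made up for this example.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not an x86 Linux system)


def missing_features(flags, wanted=("sse4_2", "avx", "avx2")):
    """List the wanted features (an assumed selection) that the CPU lacks."""
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        print("missing features:", missing_features(flags) or "none")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

Running this on one of the failing nodes and on one of the working nodes should show whether "avx" is the difference between them.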
I am inviting @etejedor to this thread to discuss whether we should create a GitHub issue to track this problem.
From what I see, the failure seems to come from the Python interpreter:
#8 0x00002b082744ceda in PyModule_ExecDef (module=0x2b08596c14d0, def=<optimized out>) at /workspace/build/externals/Python-3.7.6/src/Python/3.7.6/Objects/moduleobject.c:414
So it might not be related to RDataFrame or even ROOT itself.
It might be good to narrow down the piece of code that is triggering this, or even try with a different Python version. Does just creating an RDataFrame cause this crash? Is it enough to import ROOT?
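To narrow this down, each candidate step could be run in a fresh interpreter so that a crash in one step does not take down the whole test. This is a rough sketch, not a definitive recipe: the step list is hypothetical and should be extended with the other imports the analysis script performs, and it assumes the LCG environment has already been sourced on the affected node.

```python
import subprocess
import sys

# Hypothetical steps, ordered from least to most involved.
# Add the other imports of the analysis script here as needed.
STEPS = [
    ("import ROOT", "import ROOT"),
    ("create RDataFrame", "import ROOT; ROOT.RDataFrame(10)"),
]


def run_step(code, python=sys.executable):
    """Run a code snippet in a fresh interpreter and return its exit code."""
    result = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def first_failing(steps, runner=run_step):
    """Return the label of the first step with a non-zero exit code, or None."""
    for label, code in steps:
        if runner(code) != 0:
            return label
    return None
```

Calling first_failing(STEPS) on one of the older nodes would then report the first step that does not exit cleanly, or None if all of them pass.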