FPE in SOFIE-generated inference function

Dave_Brown · June 5, 2023, 10:44pm

I see occasional floating-point exceptions (FPEs) when invoking the SOFIE-generated infer function in my application with overflow trapping enabled. Looking in detail at one instance, the input values are valid, physically sensible, and consistent with the values used to train the model (in my case with Keras). The infer function returns a sensible value in this case (between 0 and 1). The stack trace is below. Has anyone else seen this behavior? Does anyone have advice on how to avoid it (besides disabling the trap)?

Thread 1 “mu2e” received signal SIGFPE, Arithmetic exception.

0x00007fffbef1a252 in sgemm_kernel_HASWELL () from /cvmfs/mu2e.opensciencegrid.org/artexternals/openblas/v0_3_23/Linux64bit+3.10-2.17-e20/lib/libopenblas.so.0

(gdb) where 10

#0 0x00007fffbef1a252 in sgemm_kernel_HASWELL () from /cvmfs/mu2e.opensciencegrid.org/artexternals/openblas/v0_3_23/Linux64bit+3.10-2.17-e20/lib/libopenblas.so.0

#1 0x00007fffbcf59f06 in sgemm_nn () from /cvmfs/mu2e.opensciencegrid.org/artexternals/openblas/v0_3_23/Linux64bit+3.10-2.17-e20/lib/libopenblas.so.0

#2 0x00007fffbce923bf in sgemm_ () from /cvmfs/mu2e.opensciencegrid.org/artexternals/openblas/v0_3_23/Linux64bit+3.10-2.17-e20/lib/libopenblas.so.0

#3 0x00007fffc0ae0d68 in TMVA_SOFIE_TrainBkg::Session::infer (tensor_input1=0x7ffffffed4d0, this=0x4d24b10) at ./Offline/Mu2eKinKal/inc/TrainBkg.hxx:196
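Short of disabling the trap for the whole job, one narrower option is to mask it only around the infer call and then inspect the exception flags afterwards. The sketch below assumes Linux with glibc, where `feenableexcept` and `fedisableexcept` are available as GNU extensions; `ScopedFpeMask` is a hypothetical helper, not part of SOFIE or ROOT.

```cpp
#include <cfenv>   // std::feclearexcept, std::fetestexcept
#include <fenv.h>  // feenableexcept, fedisableexcept (glibc extensions, assumed available)

// Hypothetical RAII guard: masks the given floating-point traps for its
// lifetime and restores the previously enabled ones on destruction.
class ScopedFpeMask {
public:
    explicit ScopedFpeMask(int excepts) : excepts_(excepts) {
        // fedisableexcept returns the set of traps enabled before the call,
        // or -1 on failure.
        previous_ = fedisableexcept(excepts_);
        if (previous_ == -1) previous_ = 0;
        std::feclearexcept(excepts_);  // start from clean status flags
    }

    ~ScopedFpeMask() {
        // Clear the flags raised inside the scope before re-enabling traps.
        std::feclearexcept(excepts_);
        feenableexcept(previous_ & excepts_);
    }

    // True if any of the requested exceptions was raised since construction.
    bool raised(int which) const {
        return std::fetestexcept(which & excepts_) != 0;
    }

    ScopedFpeMask(const ScopedFpeMask&) = delete;
    ScopedFpeMask& operator=(const ScopedFpeMask&) = delete;

private:
    int excepts_;
    int previous_;
};
```

In use, the guard would be constructed with `FE_OVERFLOW` just before the infer call, and `raised(FE_OVERFLOW)` checked right after it to log the inputs that set the flag. This only narrows where the trap is masked; it does not explain why the BLAS kernel raises the exception.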

moneta · June 6, 2023, 2:31pm

Hi Dave,

This is strange. It should not happen if the input is fine and you are sure it is not causing an overflow, for example by multiplying two very large numbers in BLAS.
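A quick way to rule out the input as the cause is to check every value before calling infer. The sketch below is only an illustration: `inputLooksSafe` and its `maxAbs` threshold are hypothetical, not part of SOFIE, and the threshold should be chosen from the range seen in the training data.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of SOFIE): returns true when all n input
// values are finite and no magnitude exceeds maxAbs.
// maxAbs is an assumed sanity bound; pick it from the training data range.
bool inputLooksSafe(const float* data, std::size_t n, float maxAbs = 1.0e6f) {
    for (std::size_t i = 0; i < n; ++i) {
        // std::isfinite rejects NaN and +/-infinity.
        if (!std::isfinite(data[i]) || std::fabs(data[i]) > maxAbs) {
            return false;
        }
    }
    return true;
}
```

It would be called with the same pointer that is passed to infer and the number of input features of the model. If it returns true for the event that triggers the SIGFPE, the input itself is unlikely to be the source of the overflow.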

Can you share the input data and the model so I can try to reproduce the problem?

Cheers

Lorenzo