Hi,
I will try to check with the new Ubuntu version and see if I can reproduce the error.
For the implementation of new operators: we support Transformer models, such as the ATLAS_GN2 model or CaloDiT2 for diffusion, but the Transformer block (the Attention layer) is implemented as a series of several ONNX operators. We could implement the MultiHead Attention operator directly, if needed.
Normally, when exporting from PyTorch to ONNX, you get a series of lower-level ONNX operators, as the sketch below illustrates.
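As a rough illustration (just a minimal sketch, assuming PyTorch and the onnx Python package are installed; the file name is arbitrary), exporting a standard Transformer encoder layer and listing the resulting node types shows the attention block broken up into basic operators rather than one fused attention node:

```python
import torch
import onnx

# A small Transformer block; its self-attention is a plain nn.MultiheadAttention.
layer = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
layer.eval()
x = torch.randn(1, 10, 64)  # (batch, sequence, embedding)

torch.onnx.export(layer, x, "encoder_layer.onnx", opset_version=17)

# The attention shows up as MatMul/Softmax/Add/etc.,
# not as a single MultiHeadAttention operator.
model = onnx.load("encoder_layer.onnx")
print(sorted({node.op_type for node in model.graph.node}))
```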
LayerNormalization is supported, but probably only starting from version 6.38. GELU can also easily be added if needed.
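For context (again only a sketch, with an arbitrary file name): from ONNX opset 17 onwards, LayerNorm exports as a single LayerNormalization node, while GELU typically still decomposes into elementwise operators at that opset, which is why a dedicated GELU operator may be worth adding:

```python
import torch
import onnx

block = torch.nn.Sequential(torch.nn.LayerNorm(64), torch.nn.GELU())
torch.onnx.export(block, torch.randn(1, 64), "norm_gelu.onnx", opset_version=17)

# Expect a fused LayerNormalization node; GELU usually appears
# decomposed (Erf/Mul/Add) at this opset.
ops = {node.op_type for node in onnx.load("norm_gelu.onnx").graph.node}
print(sorted(ops))
```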
Yes, sure, adding new operators could be a suitable GSoC project; you are welcome to apply for it.
Thank you for your interest!
Best,
Lorenzo