Inference with TMVA::Reader is very slow when using DNN

Dear TMVA experts,

I trained both a BDT and a DNN (using the PyKeras method) for binary classification with the TMVA::Factory class. The time it took for training and evaluation was reasonable for both methods (a few minutes for training and ~5-10 seconds for the evaluation).
However, when using them in our analysis with the TMVA::Reader class, inference with the DNN is much slower than with the BDT: running the analysis on a subset of the MC samples takes 3 minutes with the BDT but ~3.5 h with the DNN. Most of that time is spent in the EvaluateMVA method of the TMVA::Reader object.

Is that to be expected, or are we doing something wrong? We are using ROOT 6.24 on Red Hat Enterprise Linux 7.9.

Any help would be much appreciated.

@moneta could you have a look here?

This is expected: the TMVA::Reader class evaluates one event at a time, so a Python function has to be called for every event.
If you use the native TMVA DNN instead, it will be fast, since it is all coded in C++.
The solution is to use the new TMVA SOFIE for evaluating the model. If your network has only dense layers, you can use the Keras h5 file directly as input; otherwise you need to export your model to ONNX, for example using tf2onnx.
See the SOFIE tutorials: TMVA_SOFIE_Keras.C or TMVA_SOFIE_Keras_HiggsModel.C to generate the code for evaluating the model, and TMVA_SOFIE_RDataFrame.C to evaluate the model with RDataFrame.
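
To illustrate why one Python call per event is costly, here is a minimal sketch in plain Python. It is not TMVA code: `CountingModel` is a hypothetical stand-in for a trained model, and `predict_per_event` and `predict_in_batches` are illustrative helper names. It only shows that batching cuts the number of calls into the model from one per event to one per batch while giving the same scores.

```python
from typing import Callable, List, Sequence

Event = Sequence[float]
Predict = Callable[[List[Event]], List[float]]


class CountingModel:
    """Hypothetical stand-in for a trained model; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: List[Event]) -> List[float]:
        self.calls += 1
        # Dummy "score": the sum of the input variables of each event.
        return [sum(event) for event in batch]


def predict_per_event(predict: Predict, events: List[Event]) -> List[float]:
    """One call into the model per event (the slow pattern described above)."""
    return [predict([event])[0] for event in events]


def predict_in_batches(predict: Predict, events: List[Event],
                       batch_size: int = 1024) -> List[float]:
    """One call into the model per batch of events."""
    scores: List[float] = []
    for start in range(0, len(events), batch_size):
        scores.extend(predict(events[start:start + batch_size]))
    return scores
```

With a real Keras model, `predict` would be a small wrapper around `model.predict` applied to an array of events, so the fixed per-call overhead is paid once per batch rather than once per event.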



Thank you for your help.
If I understand correctly, SOFIE::PyKeras::Parse currently does not support BatchNormalization and Dropout layers, both of which we unfortunately need. Therefore I first converted the model to ONNX (which worked fine) and then tried to parse it with SOFIE::RModelParser_ONNX. However, it seems that our model uses the Mul operator, which is also not yet supported.

We used the following Python code to generate the Keras model:

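# keras, nHiddenLayers and nNodesPerLayer are assumed to be defined earlier in the script
model = keras.models.Sequential()
# Hidden layers
for i in range(nHiddenLayers):
    model.add(keras.layers.Dense(nNodesPerLayer, kernel_initializer="he_normal"))
# Output layer: two-node softmax for binary classification
model.add(keras.layers.Dense(2, activation="softmax", kernel_initializer="glorot_uniform"))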

Is there an easy way to check which part of our model results in the need for the Mul-operator so that we can see if we can work around that? Thank you in advance!

I think the Mul operator comes from the BatchNormalization layer. I have already seen ONNX models obtained from Keras in which the BatchNormalization layer is saved as a Mul and an Add operator instead.
The Mul operator is going to be supported soon; we will have a PR in the master for this in the next few days. Sometimes you can also simplify the obtained ONNX model using the onnx-simplifier tool, see GitHub - daquexian/onnx-simplifier: Simplify your onnx model.
If you share a link to your obtained ONNX model, I can have a look at it.
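
As a small worked example of why BatchNormalization can show up as a Mul followed by an Add: at inference time the layer computes gamma * (x - mean) / sqrt(var + eps) + beta, which is the same as scale * x + shift with constant scale and shift. The sketch below checks this for a single input value. The function names are illustrative, the numbers are made up, and eps = 1e-3 is the Keras default for this layer.

```python
import math


def batch_norm(x: float, gamma: float, beta: float,
               mean: float, var: float, eps: float = 1e-3) -> float:
    """BatchNormalization at inference time, written out directly."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(gamma: float, beta: float,
                    mean: float, var: float, eps: float = 1e-3):
    """Fold the layer constants into one multiplier and one offset."""
    scale = gamma / math.sqrt(var + eps)   # constant for the Mul operator
    shift = beta - mean * scale            # constant for the Add operator
    return scale, shift


# Made-up layer parameters, for illustration only.
gamma, beta, mean, var = 1.5, 0.2, 0.7, 4.0
scale, shift = fold_batch_norm(gamma, beta, mean, var)

x = 3.0
direct = batch_norm(x, gamma, beta, mean, var)
folded = scale * x + shift  # Mul, then Add
```

Both `direct` and `folded` give the same value up to floating-point rounding, which is why a converter is free to emit the Mul/Add pair instead of a BatchNormalization node.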

Best regards


You are right. I looked at the ONNX graph and, indeed, the BatchNormalization layer is expressed using Add and Mul. Unfortunately, the simplification did not change that. I will send you a PM with a link to the model.
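
For reference, one possible way to check which operators an ONNX graph contains is sketched below in Python. This is not necessarily how the graph was inspected in this thread. The helper names are illustrative, and `load_graph_nodes` assumes the `onnx` Python package is installed; it relies only on `onnx.load` and the `op_type`, `name`, `input` and `output` fields of the graph nodes.

```python
from collections import Counter


def count_op_types(nodes):
    """Count how often each operator type occurs in a list of graph nodes."""
    return Counter(node.op_type for node in nodes)


def nodes_with_op(nodes, op_type):
    """Return (name, inputs, outputs) for every node of the given operator type."""
    return [
        (node.name, list(node.input), list(node.output))
        for node in nodes
        if node.op_type == op_type
    ]


def load_graph_nodes(path):
    """Load an ONNX file and return its graph nodes (requires the onnx package)."""
    import onnx  # imported here so the helpers above work without onnx installed

    return onnx.load(path).graph.node
```

For example, `count_op_types(load_graph_nodes("model.onnx"))` lists the operators used, and `nodes_with_op(load_graph_nodes("model.onnx"), "Mul")` shows where the Mul nodes sit; the file name `model.onnx` is a placeholder. Node names usually carry the name of the Keras layer they came from, which helps to trace a Mul back to its layer.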

Best regards