Making predictions with a Keras model inside TMVA?

ddroz · July 18, 2017, 11:58am

Dear ROOTers,

I’ve been using Python/Keras to train and evaluate a neural network. Now that the model is fully trained, I want to use it to make predictions.

I can do that in Python with the usual Keras methods. However, for compatibility with my colleagues’ work and my experiment’s framework, I would like to do it in ROOT. According to this 2016 talk by S. Gleyzer and L. Moneta, there is a Keras-TMVA interface, but I cannot find any documentation.

Ideally, I would like a way to load my Keras model, supply it with variables and get a classification score, inside a C++/ROOT script.

If possible, I would also like to apply a PCA reduction to my data, where the PCA basis was computed with scikit-learn. I can skip that step, though at some cost in final accuracy.
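Applying a scikit-learn PCA basis outside Python only requires the fitted mean vector and component matrix, which a fitted `PCA` object exposes as `mean_` and `components_`. Below is a minimal sketch, assuming those arrays have been exported by hand from Python (for example to a text file) and that the PCA was fitted with `whiten=False`. The function name `pcaProject` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Project one event onto a PCA basis exported from scikit-learn.
// mean:       PCA.mean_        (n_features values)
// components: PCA.components_  (n_components rows of n_features values)
// Matches PCA.transform for whiten=False:  z = components * (x - mean)
std::vector<double> pcaProject(const std::vector<double>& x,
                               const std::vector<double>& mean,
                               const std::vector<std::vector<double>>& components) {
    assert(x.size() == mean.size());
    std::vector<double> z(components.size(), 0.0);
    for (std::size_t k = 0; k < components.size(); ++k) {
        assert(components[k].size() == x.size());
        for (std::size_t j = 0; j < x.size(); ++j) {
            z[k] += components[k][j] * (x[j] - mean[j]);
        }
    }
    return z;
}
```

The projected vector `z` would then be fed to the network instead of the raw variables. If `whiten=True` was used, each `z[k]` must additionally be divided by `sqrt(explained_variance_[k])`.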

Thank you for your help!

kialbert · July 18, 2017, 12:29pm

Hi,

There is some documentation to be found in the latest version of the TMVA User’s Guide here.

In summary, it is an interface that lets your training take place inside TMVA. You define your Keras model in Python and save it to disk, e.g. as saved-model.h5. This file can then be loaded and used as exemplified here. That example uses the Python bindings for ROOT/TMVA, but converting it to a C++ script should be straightforward.

Perhaps @sergei can offer some additional information?

ddroz · July 18, 2017, 12:36pm

Thanks for the link to the latest User’s Guide! It will prove very useful.

If TMVA works by loading the untrained, compiled model from an .h5 file, I suppose I could also save the trained model as an .h5 file, then load it inside TMVA, skip the training and make predictions directly, right?

kialbert · July 18, 2017, 12:48pm

No probs!

That should in theory work, but I think you have to be careful with how you set up your input variables in that case.
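A common pitfall with input variables: the network only sees a plain array of numbers, so the variables must be supplied in exactly the same order (and with the same preprocessing) as during training. With TMVA::Reader this means declaring them with AddVariable in the training order. Below is a minimal, library-independent sketch of guarding against ordering mistakes. `buildInputs` and the variable names in the usage are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Build the network input vector in the exact order used for training.
// trainingOrder: variable names in training order (hypothetical list)
// values:        this event's values keyed by variable name
std::vector<float> buildInputs(const std::vector<std::string>& trainingOrder,
                               const std::map<std::string, float>& values) {
    std::vector<float> inputs;
    inputs.reserve(trainingOrder.size());
    for (const std::string& name : trainingOrder) {
        auto it = values.find(name);
        if (it == values.end()) {
            // Fail loudly instead of silently feeding a wrong or missing value.
            throw std::runtime_error("missing input variable: " + name);
        }
        inputs.push_back(it->second);
    }
    return inputs;
}
```

Keying values by name and fixing the order in a single list means the order is defined in one place, so it can be checked against the training script once.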

Happy coding!

swunsch · July 18, 2017, 2:24pm

Hi, I wrote the interface and yes, this should work. There are a few small design decisions you have to take care of:

A neural net for binary classification needs two output nodes, just as a multiclass network with N classes needs N output nodes. A single output node with sigmoid activation does not work in Classification mode; here, that setup corresponds to Regression.
You are restricted to Keras 1.x, as currently shipped on CVMFS. However, the fallbacks in Keras 2.0 could make it possible to use the new API (not yet tested).
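Two output nodes cost no generality in binary classification. With a softmax over two outputs, the first probability is exactly a sigmoid of the difference of the two pre-activations, because e^a0 / (e^a0 + e^a1) = 1 / (1 + e^(a1 - a0)). The network can therefore express the same decision function as a single sigmoid node. Below is a minimal sketch (function names hypothetical) that checks this identity numerically.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over the output-layer pre-activations.
std::vector<double> softmax(const std::vector<double>& a) {
    assert(!a.empty());
    const double m = *std::max_element(a.begin(), a.end());  // avoid overflow
    std::vector<double> p(a.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        p[i] = std::exp(a[i] - m);
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
```

For example, `softmax({2.0, -1.0})[0]` agrees with `sigmoid(3.0)` to within floating-point rounding, and the two outputs sum to one.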

Some examples are given in the folder /tutorials/tmva/keras. The only problem is that the interface is designed to train the loaded model first, following the TMVA workflow. However, in the model.compile(...) step you can set the learning rate to zero (or something similar) and train for only 1 epoch. Then you can use the application interface and the preprocessing performed by TMVA. It’s a small hack, because the TMVA workflow is not designed for application only. Alternatively, you can train a net with the same architecture for something like 1 epoch and then change the path to the trained model in the weights*.xml file. This works as well, and you can then use the TMVA Reader in a C++ script.
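The zero-learning-rate trick leaves a pre-trained model intact because every gradient-based update is scaled by the learning rate. Below is a minimal sketch with plain SGD (w ← w − η·∇L). Optimizers such as Adam also multiply their step by the learning rate, so η = 0 leaves the weights unchanged there too. The function is hypothetical and only illustrates the arithmetic, not Keras internals. One caveat: state that is not updated by gradients, such as batch-normalization moving averages, may still change during that epoch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One plain SGD step: w <- w - learningRate * grad.
// With learningRate == 0 the weights are returned unchanged.
void sgdStep(std::vector<double>& weights, const std::vector<double>& grad,
             double learningRate) {
    assert(weights.size() == grad.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] -= learningRate * grad[i];
    }
}
```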

Here’s a link to a Keras/TMVA/lwtnn tutorial I gave earlier this year. The second part covers the TMVA/Keras interface, and you are probably most interested in the third part, which explains how to speed up inference with lwtnn:

https://github.com/stwunsch/iml_keras_workshop/blob/master/slides/slides_iml_keras_workshop.pdf
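Lightweight inference tools such as lwtnn build on one observation: once trained, a feed-forward network is just a sequence of matrix-vector products and activation functions, which plain C++ can evaluate without Python or Keras. The sketch below is not lwtnn’s API. It is a hypothetical hand-rolled forward pass, assuming the weights and biases have already been exported from the trained Keras model. Keras Dense layers compute activation(x·W + b) with W of shape (inputs, units), and the layout below follows that convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

enum class Activation { Linear, ReLU, Softmax };

// A dense layer stored like a Keras Dense kernel: W[i][j] connects
// input i to unit j, b[j] is the bias of unit j, output = act(x.W + b).
struct DenseLayer {
    std::vector<std::vector<double>> W;  // shape (n_inputs, n_units)
    std::vector<double> b;               // shape (n_units)
    Activation act;
};

// Evaluate the network layer by layer on one input vector.
std::vector<double> forward(const std::vector<DenseLayer>& layers,
                            std::vector<double> x) {
    for (const DenseLayer& layer : layers) {
        assert(layer.W.size() == x.size());
        std::vector<double> y = layer.b;  // start from the biases
        for (std::size_t i = 0; i < x.size(); ++i) {
            assert(layer.W[i].size() == y.size());
            for (std::size_t j = 0; j < y.size(); ++j) y[j] += x[i] * layer.W[i][j];
        }
        if (layer.act == Activation::ReLU) {
            for (double& v : y) v = std::max(0.0, v);
        } else if (layer.act == Activation::Softmax) {
            assert(!y.empty());
            const double m = *std::max_element(y.begin(), y.end());
            double sum = 0.0;
            for (double& v : y) { v = std::exp(v - m); sum += v; }
            for (double& v : y) v /= sum;
        }
        x = std::move(y);
    }
    return x;  // with a softmax last layer: one probability per class
}
```

In practice the weight arrays would come from the trained model (for example via Keras’ `model.get_weights()`). lwtnn provides its own converter and file format, so check its documentation rather than relying on this layout.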

Cheers!

ddroz · July 18, 2017, 2:51pm

Thank you for all the details and tips. I’ll have to look deeper into how to implement that, but this is very useful!