[DNN] Train/Test Error = nan when using ReLU

Hi,
I’m trying to use the TMVA DNN method as a classifier. I have no problem with the good old TANH activation function, but nearly every time I use the RELU activation I get ‘nan’ as both Test Err. and Train Err. after a few epochs.
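
For context, the booking looks roughly like this (variable names, file names, and layer sizes below are placeholders, not my actual macro):

#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"
#include "TFile.h"

void bookDNN() {
   TFile *output = TFile::Open("TMVA_DNN.root", "RECREATE");
   TMVA::Factory factory("TMVAClassification", output, "AnalysisType=Classification");
   TMVA::DataLoader *loader = new TMVA::DataLoader("dataset");
   loader->AddVariable("var1", 'F');  // placeholder variable names
   loader->AddVariable("var2", 'F');
   // ... AddSignalTree / AddBackgroundTree / PrepareTrainingAndTestTree ...
   int N = 2;  // number of input variables
   factory.BookMethod(loader, TMVA::Types::kDNN, "DNN",
       TString::Format("Layout=RELU|%d,RELU|%d,TANH|%d,LINEAR:ErrorStrategy=CROSSENTROPY",
                       3 * N, 2 * N, N));
   // ... TrainAllMethods / TestAllMethods / EvaluateAllMethods ...
}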

When using CROSSENTROPY as the error strategy, I presume this is caused by RELU returning exactly 0 or 1, values that cannot be used as arguments of the logarithms inside the cross-entropy calculation.
But I tried SUMOFSQUARES and the problem remains. In fact, I cannot train any DNN configuration with RELU without getting nan as the error.
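
To illustrate what I mean about the logarithms, here is a standalone sketch (plain C++, not the actual TMVA code):

#include <cmath>
#include <cstdio>

// Binary cross entropy for one event: -(y*log(p) + (1-y)*log(1-p)).
// If the network output p is exactly 0 or 1, one of the log terms is log(0).
double crossEntropy(double y, double p) {
   return -(y * std::log(p) + (1.0 - y) * std::log(1.0 - p));
}

int main() {
   std::printf("%f\n", crossEntropy(1.0, 0.9)); // finite
   std::printf("%f\n", crossEntropy(1.0, 0.0)); // inf, from log(0)
   std::printf("%f\n", crossEntropy(0.0, 0.0)); // nan, from 0 * log(0)
   return 0;
}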

Does anyone know a stable workaround for using RELU in TMVA?

Hi,

Can you please post a macro and a minimal data set reproducing this problem? I need to reproduce it to see whether it is a problem in the configuration.

Thank you

Lorenzo

Sure,

here is my macro

and here is a part of my dataset

Alberto

Hi,

I am sorry for my late reply. The problem is that you can’t use RELU for the last, output layer. You need a tanh or a sigmoid function there.
This is what causes the NaN. We need to add a warning in TMVA for this.
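For example, keeping RELU in the hidden layers and changing only the output activation, a layout along the lines of

“Layout=RELU|(N * 3),RELU|(N * 2),RELU|N,SIGMOID”

(the layer sizes are just illustrative) should not produce the NaN.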
If you still have problems after changing this, please let me know.

Best Regards

Lorenzo

Hi Lorenzo,
I don’t think that is the problem, since I never use anything but LINEAR for the last layer. For example, my working layout is

“Layout=TANH|(N * 3),TANH|(N * 2),TANH|(N),TANH|(N),TANH|(N-10),LINEAR”

while the following configuration does not work:

“Layout=RELU|(N * 3),RELU|(N * 2),RELU|(N),RELU|(N),TANH|(N-10),LINEAR”

Hi

I tried both configurations and they work.
Which version of ROOT and which OS are you using?
Cheers,
Omar.

Hi Omar,
I’m using ROOT 6.10/08* on Ubuntu 16.04 LTS

* with this hotfix (**)

(**) https://root-forum.cern.ch/t/trouble-compiling-root-with-cuda-support-for-tmva/25900/7

I get this when I try to run the example I gave you