When TMVA computes the loss, some epochs' loss is INF

yhao · November 7, 2023, 10:56am

Dear users,
When I train a DNN model with TMVA in ROOT 6.20, I find that the Train Err and Val Err of some epochs are inf. I don’t understand why this is the case.

Below are the parameters and output of the model

Thanks a lot!

moneta · November 14, 2023, 5:13pm

Hi,
Apologies for the late reply.

This is probably caused by the loss function becoming “inf”, most likely because some weights take extreme values. It should not be a problem if it happens in a single iteration, as long as the errors later converge to finite values.

Lorenzo

yhao · November 17, 2023, 1:24pm

Dear moneta,
Thank you for your reply! I found that the training error becomes INF because, when TMVA computes the loss, the output values from the network are too large for some instances, so the loss evaluates to INF.
Perhaps you can add a small value ε when computing the loss, so that the result does not become INF.
AReal sig = 1.0 / (1.0 + std::exp(-output(i,j)));
if (sig == 0) sig = sig + ε;
else if (sig == 1) sig = sig - ε;
result += w * (Y(i, j) * std::log(sig) + (1.0 - Y(i, j)) * std::log(1.0 - sig));

moneta · November 17, 2023, 2:15pm

Hello

Thank you for the suggestion. You are right, this can be improved for large positive and negative outputs, using the approximation that log(1 + exp(x)) ~ x for large positive x and ~ exp(x) for large negative x.
I will improve the implementation of the cross entropy functions for these two cases.

Lorenzo
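
The idea discussed above can be sketched as follows. This is only an illustration, not the actual TMVA code or the fix that was applied: the function names Softplus, StableCrossEntropy and NaiveCrossEntropy are hypothetical, and double is used in place of TMVA's AReal. It relies on the identities log(sigmoid(x)) = -log(1 + exp(-x)) and log(1 - sigmoid(x)) = -log(1 + exp(x)), so the sigmoid itself is never formed and no ε is needed.

```cpp
#include <algorithm>
#include <cmath>

// log(1 + exp(x)) evaluated without overflow (hypothetical helper).
// For x > 0 it is rewritten as x + log(1 + exp(-x)), so exp() only ever
// receives a non-positive argument. This reproduces the limits quoted in
// the thread: ~x for large positive x, ~exp(x) for large negative x.
inline double Softplus(double x)
{
   return std::max(x, 0.0) + std::log1p(std::exp(-std::fabs(x)));
}

// Per-element binary cross entropy for a raw network output x (the value
// before the sigmoid) and a target y in [0, 1]. Assumed form, not TMVA's.
inline double StableCrossEntropy(double x, double y)
{
   // -log(sigmoid(x)) = Softplus(-x),  -log(1 - sigmoid(x)) = Softplus(x)
   return y * Softplus(-x) + (1.0 - y) * Softplus(x);
}

// Direct formula from the post above (negated, without the weight w),
// shown for comparison only. It returns inf once sig rounds to 0 or 1.
inline double NaiveCrossEntropy(double x, double y)
{
   double sig = 1.0 / (1.0 + std::exp(-x));
   return -(y * std::log(sig) + (1.0 - y) * std::log(1.0 - sig));
}
```

For moderate outputs the two functions agree to rounding error. For an output of 50 with target 0, the naive form rounds sig to exactly 1 and returns inf, while the stable form returns a finite value close to 50. Unlike the ε clamp, this keeps the loss growing with the output instead of flattening it at a fixed ceiling.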