I’m using Keras with a Theano backend on lxplus, and I’m having trouble understanding the output histograms from a regression problem, or even telling whether it’s working properly. I’ve attached my Python script and a plot I discuss later.
Basically, I feed it 12 input variables, pass them through one hidden layer with 8 nodes (ReLU activation), and read out one output node (linear activation) to predict a 13th variable. This 13th variable (time, in my case) is approximately normally distributed around 0.
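For context, the network is tiny. Here is a minimal numpy sketch of the same forward pass (random, untrained weights, purely illustrative; my actual script builds it with Keras layers):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-ins for the 12 input variables (100 events).
X = rng.randn(100, 12)

# Hidden layer: 12 -> 8 with ReLU activation (random, untrained weights).
W1, b1 = rng.randn(12, 8), np.zeros(8)
hidden = np.maximum(0.0, X @ W1 + b1)

# Output layer: 8 -> 1 with linear activation, predicting the 13th variable.
W2, b2 = rng.randn(8, 1), np.zeros(1)
y_pred = hidden @ W2 + b2

print(y_pred.shape)  # one regression value per event: (100, 1)
```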
I’m able to run without errors on my dataset, and the evaluation output reports an RMS for the test sample, which I believe is the RMS of the difference between the regression value and the true value. This RMS is significantly lower than the input RMS, so it seems like it’s doing its job.
However, when I look through the output plots, they’re a bit hard to decipher, and the documentation doesn’t have much on regression (it’s mostly about classification, with regression as a side note).
The most confusing one is the plot of (regression - true) vs. true. If the regression is working, shouldn’t this relationship be flat? With my data, it’s a tight distribution with a slope of -1 (see attached plot). Its RMS matches the one quoted in the command-line output.
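For what it’s worth, I tried to reason about what a slope of -1 would mean. If the network output were (nearly) constant, e.g. stuck at the sample mean, then (regression - true) = const - true, which is a line of slope exactly -1 in the true value. A quick numpy check of that hypothetical case:

```python
import numpy as np

rng = np.random.RandomState(42)

# Hypothetical "true" target values, roughly normal around 0 (like my time variable).
y_true = rng.normal(loc=0.0, scale=1.0, size=1000)

# Suppose the regression collapsed to a constant prediction (here, the sample mean).
y_pred = np.full_like(y_true, y_true.mean())

# Fit a line to (regression - true) vs. true: the slope comes out as exactly -1.
slope, intercept = np.polyfit(y_true, y_pred - y_true, 1)
print(round(slope, 6))  # -1.0
```

So is my plot telling me the network is effectively predicting a constant, or is this expected behavior for this plot?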
Why am I getting such a distribution? Is it set up properly?
timingRegressionKeras.py (1.9 KB)
Screen Shot 2019-06-04 at 5.19.25 PM.pdf (141.3 KB)