I’m using Keras with a Theano backend on lxplus, and I’m having trouble understanding the output histograms from a regression problem, or even telling whether it’s working properly. I’ve attached my Python script and a plot I discuss later.
Basically, I feed it 12 input variables, pass them through one hidden layer with 8 nodes (ReLU activation), and read out one output node (linear activation) to predict a 13th variable. This 13th variable (time, in my case) is approximately normally distributed around 0.
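For context, the network is tiny. Here is a minimal numpy sketch of the same forward pass (random, untrained weights, purely illustrative; my actual script builds it with Keras layers):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-ins for the 12 input variables (100 events).
X = rng.randn(100, 12)

# Hidden layer: 12 -> 8 with ReLU activation (random, untrained weights).
W1, b1 = rng.randn(12, 8), np.zeros(8)
hidden = np.maximum(0.0, X @ W1 + b1)

# Output layer: 8 -> 1 with linear activation, predicting the 13th variable.
W2, b2 = rng.randn(8, 1), np.zeros(1)
y_pred = hidden @ W2 + b2

print(y_pred.shape)  # one regression value per event: (100, 1)
```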
I’m able to run without errors on my dataset, and the evaluation output reports an RMS for the test sample, which I believe is the RMS of the difference between the regression value and the true value. This RMS is significantly lower than the input RMS, so it seems like it’s doing its job.
However, when I look through the output plots, they’re a bit hard to decipher, and the documentation doesn’t have much on regression (it’s mostly about classification, with regression as a side note).
The most confusing one is the plot of (regression - true) vs. true. If the regression is working, shouldn’t this relationship be flat? With my data, it’s a tight distribution with a slope of -1 (see attached plot). Its RMS matches the one quoted in the command-line output.
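For what it’s worth, I tried to reason about what a slope of -1 would mean. If the network output were (nearly) constant, e.g. stuck at the sample mean, then (regression - true) = const - true, which is a line of slope exactly -1 in the true value. A quick numpy check of that hypothetical case:

```python
import numpy as np

rng = np.random.RandomState(42)

# Hypothetical "true" target values, roughly normal around 0 (like my time variable).
y_true = rng.normal(loc=0.0, scale=1.0, size=1000)

# Suppose the regression collapsed to a constant prediction (here, the sample mean).
y_pred = np.full_like(y_true, y_true.mean())

# Fit a line to (regression - true) vs. true: the slope comes out as exactly -1.
slope, intercept = np.polyfit(y_true, y_pred - y_true, 1)
print(round(slope, 6))  # -1.0
```

So is my plot telling me the network is effectively predicting a constant, or is this expected behavior for this plot?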
Why am I getting such a distribution? Is it set up properly?
timingRegressionKeras.py (1.9 KB)
Screen Shot 2019-06-04 at 5.19.25 PM.pdf (141.3 KB)