Dear ROOT team,
I am currently working on a TMVA regression training with a DNN. When running the DNN with Architecture=CPU, I get excellent performance in terms of prediction quality. However, it takes several hours to train the network on a 6-core machine, so I recompiled ROOT with -Dcuda=on for better run-time performance; I am using a GeForce GTX 1060 card. And indeed, the training is super fast (it takes only 90 seconds). BUT: using Architecture=GPU, the result is much worse in terms of prediction quality (i.e. not good at all).
ROOT version is 6.14.06
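For reference, this is roughly how I book the method (a simplified sketch; the Layout and TrainingStrategy values below are placeholders, not my exact settings):

#include "TFile.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

TFile *outputFile = TFile::Open("TMVAReg.root", "RECREATE");
TMVA::Factory factory("TMVARegression", outputFile,
                      "!V:!Silent:AnalysisType=Regression");
TMVA::DataLoader *dataloader = new TMVA::DataLoader("dataset");
// ... AddVariable / AddTarget / AddRegressionTree calls omitted ...

factory.BookMethod(dataloader, TMVA::Types::kDNN, "DNN",
                   "!H:V:ErrorStrategy=SUMOFSQUARES:VarTransform=N:"
                   "WeightInitialization=XAVIERUNIFORM:"
                   "Layout=TANH|100,TANH|50,LINEAR:"
                   "TrainingStrategy=LearningRate=1e-4,Momentum=0.9,"
                   "ConvergenceSteps=100,BatchSize=256,TestRepetitions=7:"
                   "Architecture=GPU"); // Architecture=CPU for the CPU run

The only thing I change between the two runs is the Architecture option.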
You can already see the difference in the output. For CPU, everything looks fine:
: Start of neural network training on CPU.
:
: Training phase 1 of 2:
: Epoch |  Train Err.    Test Err.   GFLOP/s  Conv. Steps
: --------------------------------------------------------------
:     7 |  0.00844188   0.00835884   11.5233            0
:    14 |  0.00771477    0.0076277   11.4086            0
:    21 |  0.00719746   0.00711652   11.5553            0
:    28 |  0.00695399   0.00686811   11.6709            0
:    35 |  0.00677531    0.0067119   11.3855            0
:    42 |  0.00663051     0.006569   11.0283            0
[...]
:  1330 |  0.00461922   0.00482107   11.2672           77
:  1337 |  0.00460131   0.00480657    11.778           84
:  1344 |  0.00464281   0.00484956   11.8991           91
:  1351 |  0.00464017   0.00484474   11.5172           98
:  1358 |  0.00460107   0.00480589   11.3244          105
For GPU, we get negative values (?!) for Train Err. and Test Err., and the convergence-step counter starts climbing right from the beginning. Could it be that some sort of abs is missing somewhere? (See the small illustration after the log below.) Also, instead of the 1358 epochs of phase 1 on the CPU, there are only 168 when using the GPU:
: Training phase 1 of 2:
: Epoch |  Train Err.    Test Err.   GFLOP/s  Conv. Steps
: --------------------------------------------------------------
:     7 |  -0.0519101   -0.0513511   272.042            0
:    14 |  -0.0518897   -0.0512095   278.155            7
:    21 |  -0.0519059   -0.0509986   277.483           14
:    28 |  -0.0518894   -0.0508399   277.738           21
:    35 |  -0.0518891   -0.0503983   277.846           28
:    42 |  -0.0518917   -0.0513722   277.627            0
:    49 |  -0.0518792   -0.0513415   277.956            0
:    56 |  -0.0519018   -0.0513527   277.596            0
:    63 |  -0.0518969   -0.0518492   277.697            0
:    70 |  -0.0519031   -0.0506192   277.909            7
:    77 |  -0.0518938   -0.0512583   277.788           14
:    84 |  -0.0519034     -0.05101   277.565           21
:    91 |  -0.0518935   -0.0514375   277.888           28
:    98 |  -0.0518891   -0.0510831   277.701           35
:   105 |  -0.0518903   -0.0514737   277.602           42
:   112 |  -0.0518993   -0.0512396   277.284           49
:   119 |  -0.0519075   -0.0508297   276.154           56
:   126 |   -0.051904   -0.0514084   276.909           63
:   133 |  -0.0518718   -0.0509068   276.187           70
:   140 |  -0.0518997   -0.0512256   276.347           77
:   147 |  -0.0519123   -0.0517199   276.478           84
:   154 |  -0.0518909   -0.0515149   275.244           91
:   161 |  -0.0518925   -0.0511171   277.766           98
:   168 |  -0.0518953   -0.0516251   277.442          105
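To spell out why the negative numbers look like a bug to me (assuming the usual sum-of-squares error for regression; this is just an illustration, not TMVA code):

#include <cstddef>
#include <vector>

// Illustration only: a sum-of-squares regression error is a sum of
// squared residuals, so it can never be negative. Negative Train Err.
// and Test Err. values therefore suggest a sign or reduction bug.
double meanSquaredError(const std::vector<double> &target,
                        const std::vector<double> &prediction)
{
   double sum = 0.0;
   for (std::size_t i = 0; i < target.size(); ++i) {
      const double residual = target[i] - prediction[i];
      sum += residual * residual; // each term is >= 0
   }
   return sum / target.size(); // mean of non-negative terms is >= 0
}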
In case it is relevant: I compiled ROOT/CUDA with g++ 7 (cxx14=on, python=3), while CUDA officially supports only g++ <= 6. I did so by removing the compiler version check in the CUDA header. While compiling ROOT, a few “unused parameter” warnings appeared in some of the TMVA/DNN files.
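For reference, the check I removed is the compiler guard in CUDA's host_config.h; from memory it looks roughly like this (the exact bound and wording depend on the CUDA version):

#if defined(__GNUC__)
#if __GNUC__ > 6
// This is the #error I removed so that nvcc accepts g++ 7:
#error -- unsupported GNU version! gcc versions later than 6 are not supported!
#endif
#endif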