Hi ROOT experts,
I encountered the following issue when training a TMVA CNN. At some point, both the training and validation loss suddenly become very small, which then leads to a crash:
If necessary, I can try to prepare some data and code that reproduce the issue above. It would also be great if you could give me some idea of how to figure out what is happening. Thanks in advance!
I can’t tell what’s happening here from the output alone, but it would be great if you could show us how to reproduce this issue, so I can try to dig into it or refer you to the relevant expert.
Thanks for the reply. We’ve figured out that the problem was that we were running out of GPU memory, which led to the crash. I think it would be better if ROOT could detect this case and raise an exception, so that users have an idea of what is going wrong.
Thank you for the suggestion. We will try to catch these errors and print a meaningful message.
I hope that it works fine once the GPU memory problem is out of the way. If not, please let us know.
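For anyone landing here later, the idea of "detect the out-of-memory case and raise an exception" could look roughly like the sketch below. This is not ROOT or TMVA code and not the CUDA API: `DeviceStatus`, `FakeDevice` and `checkStatus` are hypothetical names made up for illustration, and the allocator is a stand-in with a fixed capacity so that the example runs without a GPU. In a real backend, the status would come from the GPU runtime's allocation call.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a GPU runtime status code (not a ROOT or CUDA type).
enum class DeviceStatus { Success, OutOfMemory };

// Hypothetical stand-in for a device allocator with a fixed capacity in bytes.
struct FakeDevice {
    std::size_t capacity;
    std::size_t used;

    explicit FakeDevice(std::size_t cap) : capacity(cap), used(0) {}

    // Reserve `bytes` if they fit; otherwise report failure and leave `used` unchanged.
    DeviceStatus allocate(std::size_t bytes) {
        if (bytes > capacity - used) {
            return DeviceStatus::OutOfMemory;
        }
        used += bytes;
        return DeviceStatus::Success;
    }
};

// Convert a failed status into an exception with a readable message,
// instead of letting training continue with invalid buffers.
void checkStatus(DeviceStatus status, const std::string &what) {
    if (status == DeviceStatus::OutOfMemory) {
        throw std::runtime_error("Out of GPU memory while " + what +
                                 "; consider a smaller batch size");
    }
}
```

Wrapping every allocation as `checkStatus(dev.allocate(n), "allocating the input batch")` means the first failed allocation stops the run with a clear message, rather than surfacing later as a suspiciously small loss followed by a crash.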