Setting validation_split in TMVA DNN

physmlee · January 28, 2020, 5:59am

Hello, everyone.
I’m new to TMVA. I ran into a problem while trying MNIST multiclass classification in pyTMVA.

My goal is to reproduce the well-known MNIST classification code in pyTMVA.
(https://colab.research.google.com/github/AviatorMoser/keras-mnist-tutorial/blob/master/MNIST%20in%20Keras.ipynb#scrollTo=nHLN8vcb6rws)
It has 60K events for training and 10K events for testing.
Thanks to the good compatibility, I could build the model easily, and I now get about 97.8% validation accuracy.

The only problem left is the validation set!
I want to train my network on all 60K training events, without a validation error check,
and then test it on the 10K testing events after all the training epochs have finished.
However, TMVA automatically splits my training events into a 48K training set and a 12K validation set.

How can I deal with this problem?
Is there an option to set the validation fraction, so that I can use more events for training?

According to the Users Guide, there is one for TMlpANN, but not for DNN…

jblomer · January 28, 2020, 8:13am

@moneta Perhaps you can help?

moneta · January 28, 2020, 8:42am

Hi

You can use the option ValidationSize=value when booking the DL method, where the value can be either the absolute number of validation events or a fraction. Unfortunately you cannot set it to zero; the minimum you can use is the batch size.
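For reference, here is a minimal sketch of how the option might be passed from PyROOT. It assumes the PyKeras method is being booked; the helper names (`tmva_options`, `book_keras`), the model file name and the numeric values are placeholders, not taken from the original code.

```python
def tmva_options(**opts):
    """Join keyword arguments into a TMVA option string such as 'A=1:B=2'.

    True becomes a bare flag ('V') and False a negated flag ('!V').
    """
    parts = []
    for key, value in opts.items():
        if value is True:
            parts.append(key)
        elif value is False:
            parts.append("!" + key)
        else:
            parts.append(f"{key}={value}")
    return ":".join(parts)


def book_keras(factory, dataloader, model_file="model.h5"):
    """Book a PyKeras method with an explicit ValidationSize.

    Assumes a ROOT build with PyMVA and that
    ROOT.TMVA.PyMethodBase.PyInitialize() has already been called.
    model_file and the numbers below are placeholders.
    """
    import ROOT  # imported lazily so tmva_options() works without ROOT

    options = tmva_options(
        FilenameModel=model_file,
        NumEpochs=8,
        BatchSize=128,
        ValidationSize=128,  # absolute count; a fraction such as 0.1 is also accepted
    )
    return factory.BookMethod(dataloader, ROOT.TMVA.Types.kPyKeras, "PyKeras", options)


print(tmva_options(BatchSize=128, ValidationSize=128))
```

The option string is built separately so it can be printed and checked before it is handed to BookMethod.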

But let me understand: would you like to train on all the training data and stop after a given number of epochs, for example, without using the validation error? Or would you like to use the testing data as validation data?

Best regards

Lorenzo

physmlee · January 28, 2020, 11:45am

Thank you, dear Lorenzo!
It worked well!

Xavier’s code in the link above uses all 60K training events and stops the training after a given number of epochs.
As an exercise, I just wanted to mimic that code.

Now, with ValidationSize=1 (so that it trains on 59,999 samples), my code gives exactly the same result as Xavier’s code!
And I can adjust the number of validation samples!
Thank you!
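The arithmetic behind these numbers can be sketched in a few lines. This is not TMVA code: `split_sizes` is a hypothetical helper, and the parsing rules (a trailing % means a percentage, a number below 1 is a fraction, anything else is an absolute count) are an assumption about how a ValidationSize value is interpreted.

```python
def split_sizes(n_events, validation_size):
    """Return (n_train, n_validation) for a ValidationSize-style setting.

    Assumed conventions: '20%' is a percentage, a number below 1 is a
    fraction of n_events, and anything else is an absolute event count.
    """
    text = str(validation_size).strip()
    if text.endswith("%"):
        n_val = int(n_events * float(text[:-1]) / 100.0)
    else:
        value = float(text)
        n_val = int(n_events * value) if value < 1.0 else int(value)
    # Zero is rejected, as is a validation set that swallows all events.
    if not 0 < n_val < n_events:
        raise ValueError(f"validation size {n_val} must be between 1 and {n_events - 1}")
    return n_events - n_val, n_val


print(split_sizes(60000, "20%"))  # the 48K / 12K split from the question
print(split_sizes(60000, 1))      # ValidationSize=1 leaves 59,999 for training
```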

Best regards
Seungmok Lee

p.s.
My BatchSize=128, but ValidationSize=1 works. I’m not sure whether it might give an error someday… Anyway, I’m happy right now.
And where can I find all of the available options? I cannot find them in the Users Guide. How did you know about this one…?

physmlee · January 28, 2020, 11:51am

Wow, I didn’t know that you are the PyMVA contributor…
It’s an honor to get your reply…
Thank you…
wow…

moneta · January 28, 2020, 1:56pm

Hi,
I am glad it works. Normally, however, if you are using the new MethodDL implementation or PyMVA, you should get a FATAL error when the validation size is less than the batch size.
I will probably relax this condition so that the training also works in that case.
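Until that condition is relaxed, one way to stay on the safe side is to check the numbers before booking the method. The sketch below is plain Python, not part of TMVA; `check_validation_size` is a hypothetical helper, and the values 128 and 60000 come from this thread.

```python
def check_validation_size(n_validation, batch_size):
    """Raise early if the validation set is smaller than one batch."""
    if n_validation < batch_size:
        raise ValueError(
            f"ValidationSize={n_validation} is smaller than BatchSize={batch_size}"
        )
    return n_validation


batch_size = 128
# The smallest value that passes the check is the batch size itself.
n_validation = check_validation_size(batch_size, batch_size)
n_training = 60000 - n_validation
print(n_training, n_validation)
```

With BatchSize=128 this keeps 59,872 of the 60K events for training.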

It is true that not all options are documented in the Users Guide. We will try to improve this and at least add all possible options to the example tutorial TMVAClassification.C.
For a full list you can always check the code in the DeclareOptions() function of the method class, see for example
https://root.cern.ch/doc/master/MethodDL_8cxx_source.html#l00161

Lorenzo

physmlee · January 28, 2020, 2:35pm

Thank you!
That link really helps me!