What is the validation sample when using PyKeras?

a.bragagnolo · May 29, 2018, 9:47am

Dear expert,
my question is simple: is the validation sample in PyKeras, which is used to compute the val_loss (val_accuracy) at each epoch (in order to save only the best model and/or for early stopping), the test sample defined in the TMVA::DataLoader, or is it a subsample of the training sample?

I don’t understand whether this sample is selected by TMVA or by Keras, and by which criteria.

Thank you,
Alberto

kialbert · May 30, 2018, 8:53am

Hi Alberto,

Yes, the Keras interface uses the TMVA test set as the Keras validation set.

Do note that the final ROC score output by TMVA at the end of EvaluateAllMethods also uses the test data to evaluate performance. This should be fine if your model complexity is low (a small or heavily regularised model). To get an unbiased estimate of performance you’d have to evaluate on separate data.

Cheers,
Kim

wolfmor · January 15, 2021, 10:23am

Hi,

is it possible that this has changed? I stumbled upon this line in the training output when using TMVA with Keras:

Split TMVA training data in 5865462 training events and 1466365 validation events

So it seems like the TMVA training data is split into training and validation and the TMVA test data is kept as an independent evaluation sample? And if so, does anyone know if it is possible to specify the ratio of training/validation?

Best,
Moritz
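
A quick check of the numbers in the log line quoted above, as a minimal Python sketch. The variable names are only illustrative; the two event counts are copied from the log line. It shows that the validation part is almost exactly one fifth of the TMVA training data.

```python
# Event counts copied from the quoted TMVA/Keras log line.
n_train = 5865462
n_valid = 1466365

# Total number of events in the original TMVA training sample.
n_total = n_train + n_valid

# Fraction of the TMVA training sample set aside for validation.
valid_fraction = n_valid / n_total

print(f"total events:        {n_total}")
print(f"validation fraction: {valid_fraction:.4f}")
```

The split is consistent with taking 20% of the training sample, rounded down to a whole number of events.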

a.bragagnolo · January 15, 2021, 1:46pm

Hi Moritz,

it is controlled by the ValidationSize=xx% option in the option string passed to BookMethod.

Best,
Alberto
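
For reference, a minimal sketch of how the ValidationSize option could be passed when booking PyKeras from Python. The helper names `pykeras_options` and `book_pykeras`, the file name `model.h5`, and the epoch, batch size and 10% values are assumptions made for illustration, not taken from this thread; `factory` and `dataloader` are assumed to be an existing TMVA.Factory and TMVA.DataLoader. ROOT is imported only inside `book_pykeras`, so the option string can be inspected without a ROOT installation.

```python
def pykeras_options(validation_size="20%", model_file="model.h5",
                    num_epochs=20, batch_size=32):
    """Build the colon-separated TMVA option string for the PyKeras method.

    All default values here are illustrative assumptions.
    """
    return ":".join([
        "!H",
        "!V",
        f"FilenameModel={model_file}",          # Keras model saved beforehand
        f"NumEpochs={num_epochs}",
        f"BatchSize={batch_size}",
        f"ValidationSize={validation_size}",    # part of the TMVA training data
    ])


def book_pykeras(factory, dataloader, options):
    """Book the PyKeras method on an existing factory (requires ROOT with PyMVA)."""
    import ROOT  # imported here so the rest of the sketch runs without ROOT

    ROOT.TMVA.PyMethodBase.PyInitialize()
    return factory.BookMethod(dataloader, ROOT.TMVA.Types.kPyKeras,
                              "PyKeras", options)


# Example: hold out 10% of the TMVA training data for Keras validation.
options = pykeras_options(validation_size="10%")
print(options)
```

With a factory and dataloader already set up, the call would be `book_pykeras(factory, dataloader, options)`. It is worth checking the option list printed by TMVA for your ROOT version, since the available options and their defaults can differ between releases.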