Question about Dropout setting in neural network configuration

ROOT Version: v6-28-02-5
Platform: Ubuntu 22.04.2 LTS
Compiler: GCC 12.2.0

Dear experts, I have a small question. I’m using TMVA::kDL through Python, and in the neural network training settings I use dropout regularization. The TMVA manual says: “Dropout is a regularization technique that with a certain probability sets neuron activations to zero. This probability can be set for all layers at once by giving a single floating point value or a value for each hidden layer of the network separated by ’+’ signs. Probabilities should be given by a value in the interval [0, 1].”
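If I read that correctly, the two forms would look something like this for a network like mine (just my own illustration of the syntax, not taken from the manual):

# A single value applies the same setting to every layer:
drop_all_layers = "DropConfig=0.3"
# One value per hidden layer, joined with '+' (here: no dropout on the first layer):
drop_per_layer = "DropConfig=" + "+".join(["0.0", "0.3", "0.3", "0.3"])
print(drop_per_layer)  # -> DropConfig=0.0+0.3+0.3+0.3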
I use the code

factory.BookMethod(
    dataloader, ROOT.TMVA.Types.kDL, "DNN1",
    "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=N:WeightInitialization=XAVIER:"
    "Layout=DENSE|256|TANH,DENSE|256|TANH,DENSE|256|TANH,DENSE|256|TANH,LINEAR:"
    "TrainingStrategy=LearningRate=1e-3,ConvergenceSteps=30,BatchSize=2048,"
    "TestRepetitions=1,MaxEpochs=50,Optimizer=ADAM,DropConfig=0.0+0.3+0.3+0.3")

to book the neural network method. When the network starts training, part of the output I get in the terminal is:

DEEP NEURAL NETWORK:   Depth = 5  Input = ( 1, 1, 20 )  Batch size = 2048  Loss function = C
	Layer 0	 DENSE Layer: 	 ( Input =    20 , Width =   256 ) 	Output = (  1 ,  2048 ,   256 ) 	 Activation Function = Tanh
	Layer 1	 DENSE Layer: 	 ( Input =   256 , Width =   256 ) 	Output = (  1 ,  2048 ,   256 ) 	 Activation Function = Tanh	 Dropout prob. = 0.7
	Layer 2	 DENSE Layer: 	 ( Input =   256 , Width =   256 ) 	Output = (  1 ,  2048 ,   256 ) 	 Activation Function = Tanh	 Dropout prob. = 0.7
	Layer 3	 DENSE Layer: 	 ( Input =   256 , Width =   256 ) 	Output = (  1 ,  2048 ,   256 ) 	 Activation Function = Tanh	 Dropout prob. = 0.7
	Layer 4	 DENSE Layer: 	 ( Input =   256 , Width =     1 ) 	Output = (  1 ,  2048 ,     1 ) 	 Activation Function = Identity

Here it states that the dropout probability is 0.7, although in the network config I put 0.3 for all layers except the first one. So the question is: how should I interpret this? Is the probability that a random neuron will be deactivated during training 0.3 or 0.7? I’m a bit confused about the output.


Hi @oleg_dobrovsky,

Thank you for your question. Maybe @moneta can help here?

Cheers,
Marta

Dear @moneta, maybe you can take a look at this question?

Hi,

Sorry for the late reply. The probability that a neuron will be deactivated is actually 0.3, not 0.7. See the code around line 40 of tmva/tmva/src/DNN/Architectures/Cpu/Dropout.hxx,

and note that in the DenseLayer documentation the dropout probability is defined as the probability that a neuron stays active, which is why the training log reports 0.7 (i.e. 1 - 0.3) for the layers where you set 0.3.
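
To make the relation between the two numbers explicit, here is a minimal NumPy sketch of that bookkeeping (my own illustration, not the actual TMVA code): the value you pass in DropConfig is the probability of dropping a neuron, while the number printed in the training log is the probability that it stays active.

import numpy as np

rng = np.random.default_rng(0)

drop_config_value = 0.3               # the value passed in DropConfig
keep_prob = 1.0 - drop_config_value   # 0.7, the "Dropout prob." shown in the log

def apply_dropout(activations, keep_prob):
    # Keep each neuron with probability keep_prob and zero it otherwise,
    # so on average 30% of the activations are set to zero here.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask

x = rng.standard_normal(1000)
zeroed = np.mean(apply_dropout(x, keep_prob) == 0.0)
print(f"fraction of zeroed activations ~ {zeroed:.2f}")  # roughly 0.3

Real implementations typically also rescale the surviving activations by 1 / keep_prob so the expected output is unchanged during training; I left that out to keep the illustration short.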

If you have any further issues, please let me know.

Lorenzo