Dropout layer
Regularisation technique that prevents overfitting by randomly dropping out (setting to zero) a subset of neurons in the network during each training iteration
It is typically inserted:
- after a linear or convolutional layer
- before or after a non-linear activation function
In PyTorch, `nn.Dropout` takes a single argument `p`, the probability of each element being dropped out (zeroed).
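A minimal sketch of where a dropout layer might sit in a small PyTorch model; the layer sizes, batch size, and p=0.5 below are illustrative choices, not taken from these notes:

```python
import torch
from torch import nn

# Dropout placed after a linear layer and its activation (illustrative sizes).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each activation is zeroed with probability 0.5
    nn.Linear(256, 10),
)

x = torch.randn(32, 784)   # dummy input batch
print(model(x).shape)      # torch.Size([32, 10])
```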
As dropout layers are not trainable, they add no learnable parameters to the model.
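This can be checked directly: a dropout layer exposes an empty parameter list (the p=0.5 here is arbitrary).

```python
from torch import nn

# Dropout holds no learnable weights, so its parameter list is empty.
print(list(nn.Dropout(p=0.5).parameters()))  # []
```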
Since different subsets of neurons are trained at different times, dropout can be thought of as training multiple smaller networks that are later combined.
The network learns to distribute information across multiple neurons rather than relying on a few dominant ones.
Differences in training and evaluation mode
- `train()`: randomly drops out the specified portion of neurons
- `eval()`: does not drop out any neurons; all activations are passed unchanged from the previous layer to the next
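A small sketch of the two modes using a standalone `nn.Dropout` layer on a tensor of ones (the p=0.5 and input size are arbitrary). Note that in training mode PyTorch also rescales the surviving activations by 1/(1 - p), so their expected value matches what the layer produces in evaluation mode.

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()       # training mode: roughly half the elements are zeroed,
print(dropout(x))     # survivors are scaled by 1 / (1 - p) = 2.0

dropout.eval()        # evaluation mode: the input passes through unchanged
print(dropout(x))     # tensor of ones
```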