Vanishing gradients
The counterpart of exploding gradients
This occurs when the gradient descent updates are far smaller in magnitude than the current parameter values, so the parameters barely change between iterations
This can occur when:
- The initial parameters are very small values (or zero)
- The learning rate is too small

As a result, most parameters remain at or near zero, since the updates are close to or equal to zero
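The effect above can be sketched numerically. The example below (a hypothetical setup, not from the notes) backpropagates through a deep chain of sigmoid layers with a small weight `w`: each layer contributes a factor of `w * sigmoid'(z)` via the chain rule, and since `sigmoid'(z) <= 0.25`, the product shrinks toward zero as depth grows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_magnitude(depth, w=0.1, x=1.0):
    """Magnitude of d(output)/d(input) through `depth` sigmoid layers,
    each computing a = sigmoid(w * a_prev)."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(w * a)
        # chain rule: multiply by this layer's local derivative w * sigmoid'(z)
        grad *= w * a * (1.0 - a)
    return abs(grad)

for depth in (1, 5, 10, 20):
    print(depth, gradient_magnitude(depth))
```

With `w = 0.1`, each layer multiplies the gradient by roughly 0.025, so by 20 layers it is vanishingly small; the early layers effectively stop learning.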