Vanishing gradients
The counterpart of exploding gradients
This occurs when the gradient descent updates are far smaller in magnitude than the current parameter values, so the parameters barely change between iterations
This can occur when:
- The initial parameters are very small values (or zero)
- The learning rate is too small

As a result, most parameters remain at or near zero, since the updates are close to or equal to zero
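The effect above can be sketched numerically. The example below (a hypothetical setup, not from the notes) backpropagates through a deep chain of sigmoid layers with a small weight `w`: each layer contributes a factor of `w * sigmoid'(z)` via the chain rule, and since `sigmoid'(z) <= 0.25`, the product shrinks toward zero as depth grows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_magnitude(depth, w=0.1, x=1.0):
    """Magnitude of d(output)/d(input) through `depth` sigmoid layers,
    each computing a = sigmoid(w * a_prev)."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(w * a)
        # chain rule: multiply by this layer's local derivative w * sigmoid'(z)
        grad *= w * a * (1.0 - a)
    return abs(grad)

for depth in (1, 5, 10, 20):
    print(depth, gradient_magnitude(depth))
```

With `w = 0.1`, each layer multiplies the gradient by roughly 0.025, so by 20 layers it is vanishingly small; the early layers effectively stop learning.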