Exploding gradients
Typically seen as NaN values appearing during training iterations; the counterpart of vanishing gradients.
This occurs when the gradient descent update produces changes larger than the current weight matrix values themselves, so the weights grow without bound, overflow to infinity, and eventually become NaN (see the sketch after the list below).
Typical causes:
- Unlucky initialisation
- A learning rate that is too large
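
To make the mechanism concrete, here is a minimal sketch of how a too-large learning rate drives plain gradient descent to infinity and then NaN. The quadratic loss, the `train` helper, and the learning rates are assumptions chosen for illustration, not taken from any particular framework:

```python
import numpy as np

# Illustrative setup: gradient descent on the 1-D quadratic loss
# L(w) = w**2, whose gradient is 2*w. Each update gives
# w <- w * (1 - 2*lr), so when lr is too large the factor
# |1 - 2*lr| exceeds 1: |w| grows every step, overflows float32
# to inf, and the next update (inf - inf) produces NaN.

np.seterr(over="ignore", invalid="ignore")  # silence overflow warnings

def train(w, lr, steps=200):
    for step in range(steps):
        grad = 2.0 * w       # gradient of L(w) = w**2
        w = w - lr * grad    # gradient descent update
        if np.isnan(w):
            print(f"lr={lr}: NaN at step {step}")
            return w
    print(f"lr={lr}: final w = {w}")
    return w

train(np.float32(1.0), lr=0.1)   # stable: w decays toward 0
train(np.float32(1.0), lr=10.0)  # exploding: |w| multiplies by 19 each step
```

In this sketch, `lr=0.1` gives an update factor of 0.8 and the weight shrinks, while `lr=10.0` gives a factor of -19, so the magnitude grows geometrically, overflows to inf after roughly 30 steps, and the following update yields NaN.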