Exploding gradients

Typically observed as NaN values appearing during training iterations. The counterpart of Vanishing gradients.

This occurs when the updates produced by the gradient descent rule are far larger than the current weight values: the weights and gradients grow exponentially, overflow to infinity, and eventually become NaN.
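
A minimal sketch of the mechanism, assuming a toy setup where a gradient is repeatedly multiplied by a weight matrix `W` whose entries are large (as happens when backpropagating through many layers): the gradient norm grows exponentially until it overflows to infinity and then NaN. The matrix size, scale, and clipping threshold below are illustrative choices, not values from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=2.0, size=(64, 64))   # deliberately large weights
grad = rng.normal(size=64)

# Backpropagating through many identical layers multiplies the gradient
# by W.T each step, so its magnitude blows up exponentially.
for step in range(200):
    grad = W.T @ grad
    norm = np.linalg.norm(grad)
    if not np.isfinite(norm):              # overflow -> inf, then NaN
        print(f"gradient exploded at step {step}: norm = {norm}")
        break

# One common mitigation (gradient clipping, shown here as an assumption,
# not necessarily what the note goes on to recommend): rescale the gradient
# whenever its norm exceeds a fixed threshold so each update stays bounded.
max_norm = 5.0
grad = rng.normal(size=64)
for step in range(200):
    grad = W.T @ grad
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad *= max_norm / norm            # keeps the norm at most max_norm
```
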

This occurs during: