Learning rate decay
A technique to gradually reduce the learning rate of gradient descent over iterations.
We start with a relatively high learning rate so the model converges faster, then reduce it as training progresses.
The idea is to let the model settle closer to a good solution while reducing the risk of overfitting, since smaller updates stop the parameters from changing too drastically late in training.
One example is to reduce the learning rate by a fixed factor at regular intervals, e.g. halving it every 10 epochs, as in the sketch below.
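A minimal sketch of this fixed-interval (step) decay inside a plain training loop; the starting rate, decay factor, and interval are illustrative assumptions, not values from these notes.

```python
# Illustrative step decay: multiply the learning rate by `decay_factor`
# every `step_size` epochs. All numbers below are assumed for the example.
initial_lr = 0.1
decay_factor = 0.5
step_size = 10

for epoch in range(50):
    # Current learning rate after (epoch // step_size) reductions
    lr = initial_lr * decay_factor ** (epoch // step_size)
    # ... run one epoch of gradient descent using `lr` ...
```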
A high decay rate may help prevent overshooting a minimum, but it can cause premature convergence because the learning rate shrinks before training has found a good solution.
Adaptive optimisers (e.g. Adam, RMSProp) already adjust the learning rate dynamically, so an explicit decay schedule tends to have less effect with them.
Examples (each sketched in the code after this list)
- Step decay
- Exponential decay
- Inverse time decay
- Cosine annealing
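A minimal sketch of each schedule as a plain Python function; the parameter names and default values (step_size, factor, k, lr_min, total_epochs) are assumptions chosen for illustration.

```python
import math

def step_decay(lr0, epoch, step_size=10, factor=0.5):
    """Reduce the rate by a fixed factor every `step_size` epochs."""
    return lr0 * factor ** (epoch // step_size)

def exponential_decay(lr0, epoch, k=0.05):
    """Smooth exponential reduction: lr0 * exp(-k * epoch)."""
    return lr0 * math.exp(-k * epoch)

def inverse_time_decay(lr0, epoch, k=0.1):
    """Rate shrinks proportionally to 1 / (1 + k * epoch)."""
    return lr0 / (1 + k * epoch)

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Follow half a cosine curve from lr0 down to lr_min over training."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))
```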