Gradient-based learning rate control
This refers to adaptive methods that adjust the learning rate dynamically based on the gradients observed during training. By modifying the learning rate according to past gradients, they allow for adaptive step sizes.
Gradients are used because they measure the rate of change of the loss at each iteration; their magnitude therefore indicates how much learning is still happening.
For instance, a gradient-based decay of the learning rate works as follows (see the sketch after this list):
- With large gradients, we are still making significant progress, so the learning rate should stay high to let the algorithm keep adjusting quickly.
- With small gradients, we are close to convergence, so only small updates should be made to the parameters.
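Below is a minimal sketch of this idea, assuming we scale a base learning rate by the ratio of the current gradient norm to a reference norm and clip the result; the function name and the `ref_norm`, `min_scale`, and `max_scale` parameters are illustrative choices, not a standard API.

```python
import numpy as np

def gradient_scaled_lr(base_lr, grad, ref_norm=1.0, min_scale=0.1, max_scale=1.0):
    """Scale the learning rate with the gradient norm: large gradients keep
    the step near base_lr, small gradients shrink it toward base_lr * min_scale."""
    scale = np.linalg.norm(grad) / ref_norm
    return base_lr * float(np.clip(scale, min_scale, max_scale))

# Toy quadratic objective f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([5.0, -3.0])
base_lr = 0.5
for step in range(20):
    grad = w                                # gradient of the toy objective
    lr = gradient_scaled_lr(base_lr, grad)  # large gradient -> lr stays high
    w -= lr * grad                          # small gradient -> small updates
print(w)  # ends up close to the optimum at the origin
```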
However, there is also the inverse approach (sketched below):
- With large gradients, the learning rate is kept low to prevent exploding gradients.
- With small gradients, the learning rate is kept high to prevent vanishing gradients.
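A minimal sketch of the inverse scheme, assuming the base learning rate is divided by the gradient norm; the function name and the `eps` parameter are illustrative. This inverse relationship is the same intuition behind per-parameter adaptive optimizers such as AdaGrad and RMSProp, which divide by an accumulated gradient statistic rather than the raw norm.

```python
import numpy as np

def inverse_scaled_lr(base_lr, grad, eps=1e-8):
    """Inverse scaling: divide the base learning rate by the gradient norm,
    so large gradients take small steps and small gradients take larger ones.
    eps guards against division by zero when the gradient is near zero."""
    return base_lr / (np.linalg.norm(grad) + eps)

base_lr = 0.1
print(inverse_scaled_lr(base_lr, np.array([50.0, -30.0])))  # large gradient -> small step
print(inverse_scaled_lr(base_lr, np.array([0.01, 0.02])))   # small gradient -> large step
```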