Gradient-based learning rate control

This refers to adaptive methods that adjust the learning rate dynamically based on the gradients observed during training. By tracking past gradients, they produce adaptive step sizes rather than a fixed schedule.

Gradients are used because they measure the rate of change of the objective at each iteration. As such, their magnitude indicates how much learning is still taking place.

For instance, this is a gradient-based decay on learning rate:

α = 1 / (1 + c · |∇f(x)|)

where c is a constant controlling how strongly the gradient magnitude shrinks the step.
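A minimal sketch of this decay rule, assuming a base learning rate, a hypothetical decay constant `c`, and gradient descent on the toy objective f(x) = x²:

```python
import numpy as np

def decayed_lr(base_lr, grad, c=0.1):
    # alpha = base_lr / (1 + c * ||grad||): the larger the gradient
    # magnitude, the smaller the step. `c` is an assumed constant.
    return base_lr / (1.0 + c * np.linalg.norm(grad))

# Gradient descent on f(x) = x^2, whose gradient is 2x.
x = 5.0
for _ in range(100):
    g = 2.0 * x
    x -= decayed_lr(0.5, np.array([g])) * g
```

Early on, the large gradient keeps the step conservative; as the iterate approaches the minimum, the denominator shrinks toward 1 and the step size recovers to the base rate.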

However, there is also the inverse approach: making α inversely proportional to a running mean of the gradient (or of its squared value), as adaptive methods such as RMSProp do.
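The mean-squared-gradient variant can be sketched as follows, in the style of RMSProp; the decay rate and base learning rate below are assumed illustrative values, not values from the source:

```python
import numpy as np

def rmsprop_step(x, grad, cache, base_lr=0.1, decay=0.9, eps=1e-8):
    # Keep an exponentially decaying mean of squared gradients, then
    # divide by its root: the effective step is inversely proportional
    # to the recent RMS gradient. `eps` avoids division by zero.
    cache = decay * cache + (1.0 - decay) * grad**2
    x = x - base_lr * grad / (np.sqrt(cache) + eps)
    return x, cache

# Same toy objective f(x) = x^2, gradient 2x.
x, cache = 5.0, 0.0
for _ in range(200):
    g = 2.0 * x
    x, cache = rmsprop_step(x, g, cache)
```

Dividing by the RMS of past gradients normalizes the step across coordinates with very different gradient scales, which is the main appeal of this family of methods.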