Learning rate
Controls the step size the algorithm takes on each update when minimising the loss function
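A minimal sketch of this, assuming plain gradient descent on a toy quadratic loss f(w) = w²; the names w, lr, and grad are illustrative, not from any library:

```python
def grad(w):
    # Gradient of the toy loss f(w) = w**2
    return 2 * w

w = 5.0   # initial parameter value (illustrative)
lr = 0.1  # learning rate: scales the size of each step
for _ in range(50):
    w -= lr * grad(w)  # move against the gradient by lr * gradient

print(w)  # close to the minimum at w = 0
```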
Learning rate tradeoff
If the learning rate is too small, the algorithm takes longer to converge and may get stuck in a suboptimal local minimum
If the learning rate is too high, the algorithm may converge faster, but it risks overshooting the minimum, oscillating, or diverging (exploding gradients)
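A hypothetical illustration of both failure modes on the same toy loss f(w) = w², where each update simplifies to w ← (1 − 2·lr)·w and diverges once lr exceeds 1; the step counts and rates are arbitrary choices:

```python
def run(lr, steps=20, w=5.0):
    # Gradient descent on f(w) = w**2: each step does w <- (1 - 2*lr) * w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(run(0.01))  # too small: w is still ~3.3 after 20 steps (slow convergence)
print(run(0.5))   # well chosen: hits the minimum w = 0 in a single step here
print(run(1.1))   # too high: |w| grows every step, i.e. the updates explode
```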
Adjusting the learning rate (see the sketch after this list)
- Downscaling initial parameter values
- Setting a max value (ceiling) for change on a single iteration
- Making the learning rate decay over time: large at the beginning, progressively smaller
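A sketch combining the three adjustments above, assuming a toy quadratic loss and NumPy; the scale factor, decay constant, and clipping bound are illustrative values, not canonical ones:

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(10)  # downscaled initial parameter values

lr0 = 0.2       # initial learning rate (illustrative)
decay = 0.01    # decay constant (illustrative)
max_step = 1.0  # ceiling on the change to any parameter per iteration

def grad(w):
    # Placeholder gradient for a toy quadratic loss f(w) = sum(w**2)
    return 2 * w

for t in range(100):
    lr = lr0 / (1 + decay * t)  # decay: large early, progressively smaller
    step = np.clip(lr * grad(w), -max_step, max_step)  # cap per-iteration change
    w -= step

print(w)  # all parameters close to 0, the minimum of the toy loss
```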