Learning rate
Controls the step size the algorithm takes on each update when minimising the loss function
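A minimal sketch of this, assuming plain gradient descent on a toy quadratic loss f(w) = w²; the names w, lr, and grad are illustrative, not from any library:

```python
def grad(w):
    # Gradient of the toy loss f(w) = w**2
    return 2 * w

w = 5.0   # initial parameter value (illustrative)
lr = 0.1  # learning rate: scales the size of each step
for _ in range(50):
    w -= lr * grad(w)  # move against the gradient by lr * gradient

print(w)  # close to the minimum at w = 0
```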
Learning rate tradeoff
If the learning rate is too small, the algorithm takes longer to converge and may get stuck in a suboptimal local minimum
If the learning rate is too high, the algorithm may converge faster, but it risks overshooting the minimum, oscillating, or diverging (exploding gradients)
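A hypothetical illustration of both failure modes on the same toy loss f(w) = w², where each update simplifies to w ← (1 − 2·lr)·w and diverges once lr exceeds 1; the step counts and rates are arbitrary choices:

```python
def run(lr, steps=20, w=5.0):
    # Gradient descent on f(w) = w**2: each step does w <- (1 - 2*lr) * w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(run(0.01))  # too small: w is still ~3.3 after 20 steps (slow convergence)
print(run(0.5))   # well chosen: hits the minimum w = 0 in a single step here
print(run(1.1))   # too high: |w| grows every step, i.e. the updates explode
```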
Adjusting the learning rate (see the sketch after this list)
- Downscaling initial parameter values
- Setting a max value (ceiling) for change on a single iteration
- Making the learning rate decay over time: large at the beginning, progressively smaller
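A sketch combining the three adjustments above, assuming a toy quadratic loss and NumPy; the scale factor, decay constant, and clipping bound are illustrative values, not canonical ones:

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(10)  # downscaled initial parameter values

lr0 = 0.2       # initial learning rate (illustrative)
decay = 0.01    # decay constant (illustrative)
max_step = 1.0  # ceiling on the change to any parameter per iteration

def grad(w):
    # Placeholder gradient for a toy quadratic loss f(w) = sum(w**2)
    return 2 * w

for t in range(100):
    lr = lr0 / (1 + decay * t)  # decay: large early, progressively smaller
    step = np.clip(lr * grad(w), -max_step, max_step)  # cap per-iteration change
    w -= step

print(w)  # all parameters close to 0, the minimum of the toy loss
```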