Backpropagation
An algorithm used to train Neural Networks
It is applied together with Gradient Descent: backpropagation computes the gradient of the loss with respect to every weight, and gradient descent uses those gradients to adjust the weights of the connections so the network can learn.
It tells us how to adjust the weight and bias matrices.
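As a concrete setting, one standard choice of notation (assumed here, not fixed by the notes above) is a feedforward layer that computes a pre-activation and an activation, with training posed as minimising a loss over all weights and biases:

```latex
% Assumed standard feedforward notation (illustrative, not recovered from the notes):
\[
  z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad
  a^{(l)} = \sigma\bigl(z^{(l)}\bigr), \qquad a^{(0)} = x ,
\]
\[
  \text{training:} \quad \min_{\{W^{(l)},\, b^{(l)}\}} \; L\bigl(a^{(L)},\, y\bigr).
\]
```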
Solving this optimisation problem (minimising the loss over the weights and biases) with GD:
Denoting the error term of layer $l$ by $\delta^{(l)} = \frac{\partial L}{\partial z^{(l)}}$, the derivative of the loss with respect to the pre-activation $z^{(l)}$.
At the same time, we can compute the derivative of the loss function with respect to the weights by the chain rule: $\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{\top}$ and $\frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}$.
Hence, the gradient descent update rule for the weights of layer $l$ is
$$W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{\partial L}{\partial W^{(l)}} = W^{(l)} - \eta \, \delta^{(l)} \left(a^{(l-1)}\right)^{\top},$$
where $\eta$ is the learning rate; the biases $b^{(l)}$ are updated analogously.
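To make these steps concrete, here is a minimal NumPy sketch for a two-layer network; the sigmoid activation, squared-error loss, layer sizes, and variable names are illustrative assumptions rather than anything fixed by the notes above.

```python
# Minimal sketch (assumed example): forward pass, error terms delta = dL/dz,
# chain-rule gradients dL/dW = delta @ a_prev.T, and the GD update.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 examples with 3 input features and 2 targets (one column each).
x = rng.normal(size=(3, 4))          # a^(0)
y = rng.normal(size=(2, 4))

# Parameters for layers 1 and 2.
W1, b1 = rng.normal(size=(5, 3)), np.zeros((5, 1))
W2, b2 = rng.normal(size=(2, 5)), np.zeros((2, 1))

eta = 0.1                            # learning rate

for step in range(100):
    # Forward pass: z = W a_prev + b, a = sigma(z).
    z1 = W1 @ x + b1;  a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
    loss = 0.5 * np.mean(np.sum((a2 - y) ** 2, axis=0))

    # Backward pass: error terms delta^(l) = dL/dz^(l).
    n = x.shape[1]
    delta2 = (a2 - y) / n * a2 * (1 - a2)        # output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # propagated back through W2

    # Chain rule: dL/dW^(l) = delta^(l) (a^(l-1))^T, dL/db^(l) = sum of delta^(l).
    dW2, db2 = delta2 @ a1.T, delta2.sum(axis=1, keepdims=True)
    dW1, db1 = delta1 @ x.T,  delta1.sum(axis=1, keepdims=True)

    # Gradient descent update.
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

print("final loss:", loss)
```

Each iteration performs one forward pass, computes the error terms $\delta^{(l)}$, forms the chain-rule gradients $\delta^{(l)} \left(a^{(l-1)}\right)^{\top}$, and takes one gradient descent step.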
RECALL:
- The gradient update rule is $\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)$, where $\theta$ stands for any parameter (a weight or a bias) and $\eta$ is the learning rate.
- As such, each update moves the parameters only a fraction $\eta$ of a full gradient step in the direction of the negative gradient.
- Without the learning rate, we would take a full step in the direction of the negative gradient; but the gradient only describes the slope of the loss function at the current point, so a full step can easily overshoot the minimum (see the toy example below).
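As a toy illustration of the last point (an assumed example, not from the notes), minimising $L(w) = w^2$ shows how a full gradient step overshoots while a scaled step converges:

```python
# Toy example (assumed): effect of the learning rate when minimising L(w) = w**2.
# The gradient is dL/dw = 2*w.  A "full step" (eta = 1.0) jumps from w to
# w - 2*w = -w and oscillates forever; a smaller eta shrinks w towards 0.
def descend(eta, w=5.0, steps=10):
    history = [w]
    for _ in range(steps):
        grad = 2 * w          # dL/dw at the current point
        w = w - eta * grad    # gradient descent update
        history.append(round(w, 4))
    return history

print("eta=1.0 :", descend(1.0))   # 5, -5, 5, -5, ... never converges
print("eta=0.1 :", descend(0.1))   # geometric decay towards the minimum at w = 0
```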