Backpropagation

An algorithm used to train Neural Networks

It works hand in hand with Gradient Descent: backpropagation uses the chain rule to compute the gradients of the loss, and the network learns by adjusting the weights of its connections along those gradients.

It tells us how to adjust the weight and bias matrices.

Solving this optimisation problem with GD:

$$
W_1, b_1, W_2, b_2 = \underset{W_1, b_1, W_2, b_2}{\arg\min}\left[\frac{1}{M}\left(Y_{pred}(X, W_1, b_1, W_2, b_2) - Y\right)^2\right] = \underset{W_1, b_1, W_2, b_2}{\arg\min}\left[\frac{1}{M}\left(W_2(W_1 X + b_1) + b_2 - Y\right)^2\right]
$$
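As a concrete illustration, here is a minimal NumPy sketch of this forward pass and loss, assuming scalar parameters W1, b1, W2, b2 and a toy batch of M samples (all names and values below are hypothetical, not part of the notes):

```python
import numpy as np

# Toy data: M samples of a simple regression problem (assumed for illustration)
M = 100
X = np.random.randn(M)           # inputs
Y = 3.0 * X + 1.0                # targets

# Scalar parameters, matching the notation W2(W1 X + b1) + b2
W1, b1, W2, b2 = 0.5, 0.0, 0.5, 0.0

# Forward pass: Y_pred = W2 (W1 X + b1) + b2
hidden = W1 * X + b1
Y_pred = W2 * hidden + b2

# Mean-squared-error loss: (1/M) * sum((Y_pred - Y)^2)
loss = np.mean((Y_pred - Y) ** 2)
print(loss)
```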

Denoting the error term:

$$
\epsilon = Y_{pred}(X, W_1, b_1, W_2, b_2) - Y = W_2(W_1 X + b_1) + b_2 - Y
$$

Using the chain rule, we can compute the derivative of the loss function with respect to the weights:

$$
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial Y_{pred}}\frac{\partial Y_{pred}}{\partial W_2}
$$

$$
\frac{\partial L}{\partial Y_{pred}} = \frac{2}{M}\left(Y_{pred}(X, W_1, b_1, W_2, b_2) - Y\right) = \frac{2\epsilon}{M}
$$

$$
\frac{\partial Y_{pred}}{\partial W_2} = W_1 X + b_1
$$
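Continuing the sketch above, these factors could be computed numerically as follows (still assuming scalar parameters, with the per-sample error summed over the batch of M samples):

```python
# Error term eps = Y_pred - Y, one entry per sample
eps = Y_pred - Y

# dL/dY_pred = (2/M) * eps  and  dY_pred/dW2 = W1*X + b1 = hidden,
# so the chain rule gives dL/dW2 as the sum of per-sample products.
dL_dYpred = (2.0 / M) * eps
dL_dW2 = np.sum(dL_dYpred * hidden)
dL_db2 = np.sum(dL_dYpred)       # dY_pred/db2 = 1
```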

Hence, the gradient descent update rules for W2 and b2 are:

$$
W_2 \leftarrow W_2 - \frac{2\epsilon\alpha}{M}\left(W_1 X + b_1\right)
$$

$$
b_2 \leftarrow b_2 - \frac{2\epsilon\alpha}{M}
$$

where α is the learning rate
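In code, the corresponding update step might look like this (continuing the earlier sketch; the value of alpha is arbitrary and only for illustration):

```python
alpha = 0.1   # learning rate (arbitrary choice for this sketch)

# Gradient descent step for the output-layer parameters
W2 = W2 - alpha * dL_dW2    # W2 <- W2 - (2*alpha/M) * sum(eps * (W1*X + b1))
b2 = b2 - alpha * dL_db2    # b2 <- b2 - (2*alpha/M) * sum(eps)
```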

Similarly, applying the chain rule for W1 and b1:

$$
\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial Y_{pred}}\frac{\partial Y_{pred}}{\partial W_1} = \left(\frac{2\epsilon}{M}\right)\left(W_2 X\right)
$$

$$
\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial Y_{pred}}\frac{\partial Y_{pred}}{\partial b_1} = \left(\frac{2\epsilon}{M}\right) W_2
$$

$$
W_1 \leftarrow W_1 - \frac{2\epsilon\alpha}{M} W_2 X
$$

$$
b_1 \leftarrow b_1 - \frac{2\epsilon\alpha}{M} W_2
$$
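Putting the forward pass, the backpropagated gradients, and the updates together, a self-contained training loop under the same scalar-parameter assumptions might look like the sketch below (toy data, step count, and learning rate are all assumed, not taken from the notes):

```python
import numpy as np

# Toy regression data: M samples (assumed for illustration)
M, alpha, steps = 100, 0.05, 500
X = np.random.randn(M)
Y = 3.0 * X + 1.0

W1, b1, W2, b2 = 0.5, 0.0, 0.5, 0.0

for _ in range(steps):
    # Forward pass: Y_pred = W2 (W1 X + b1) + b2
    hidden = W1 * X + b1
    Y_pred = W2 * hidden + b2
    eps = Y_pred - Y                         # error term

    # Backpropagated gradients (chain rule, as derived above)
    dL_dW2 = (2.0 / M) * np.sum(eps * hidden)
    dL_db2 = (2.0 / M) * np.sum(eps)
    dL_dW1 = (2.0 / M) * np.sum(eps * W2 * X)
    dL_db1 = (2.0 / M) * np.sum(eps * W2)

    # Gradient descent updates
    W1 -= alpha * dL_dW1
    b1 -= alpha * dL_db1
    W2 -= alpha * dL_dW2
    b2 -= alpha * dL_db2

# Final loss after training
print(np.mean((W2 * (W1 * X + b1) + b2 - Y) ** 2))
```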