Mean Square Error

The mean squared error (MSE) loss function is commonly used to fit linear regression models.

Derivation

For a linear model with prediction $a x_i + b$, calculate the squared difference between the prediction and the target $y_i$:

$$\left( y_i - (a x_i + b) \right)^2$$

We then repeat this operation for every sample i and average those errors together:

$$L(a, b, x, y) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)^2$$

We thus solve the optimisation problem:

$$(a, b) = \operatorname*{arg\,min}_{a, b} \left[ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)^2 \right]$$
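As a numerical sanity check (illustrative only; `np.polyfit` and the small dataset below are our assumptions, not part of the derivation), NumPy's degree-1 polynomial fit solves this same least-squares problem in closed form, so at its solution the gradient of the loss vanishes:

```python
import numpy as np

# Small made-up dataset for this sketch.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# np.polyfit with degree 1 returns the least-squares minimiser (a, b).
a, b = np.polyfit(x, y, 1)

# At the minimiser, the residuals are orthogonal to x and sum to zero,
# i.e. both partial derivatives of L are (numerically) zero.
residuals = y - (a * x + b)
```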

The derivatives of $L$ with respect to $a$ and $b$ are:

$$D_a = \frac{\partial L}{\partial a} = -\frac{2}{N} \sum_{i=1}^{N} x_i \left( y_i - (a x_i + b) \right)$$

$$D_b = \frac{\partial L}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)$$
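These derivatives translate directly into NumPy; a minimal sketch (the function name `mse_gradients` is ours, introduced for illustration):

```python
import numpy as np

def mse_gradients(a, b, x, y):
    """Partial derivatives D_a and D_b of the MSE loss."""
    N = x.shape[0]
    r = y - (a * x + b)              # residuals y_i - (a*x_i + b)
    da = -2.0 / N * np.sum(x * r)    # D_a
    db = -2.0 / N * np.sum(r)        # D_b
    return da, db
```

On data that a line fits exactly, both gradients evaluate to zero at the true parameters.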

These are the update rules:

$$a \leftarrow a - \alpha D_a \qquad b \leftarrow b - \alpha D_b$$

With α as the learning rate for gradient descent, substituting the derivatives gives:

$$a \leftarrow a + \frac{2\alpha}{N} \sum_{i=1}^{N} x_i \left( y_i - (a x_i + b) \right)$$

$$b \leftarrow b + \frac{2\alpha}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)$$
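Iterating these update rules gives plain batch gradient descent; a sketch, where the function name `fit_line`, the learning rate, and the step count are our arbitrary choices:

```python
import numpy as np

def fit_line(x, y, alpha=0.05, steps=5000):
    """Fit y ~ a*x + b by gradient descent on the MSE loss."""
    a, b = 0.0, 0.0
    N = x.shape[0]
    for _ in range(steps):
        r = y - (a * x + b)                 # residuals
        a += 2 * alpha / N * np.sum(x * r)  # update rule for a
        b += 2 * alpha / N * np.sum(r)      # update rule for b
    return a, b
```

On noiseless data generated from a known line, the loop recovers the true slope and intercept.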
import numpy as np

def loss_mse(a, b, x, y):
	# Mean of the squared residuals: (1/N) * sum((y_i - (a*x_i + b))**2)
	val = np.sum((y - (a*x + b))**2) / x.shape[0]
	return val
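A quick usage check (the function is repeated here so the snippet runs on its own, with a made-up dataset): on points that lie exactly on the line y = 2x + 1, the loss at (a, b) = (2, 1) is zero.

```python
import numpy as np

def loss_mse(a, b, x, y):
    return np.sum((y - (a * x + b)) ** 2) / x.shape[0]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                  # y = [1, 3, 5, 7]

print(loss_mse(2, 1, x, y))    # 0.0  (perfect fit)
print(loss_mse(0, 0, x, y))    # 21.0 (mean of y**2)
```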