Mean Square Error

The mean squared error (MSE) loss function is commonly used to fit linear regression models.

Derivation

For a linear model with prediction $a x_i + b$, calculate the squared difference between the prediction and the target $y_i$:

$$\left( y_i - (a x_i + b) \right)^2$$

We then repeat this operation for every sample i and average those errors together:

$$L(a, b, x, y) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)^2$$

We thus solve the optimisation problem:

$$(a, b) = \operatorname*{arg\,min}_{a, b} \left[ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)^2 \right]$$
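As a numerical sanity check (illustrative only; `np.polyfit` and the small dataset below are our assumptions, not part of the derivation), NumPy's degree-1 polynomial fit solves this same least-squares problem in closed form, so at its solution the gradient of the loss vanishes:

```python
import numpy as np

# Small made-up dataset for this sketch.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# np.polyfit with degree 1 returns the least-squares minimiser (a, b).
a, b = np.polyfit(x, y, 1)

# At the minimiser, the residuals are orthogonal to x and sum to zero,
# i.e. both partial derivatives of L are (numerically) zero.
residuals = y - (a * x + b)
```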

The derivatives of $L$ with respect to $a$ and $b$ are:

$$D_a = \frac{\partial L}{\partial a} = -\frac{2}{N} \sum_{i=1}^{N} x_i \left( y_i - (a x_i + b) \right)$$

$$D_b = \frac{\partial L}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)$$
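These derivatives translate directly into NumPy; a minimal sketch (the function name `mse_gradients` is ours, introduced for illustration):

```python
import numpy as np

def mse_gradients(a, b, x, y):
    """Partial derivatives D_a and D_b of the MSE loss."""
    N = x.shape[0]
    r = y - (a * x + b)              # residuals y_i - (a*x_i + b)
    da = -2.0 / N * np.sum(x * r)    # D_a
    db = -2.0 / N * np.sum(r)        # D_b
    return da, db
```

On data that a line fits exactly, both gradients evaluate to zero at the true parameters.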

These are the update rules:

$$a \leftarrow a - \alpha D_a \qquad b \leftarrow b - \alpha D_b$$

With α as the learning rate for gradient descent, substituting the derivatives gives:

$$a \leftarrow a + \frac{2\alpha}{N} \sum_{i=1}^{N} x_i \left( y_i - (a x_i + b) \right)$$

$$b \leftarrow b + \frac{2\alpha}{N} \sum_{i=1}^{N} \left( y_i - (a x_i + b) \right)$$
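Iterating these update rules gives plain batch gradient descent; a sketch, where the function name `fit_line`, the learning rate, and the step count are our arbitrary choices:

```python
import numpy as np

def fit_line(x, y, alpha=0.05, steps=5000):
    """Fit y ~ a*x + b by gradient descent on the MSE loss."""
    a, b = 0.0, 0.0
    N = x.shape[0]
    for _ in range(steps):
        r = y - (a * x + b)                 # residuals
        a += 2 * alpha / N * np.sum(x * r)  # update rule for a
        b += 2 * alpha / N * np.sum(r)      # update rule for b
    return a, b
```

On noiseless data generated from a known line, the loop recovers the true slope and intercept.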
import numpy as np

def loss_mse(a, b, x, y):
	# Mean of the squared residuals: (1/N) * sum((y_i - (a*x_i + b))**2)
	val = np.sum((y - (a*x + b))**2) / x.shape[0]
	return val
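A quick usage check (the function is repeated here so the snippet runs on its own, with a made-up dataset): on points that lie exactly on the line y = 2x + 1, the loss at (a, b) = (2, 1) is zero.

```python
import numpy as np

def loss_mse(a, b, x, y):
    return np.sum((y - (a * x + b)) ** 2) / x.shape[0]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                  # y = [1, 3, 5, 7]

print(loss_mse(2, 1, x, y))    # 0.0  (perfect fit)
print(loss_mse(0, 0, x, y))    # 21.0 (mean of y**2)
```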