Linear regression

The model assumes a linear relationship between inputs and outputs and therefore has two trainable parameters: the slope $a$ and the intercept $b$.

$\hat{y}_i = a x_i + b$

This is the equation of a line: a one-dimensional affine subspace, and therefore a hyperplane of the two-dimensional ambient space.
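As a minimal sketch of the model (the parameter values and data here are made up for illustration), predictions are just an elementwise affine map:

```python
import numpy as np

# Hypothetical parameters: slope a and intercept b
a, b = 2.0, 1.0

x = np.array([0.0, 1.0, 2.0])
y_pred = a * x + b  # the line evaluated at each input
```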

We fit the parameters by minimising the mean squared error:

$(a^*, b^*) = \underset{a,b}{\operatorname{argmin}} \left[ \frac{1}{N} \sum_{i=1}^{N} \left( a x_i + b - y_i \right)^2 \right]$
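The objective can be sketched directly as a function of the two parameters (the sample data below is invented, chosen to lie exactly on $y = 2x + 1$ so a perfect fit gives zero loss):

```python
import numpy as np

def mse(a, b, x, y):
    """Mean squared error of the line a*x + b on the data (x, y)."""
    residuals = a * x + b - y
    return np.mean(residuals ** 2)

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])  # exactly y = 2x + 1
```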

Normal equation

$W = (X^\top X)^{-1} X^\top Y, \qquad X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$
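A sketch of the normal equation in NumPy (data invented; note that in practice one solves the linear system rather than forming the inverse explicitly, which is cheaper and more numerically stable):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # noiseless data on the line y = 2x + 1

# Design matrix: a column of ones (for the intercept) next to the inputs
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) W = X^T y instead of inverting X^T X explicitly
W = np.linalg.solve(X.T @ X, X.T @ y)
b, a = W  # intercept first, matching the column order of X
```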

This becomes computationally expensive when the number of features in $X$ is large, and impossible when the matrix $X^\top X$ is not invertible (for example, when features are linearly dependent).

Both problems are avoided by using gradient descent instead.