Linear regression

The model assumes a linear relationship between inputs and outputs and therefore has two trainable parameters: the slope $a$ and the intercept $b$.

$\hat{y}_i = a x_i + b$

This is the equation of a line: a one-dimensional affine subspace, and therefore a hyperplane of the two-dimensional ambient space.
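As a minimal sketch of the model (the parameter values and data here are made up for illustration), predictions are just an elementwise affine map:

```python
import numpy as np

# Hypothetical parameters: slope a and intercept b
a, b = 2.0, 1.0

x = np.array([0.0, 1.0, 2.0])
y_pred = a * x + b  # the line evaluated at each input
```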

We fit the parameters by minimising the mean squared error:

$(a^*, b^*) = \underset{a,b}{\operatorname{argmin}} \left[ \frac{1}{N} \sum_{i=1}^{N} \left( a x_i + b - y_i \right)^2 \right]$
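The objective can be sketched directly as a function of the two parameters (the sample data below is invented, chosen to lie exactly on $y = 2x + 1$ so a perfect fit gives zero loss):

```python
import numpy as np

def mse(a, b, x, y):
    """Mean squared error of the line a*x + b on the data (x, y)."""
    residuals = a * x + b - y
    return np.mean(residuals ** 2)

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])  # exactly y = 2x + 1
```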

Normal equation

$W = (X^\top X)^{-1} X^\top Y, \qquad X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$
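A sketch of the normal equation in NumPy (data invented; note that in practice one solves the linear system rather than forming the inverse explicitly, which is cheaper and more numerically stable):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # noiseless data on the line y = 2x + 1

# Design matrix: a column of ones (for the intercept) next to the inputs
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) W = X^T y instead of inverting X^T X explicitly
W = np.linalg.solve(X.T @ X, X.T @ y)
b, a = W  # intercept first, matching the column order of X
```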

This becomes computationally expensive when the number of features in $X$ is large, and impossible when the matrix $X^\top X$ is not invertible (for example, when features are linearly dependent).

Both problems are avoided by using gradient descent instead.