Hyperparameters

Hyperparameters are settings that are not learned from the data during training; the user has to choose them before training starts. Common examples:

Learning rate $α$
Regularisation term $λ$
Momentum coefficient $μ$
Learning rate decay $K$
Number of layers and the size of each layer in the neural network
Which initialiser to use
Which activation functions to use

Hyperparameter searching

According to the no-free-lunch theorem, there is no closed-form formula that determines the best value of each hyperparameter for every problem, so we have to search for good values empirically:

Figure: Grid and random search

Grid search

We divide the domain of possible values for each hyperparameter into a discrete grid, try every combination of values, and compute a chosen performance metric for each one. The best-scoring combination is kept.
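
A minimal sketch of grid search, assuming the metric is "higher is better". The names `grid_search`, `toy_score`, `alpha` and `lam` are made up for illustration, and `toy_score` is only a stand-in for training a model and measuring it on validation data.

```python
from itertools import product


def grid_search(grid, score):
    """Try every combination of values in `grid`; return the best config and its score."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    # product(...) enumerates the full Cartesian product of the candidate values.
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        current = score(config)
        if current > best_score:
            best_config, best_score = config, current
    return best_config, best_score


def toy_score(config):
    # Hypothetical metric: peaks at alpha = 0.01 and lam = 0.1.
    return -((config["alpha"] - 0.01) ** 2) - (config["lam"] - 0.1) ** 2


grid = {"alpha": [0.001, 0.01, 0.1], "lam": [0.0, 0.1, 1.0]}
best_config, best_score = grid_search(grid, toy_score)
print(best_config)
```

With 3 values for each of 2 hyperparameters, the loop runs 3 × 3 = 9 evaluations.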

Random search

We define a search space as a bounded domain of hyperparameter values and randomly sample points, keeping the best after trying a certain number of times.
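
A minimal sketch of random search under the same "higher is better" assumption. The names `random_search` and `toy_score`, the bounds, and the choice of uniform sampling are assumptions made for illustration; other sampling distributions are possible.

```python
import random


def random_search(space, score, n_trials, seed=0):
    """Sample `n_trials` random configs from the bounded `space`; return the best one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw each hyperparameter uniformly from its (low, high) interval.
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        current = score(config)
        if current > best_score:
            best_config, best_score = config, current
    return best_config, best_score


def toy_score(config):
    # Hypothetical metric: peaks at alpha = 0.01 and lam = 0.1.
    return -((config["alpha"] - 0.01) ** 2) - (config["lam"] - 0.1) ** 2


space = {"alpha": (0.0, 0.1), "lam": (0.0, 1.0)}
best_config, best_score = random_search(space, toy_score, n_trials=50)
print(best_config, best_score)
```

The number of evaluations is set by `n_trials` alone, regardless of how many hyperparameters the search space contains.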

Evaluating the searches

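Grid search: Exhaustive over the chosen grid, so it works well on small spaces (few hyperparameters, few candidate values each). The number of combinations multiplies with every hyperparameter added.
Random search: Better on large domains, because the number of trials is fixed in advance, but there is no guarantee of returning a good configuration.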
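
To make the cost difference concrete, here is a small sketch that counts evaluations. The choice of four hyperparameters with 5 candidate values each, and the budget of 50 random trials, are arbitrary assumptions for illustration.

```python
from math import prod

# Hypothetical grid: number of candidate values per hyperparameter.
values_per_hyperparameter = {"alpha": 5, "lam": 5, "mu": 5, "K": 5}

# Grid search evaluates every combination: 5 * 5 * 5 * 5.
grid_evaluations = prod(values_per_hyperparameter.values())

# Random search evaluates exactly as many configurations as we budget for.
random_evaluations = 50

print(grid_evaluations, random_evaluations)
```

Adding a fifth hyperparameter with 5 values would multiply the grid cost by 5 again, while the random search budget stays whatever we set it to.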
