Data augmentation

A technique that artificially increases the size of the training dataset by applying transformations to existing data.

Exposing the model to more variations improves robustness and reduces overfitting: the model learns to recognise patterns and features that should be invariant to changes such as translation, rotation and scaling.

Augmentation can also be used to correct an imbalanced class distribution by artificially increasing the number of examples of the under-represented classes. It is especially useful when the amount of available training data is limited.
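One way the class-balancing idea could look in practice is sketched below with NumPy. The function names (`oversample_with_augmentation`, `jitter`), the choice of Gaussian noise as the augmentation, and the toy data are all assumptions made for illustration; they are not taken from any particular library.

```python
import numpy as np

def jitter(img, rng, sigma=0.05):
    """Hypothetical augmentation: add Gaussian noise, keep values in [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def oversample_with_augmentation(images, labels, augment_fn, rng):
    """Top up every class to the size of the largest class.

    The extra examples are augmented copies of randomly chosen
    examples from the same class. augment_fn must preserve image shape.
    """
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    new_images, new_labels = [images], [labels]
    for cls, count in zip(classes, counts):
        deficit = int(target - count)
        if deficit == 0:
            continue  # this class is already the largest
        idx = np.flatnonzero(labels == cls)
        picks = rng.choice(idx, size=deficit, replace=True)
        augmented = np.stack([augment_fn(images[i], rng) for i in picks])
        new_images.append(augmented)
        new_labels.append(np.full(deficit, cls, dtype=labels.dtype))
    return np.concatenate(new_images), np.concatenate(new_labels)

# Toy dataset: 8 examples of class 0, 2 examples of class 1
rng = np.random.default_rng(0)
images = rng.random((10, 4, 4))
labels = np.array([0] * 8 + [1] * 2)

bal_images, bal_labels = oversample_with_augmentation(images, labels, jitter, rng)
print(np.bincount(bal_labels))  # [8 8]
```

Balancing like this is normally applied to the training split only, so that evaluation data keeps its original class distribution.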

Transformation examples

Randomly translating images by a certain number of pixels
Randomly rotating the images by a small angle
Randomly scaling the images by a small factor
Randomly cropping the images
Randomly flipping the images horizontally/vertically
Adding random noise to the images
Randomly changing the brightness or contrast of the images
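! Beware - first determine which transformations make sense for the image dataset. A transformation must not change the label: for example, rotating a handwritten 6 by 180 degrees turns it into a 9.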
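Several of the transformations listed above can be sketched in plain NumPy. This is a minimal illustration, assuming images are float arrays in [0, 1] with shape (H, W) or (H, W, C); all function names and default parameters are hypothetical. Rotation and scaling need interpolation and are left out here; an image library would normally handle them.

```python
import numpy as np

def random_translate(img, max_shift, rng):
    """Shift by up to max_shift pixels per axis; uncovered pixels become 0."""
    h, w = img.shape[:2]
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = np.zeros_like(img)
    # Copy the region that is still inside the frame after the shift
    out[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        img[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out

def random_flip(img, rng, p=0.5):
    """Flip horizontally with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_crop(img, size, rng):
    """Cut out a random size x size window."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

def random_brightness_contrast(img, rng, max_delta=0.2):
    """Scale contrast around the image mean and add a brightness offset."""
    alpha = rng.uniform(1 - max_delta, 1 + max_delta)  # contrast gain
    beta = rng.uniform(-max_delta, max_delta)          # brightness offset
    mean = img.mean()
    return np.clip(alpha * (img - mean) + mean + beta, 0.0, 1.0)

def add_noise(img, rng, sigma=0.05):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def augment(img, rng, crop_size=6):
    """Apply the transformations in sequence to one image."""
    img = random_flip(img, rng)
    img = random_translate(img, 1, rng)
    img = random_brightness_contrast(img, rng)
    img = add_noise(img, rng)
    return random_crop(img, crop_size, rng)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
augmented = augment(image, rng)
print(augmented.shape)  # (6, 6, 3)
```

Which of these steps to keep, and how strong to make them, depends on the dataset: horizontal flips are usually harmless for natural photos but not for text or digits.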