Adversarial attacks

Techniques that fool machine learning models by supplying deceptive inputs designed to cause a malfunction

A good attack sample must:

  1. Make the model malfunction
  2. Look plausible or normal to a human, as though it could have been drawn from the dataset

Generally, the higher the odds that the attack sample causes a malfunction, the less plausible it will look.
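
As a concrete illustration of this trade-off, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), where a single parameter epsilon controls the perturbation size. The names `model`, `x`, and `y` are assumed placeholders for a PyTorch classifier, an input tensor scaled to [0, 1], and its true label; this is a sketch under those assumptions, not a full attack implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Perturb x in the direction that increases the classification loss.

    Larger epsilon raises the odds of a misclassification but also makes
    the perturbation more visible, i.e. the sample less plausible.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step by epsilon in the sign of the gradient, then clamp to the valid pixel range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Sweeping epsilon exposes the trade-off between success odds and plausibility:
# for eps in (0.01, 0.05, 0.1, 0.3):
#     x_adv = fgsm_attack(model, x, y, eps)
#     print(eps, model(x_adv).argmax(dim=1).item())
```

Small epsilon values keep the perturbation imperceptible but rarely flip the prediction; large values flip it more reliably at the cost of visible artifacts.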

Impact of noise on classifiers

Misconceptions

  1. The whole space of possible inputs was densely filled with training examples during training
  2. Class regions in the feature space are contiguous and filled with samples
  3. The decision boundaries between classes are smooth and make perfect sense

Under these assumptions, adding random noise to an original sample to create an adversarial sample moves it randomly through the feature space, possibly into a boundary region where the sample becomes misclassified.
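
A small sketch of that naive random-noise approach, reusing the hypothetical `model`, `x`, and `y` from the FGSM example above (here `y` is assumed to be a one-element label tensor):

```python
import torch

def random_noise_attack(model, x, y, sigma=0.1, tries=100):
    """Add Gaussian noise repeatedly, keeping the first sample the model misclassifies.

    The movement through the feature space is random rather than aimed at the
    decision boundary, so success depends on luck, sigma, and the number of tries.
    """
    with torch.no_grad():
        for _ in range(tries):
            x_noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
            if model(x_noisy).argmax(dim=1).item() != y.item():
                return x_noisy  # the noise happened to push the sample across a boundary
    return None  # no random step crossed the boundary
```

Because the perturbation direction is random rather than aimed at the decision boundary, this typically needs much larger noise or many more attempts than a gradient-guided attack such as FGSM, which is why the assumptions above are misconceptions in practice.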