Adversarial attacks

Techniques that fool machine learning models by supplying deceptive inputs designed to cause a malfunction

A good attack sample must:

  1. Make the model malfunction
  2. Look plausible or normal to a human, as though it could have been drawn from the dataset

Generally, the higher the odds that the attack sample causes a malfunction, the less plausible it will look.
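
As a concrete illustration of this trade-off, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), where a single parameter epsilon controls the perturbation size. The names `model`, `x`, and `y` are assumed placeholders for a PyTorch classifier, an input tensor scaled to [0, 1], and its true label; this is a sketch under those assumptions, not a full attack implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Perturb x in the direction that increases the classification loss.

    Larger epsilon raises the odds of a misclassification but also makes
    the perturbation more visible, i.e. the sample less plausible.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step by epsilon in the sign of the gradient, then clamp to the valid pixel range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Sweeping epsilon exposes the trade-off between success odds and plausibility:
# for eps in (0.01, 0.05, 0.1, 0.3):
#     x_adv = fgsm_attack(model, x, y, eps)
#     print(eps, model(x_adv).argmax(dim=1).item())
```

Small epsilon values keep the perturbation imperceptible but rarely flip the prediction; large values flip it more reliably at the cost of visible artifacts.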

Impact of noise on classifiers

Misconceptions

  1. The whole space of possible inputs was densely filled with training examples during training
  2. Class regions in the feature space are contiguous and filled with samples
  3. The decision boundaries between classes are smooth and make perfect sense

Under these assumptions, adding random noise to an original sample to create an adversarial sample moves it randomly through the feature space, possibly into a boundary region where the sample becomes misclassified.
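
A small sketch of that naive random-noise approach, reusing the hypothetical `model`, `x`, and `y` from the FGSM example above (here `y` is assumed to be a one-element label tensor):

```python
import torch

def random_noise_attack(model, x, y, sigma=0.1, tries=100):
    """Add Gaussian noise repeatedly, keeping the first sample the model misclassifies.

    The movement through the feature space is random rather than aimed at the
    decision boundary, so success depends on luck, sigma, and the number of tries.
    """
    with torch.no_grad():
        for _ in range(tries):
            x_noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
            if model(x_noisy).argmax(dim=1).item() != y.item():
                return x_noisy  # the noise happened to push the sample across a boundary
    return None  # no random step crossed the boundary
```

Because the perturbation direction is random rather than aimed at the decision boundary, this typically needs much larger noise or many more attempts than a gradient-guided attack such as FGSM, which is why the assumptions above are misconceptions in practice.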