Adversarial attacks
Techniques that fool machine learning models by supplying deceptive input in order to cause a malfunction
A good attack sample must:
- Make the model malfunction
- Look plausible or normal to a human, as if it could have been a sample from the dataset
Generally, the higher the odds that an attack sample causes a malfunction, the less plausible it will look.
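A minimal sketch of this trade-off, assuming pixel values in [0, 1] and using only NumPy: the perturbation budget eps (a hypothetical parameter here, not from the text above) bounds how far each pixel may move, so a small budget keeps the sample plausible, while a large budget raises the odds of a malfunction but makes the noise visible.

```python
import numpy as np

def bounded_perturbation(x, delta, eps):
    """Clip a perturbation to an L-infinity ball of radius eps around x.

    A small eps keeps the adversarial sample visually close to the original;
    a large eps makes misclassification more likely but the noise obvious.
    x and delta are arrays with pixel values in [0, 1].
    """
    delta = np.clip(delta, -eps, eps)        # limit how far each pixel may move
    return np.clip(x + delta, 0.0, 1.0)      # stay inside the valid pixel range

# Example: the same random noise under a small and a large budget
x = np.random.rand(28, 28)                   # stand-in for an original image
noise = np.random.randn(28, 28)
subtle = bounded_perturbation(x, noise, eps=0.05)   # hard to spot by eye
obvious = bounded_perturbation(x, noise, eps=0.5)   # clearly corrupted
```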
Impact of noise on classifiers
Misconceptions
The whole space of possible inputs is densely filled with training examples
- In reality, the space is mostly noise images and only sparsely filled with relevant training examples
- Training samples do not cover all possible relevant images
Class regions are contiguous and filled with samples
- Regions are not necessarily contiguous, and their boundaries may be erratic
The decision boundaries between classes are smooth and make perfect sense
- Boundaries are often effectively arbitrary in regions without data and may shift from one training epoch to another
- Training samples tend to be concentrated far away from the boundaries; mathematically, the structures they lie on are called manifolds
As such, when we add random noise to an original sample to create an adversarial sample, we move randomly through the feature space, possibly into a boundary region where the sample becomes misclassified.
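As a rough illustration, assuming a hypothetical model_predict callable that maps an input array to a class label, such a random-noise attack simply samples perturbations within a budget until the predicted class flips:

```python
import numpy as np

def random_noise_attack(model_predict, x, eps=0.1, n_trials=1000, seed=None):
    """Randomly perturb x within an L-infinity budget of eps until the
    predicted class changes, i.e. until we stumble across a decision boundary.

    model_predict is a hypothetical callable: array -> class label.
    Returns the first misclassified sample found, or None if every trial
    stays inside the original class region.
    """
    rng = np.random.default_rng(seed)
    original_label = model_predict(x)
    for _ in range(n_trials):
        delta = rng.uniform(-eps, eps, size=x.shape)    # random step in input space
        candidate = np.clip(x + delta, 0.0, 1.0)        # keep valid pixel values
        if model_predict(candidate) != original_label:  # crossed a boundary
            return candidate
    return None
```

Because training samples usually sit far from the boundaries, most random trials stay correctly classified, which is why plausible adversarial samples are hard to find by blind noise alone.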