Multi-class classification
Unlike binary classification, a multi-class classification model has to output a vector with one value per class.
The raw outputs of the Dense layers cannot be used directly as probabilities:
- Negative values may be produced (as it is a linear transformation)
- These values usually don't sum up to 1 (the constraint is not enforced)
Both problems are fixed with a Softmax function, which maps the outputs to positive values that sum to 1 while keeping the predicted class as the highest value. The output vector can then be used as the probabilities of a sample belonging to each class.
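A minimal sketch of softmax in plain Python, to make the two properties concrete. The `logits` values are made up for illustration and stand in for the raw Dense layer outputs of one sample with 3 classes.

```python
import math

def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs: note the negative value and that they don't sum to 1.
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)

# The largest raw score stays the largest probability, so the prediction is unchanged.
predicted_class = probs.index(max(probs))

print(probs)
print(sum(probs))
print(predicted_class)
```

Softmax is monotonic, so taking the argmax of the raw outputs or of the probabilities gives the same predicted class.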
- When implementing softmax, it is usually not added as the final activation function. Instead, it is applied inside the loss function cross_entropy(), which takes the raw outputs of the last layer.
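One likely reason for folding softmax into the loss is numerical stability: the loss only needs the log of the softmax, which can be computed directly from the raw outputs without ever forming very large exponentials. The sketch below assumes that cross_entropy() behaves like PyTorch's `torch.nn.functional.cross_entropy`, which expects raw scores. The helper names `log_softmax` and `cross_entropy_from_logits` are hypothetical and written in plain Python, not a library API.

```python
import math

def log_softmax(logits):
    """log(softmax(logits)) computed via the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def cross_entropy_from_logits(logits, target):
    """Negative log-probability of the true class, with softmax folded in."""
    return -log_softmax(logits)[target]

# Hypothetical raw outputs for one sample whose true class is index 0.
logits = [2.0, -1.0, 0.5]
target = 0
loss = cross_entropy_from_logits(logits, target)

# Same loss computed the naive way: explicit softmax, then log.
exps = [math.exp(z) for z in logits]
naive_loss = -math.log(exps[target] / sum(exps))

# With extreme scores the naive version would overflow in math.exp(1000.0),
# while the folded version still works.
extreme_loss = cross_entropy_from_logits([1000.0, 0.0], 0)

print(loss, naive_loss, extreme_loss)
```

At prediction time the model then returns raw scores, so softmax has to be applied separately if actual probabilities are needed.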
Loss function
The log-likelihood (binary) cross-entropy function only handles 2 classes, so here we have to use the multi-class cross-entropy function, which generalizes it to any number of classes.
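A small sketch of how the two losses relate, assuming the usual definitions: the binary loss is -(y log p + (1 - y) log(1 - p)) and the multi-class loss is the sum over classes of -y_k log p_k with a one-hot target y. The function names and the numbers are illustrative only.

```python
import math

def binary_cross_entropy(p, y):
    """Binary case: p is the predicted probability of class 1, y is 0 or 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def multiclass_cross_entropy(probs, one_hot):
    """General case: -sum over classes of y_k * log(p_k)."""
    # Only the true class (y_k = 1) contributes to the sum.
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y > 0)

# With 2 classes the multi-class loss reduces to the binary one.
p = 0.8
bce = binary_cross_entropy(p, 1)
cce_two = multiclass_cross_entropy([1 - p, p], [0, 1])

# With 3 classes the same formula still applies.
cce_three = multiclass_cross_entropy([0.7, 0.2, 0.1], [1, 0, 0])

print(bce, cce_two, cce_three)
```

Because the target is one-hot, the multi-class loss is just the negative log of the probability assigned to the true class.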