Multi-class cross-entropy function

As in the log-likelihood cross-entropy function, $p_i(x_k)$ is the $i$-th value of the output vector produced by the model for sample $x_k$, after the Softmax function has been applied to the output $o_i$ of the forward method:

$$L(x, y) = -\frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{M-1} y_{ki}\,\ln\big(p_i(x_k)\big).$$
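As a minimal sketch of this formula, the snippet below computes the Softmax probabilities from raw model outputs (logits) and then the averaged multi-class cross-entropy over a small batch; the function names and the example values are illustrative, not from the original text.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_targets):
    # Average of -sum_i y_ki * ln(p_i(x_k)) over the N samples in the batch.
    n = probs.shape[0]
    return -np.sum(one_hot_targets * np.log(probs)) / n

# Toy batch: N = 2 samples, M = 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0],   # sample 0 is of class 0
                    [0.0, 1.0, 0.0]])  # sample 1 is of class 1
probs = softmax(logits)
loss = cross_entropy(probs, targets)
```

Because each target row is one-hot, only the log-probability of the true class contributes to each sample's term, and the loss shrinks toward 0 as those probabilities approach 1.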

$y_{ki}$ is the ground-truth value for the probability that sample $x_k$ belongs to class $i$.

For example, if sample $x_k$ is of class $y_k = 2$ and we have $M = 10$ classes, then:

$$Y_k = (y_{k0}, y_{k1}, \dots, y_{k9}) = (0, 0, 1, 0, \dots, 0)$$

$Y_k$ is the one-hot vector for sample $k$ with class 2, i.e., a binary vector in which exactly one element is "hot" (set to 1) while all other elements are "cold" (set to 0).
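The one-hot encoding described above can be sketched as a small helper function; the name `one_hot` is an assumption for illustration, not part of the original text.

```python
import numpy as np

def one_hot(label, num_classes):
    # Binary vector: 1 at the index of the true class, 0 everywhere else.
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

# Sample of class 2 with M = 10 classes, as in the example above.
Yk = one_hot(2, 10)
# → array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0.])
```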