Multi-class cross-entropy function

As in the log-likelihood cross-entropy function, $p_i(x_k)$ is the $i$-th value of the output vector produced by the model for sample $x_k$, after the Softmax function has been applied to the output $o_i$ of the forward method:

$$L(x, y) = -\frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{M-1} y_{ki}\,\ln\big(p_i(x_k)\big).$$
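As a minimal sketch of this formula, the snippet below computes the Softmax probabilities from raw model outputs (logits) and then the averaged multi-class cross-entropy over a small batch; the function names and the example values are illustrative, not from the original text.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_targets):
    # Average of -sum_i y_ki * ln(p_i(x_k)) over the N samples in the batch.
    n = probs.shape[0]
    return -np.sum(one_hot_targets * np.log(probs)) / n

# Toy batch: N = 2 samples, M = 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0],   # sample 0 is of class 0
                    [0.0, 1.0, 0.0]])  # sample 1 is of class 1
probs = softmax(logits)
loss = cross_entropy(probs, targets)
```

Because each target row is one-hot, only the log-probability of the true class contributes to each sample's term, and the loss shrinks toward 0 as those probabilities approach 1.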

$y_{ki}$ is the ground-truth value for the probability that sample $x_k$ belongs to class $i$.

For example, if sample $x_k$ is of class $y_k = 2$ and we have $M = 10$ classes, then:

$$Y_k = (y_{k0}, y_{k1}, \dots, y_{k9}) = (0, 0, 1, 0, \dots, 0)$$

$Y_k$ is the one-hot vector for sample $k$ with class 2, i.e., a binary vector in which exactly one element is "hot" (set to 1) while all other elements are "cold" (set to 0).
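The one-hot encoding described above can be sketched as a small helper function; the name `one_hot` is an assumption for illustration, not part of the original text.

```python
import numpy as np

def one_hot(label, num_classes):
    # Binary vector: 1 at the index of the true class, 0 everywhere else.
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

# Sample of class 2 with M = 10 classes, as in the example above.
Yk = one_hot(2, 10)
# → array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0.])
```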