
So we need to compute the gradient of the CE loss with respect to each CNN class score in \(s\)

With the loss defined, we now have to compute its gradient with respect to the output neurons of the CNN in order to backpropagate it through the net and optimize the loss function by tuning the net parameters. The loss terms coming from the negative classes are zero. However, the gradient of the loss with respect to those negative classes does not vanish, since the Softmax of the positive class also depends on the scores of the negative classes.

The gradient expression will be the same for all classes in \(C\) except for the ground truth class \(C_p\), because the score of \(C_p\) (\(s_p\)) is in the numerator.
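For reference, the standard result for a single positive class is that the gradient with respect to \(s_p\) is the Softmax output minus one, while the gradient with respect to any negative score \(s_n\) is just the Softmax output for that class. The sketch below checks this against a finite-difference estimate. The function names and the example scores are illustrative assumptions, not part of any framework.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(scores, positive):
    # Cross-entropy with a one-hot target: only the positive class contributes.
    return -math.log(softmax(scores)[positive])

def ce_grad(scores, positive):
    # Analytic gradient: softmax(s)_i - 1 for the positive class, softmax(s)_i otherwise.
    probs = softmax(scores)
    return [p - 1.0 if i == positive else p for i, p in enumerate(probs)]

def numeric_grad(scores, positive, eps=1e-6):
    # Central finite differences, used only to sanity-check ce_grad.
    grad = []
    for i in range(len(scores)):
        up = list(scores)
        down = list(scores)
        up[i] += eps
        down[i] -= eps
        grad.append((ce_loss(up, positive) - ce_loss(down, positive)) / (2 * eps))
    return grad

if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # hypothetical CNN scores for three classes
    print(ce_grad(scores, positive=0))
    print(numeric_grad(scores, positive=0))
```

Note that the gradient entries for the negative classes are strictly positive even though those classes add nothing to the loss value, which is the point made above.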


Caffe: SoftmaxWithLoss Layer. Is limited to multi-class classification.
PyTorch: CrossEntropyLoss. Is limited to multi-class classification.
TensorFlow: softmax_cross_entropy. Is limited to multi-class classification.

In this Facebook work they claim that, despite being counter-intuitive, Categorical Cross-Entropy loss, or Softmax loss, worked better than Binary Cross-Entropy loss in their multi-label classification problem.

> Skip this part if you are not interested in Facebook or me using Softmax Loss for multi-label classification, which is not standard.

When Softmax loss is used in a multi-label scenario, the gradients get a bit more complex, since the loss contains an element for each positive class. Let \(M\) be the positive classes of a sample. The CE Loss with Softmax activations would be:

\[ CE = \frac{1}{M} \sum_{p \in M} -\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right) \]

Where each \(s_p\) in \(M\) is the CNN score for each positive class. As in the Facebook paper, I introduce a scaling factor \(1/M\) to make the loss invariant to the number of positive classes, which may differ per sample.
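As a minimal sketch of this loss, assuming it is the mean over the positive classes of \(-\log\) of the Softmax output for each \(s_p\) (which is what the \(1/M\) scaling suggests), the value can be computed as below. The function names are hypothetical, and `positives` is assumed to be a non-empty list of positive class indices.

```python
import math

def log_softmax(scores):
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]

def multilabel_softmax_ce(scores, positives):
    # Mean of -log softmax(s)_p over the positive classes (the 1/M scaling).
    if not positives:
        raise ValueError("at least one positive class is required")
    log_probs = log_softmax(scores)
    return -sum(log_probs[p] for p in positives) / len(positives)

if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # hypothetical CNN scores for three classes
    print(multilabel_softmax_ce(scores, [0]))     # one positive class
    print(multilabel_softmax_ce(scores, [0, 1]))  # two positive classes
```

With a single positive class this reduces to the usual Softmax loss, and with several it is the average of the per-class terms, so its scale does not grow with the number of positives.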

As neither the Caffe SoftmaxWithLoss layer nor the Multinomial Logistic Loss layer accepts multi-label targets, I implemented my own PyCaffe Softmax loss layer, following the specifications of the Facebook paper.
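The following is not the PyCaffe layer itself, only a framework-free sketch of what the forward and backward passes of such a layer could look like. It assumes multi-hot label vectors, at least one positive class per sample, the \(1/M\) scaling per sample, and averaging over the batch. The class name and its methods are hypothetical.

```python
import math

class MultiLabelSoftmaxLoss:
    """Hypothetical stand-in for a multi-label Softmax loss layer (plain Python)."""

    def forward(self, scores, labels):
        # scores: list of per-sample score lists.
        # labels: list of per-sample multi-hot lists (1 = positive class).
        self.probs = []
        self.labels = labels
        total = 0.0
        for s, y in zip(scores, labels):
            m = max(s)
            exps = [math.exp(v - m) for v in s]
            z = sum(exps)
            p = [e / z for e in exps]
            self.probs.append(p)
            n_pos = sum(y)  # assumed to be >= 1
            total += -sum(math.log(p_i) for p_i, y_i in zip(p, y) if y_i) / n_pos
        return total / len(scores)

    def backward(self):
        # Gradient w.r.t. each score: (softmax_i - y_i / n_pos) / batch_size.
        n = len(self.probs)
        grads = []
        for p, y in zip(self.probs, self.labels):
            n_pos = sum(y)
            grads.append([(p_i - y_i / n_pos) / n for p_i, y_i in zip(p, y)])
        return grads

if __name__ == "__main__":
    layer = MultiLabelSoftmaxLoss()
    scores = [[2.0, 1.0, 0.1], [0.5, -1.0, 3.0]]  # hypothetical batch of two samples
    labels = [[1, 1, 0], [0, 0, 1]]
    print(layer.forward(scores, labels))
    print(layer.backward())
```

Each positive class receives the Softmax output minus its \(1/M\) share, and each negative class receives the plain Softmax output, so the gradient for every sample sums to zero.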