Skip connections

Direct connections between non-adjacent layers in a neural network that bypass one or more intermediate layers.

This allows information to flow directly from earlier layers to later layers without being transformed or lost in between, and it gives gradients a short path back to the early layers during backpropagation.
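In the common residual formulation, the skip connection simply adds the block's input $x$ to the output of the skipped layers $f(x)$:

$$y = f(x) + x$$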

When training a very deep model, the gradients for the parameters of the first layers are close to zero, so those layers learn very slowly (the vanishing gradient problem).

Reason: backpropagation applies the chain rule many times in a row; multiplying many partial derivatives that have small values eventually produces a gradient close to zero for the early layers.

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial A_{25}} \cdot \frac{\partial A_{25}}{\partial A_{24}} \cdots \frac{\partial A_1}{\partial W_1}$$
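A sketch of why skip connections help, assuming each block computes $A_{k+1} = f(A_k) + A_k$: the local derivative of every such block picks up an identity term, so the product of factors in the chain rule no longer has to shrink toward zero:

$$\frac{\partial A_{k+1}}{\partial A_k} = \frac{\partial f(A_k)}{\partial A_k} + I$$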

Residual Block

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    def __init__(self, n):
        super(ResidualBlock, self).__init__()

        # Conv layers: a 1x1 conv and a 3x3 conv (stride 1, padding 1),
        # both preserving the channel count and spatial size.
        self.conv1 = nn.Conv2d(n, n, 1)
        self.conv2 = nn.Conv2d(n, n, 3, 1, 1)

        # Final linear layer; assumes the incoming feature maps are 24x24.
        self.classifier = nn.Linear(n * 24 * 24, 751)

    def forward(self, x):
        # First Conv block (Conv2d + ReLU), no residual.
        out = F.relu(self.conv1(x))

        # Second Conv block, add input x as residual.
        out = F.relu(self.conv2(out)) + x

        # Flatten the feature maps into one vector per example
        out = out.view(out.size(0), -1)

        # Final Layer
        out = self.classifier(out)
        return out
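A minimal usage sketch (reusing the imports above), assuming n = 16 channels and the 24x24 spatial size implied by the classifier; the values are chosen only for illustration:

block = ResidualBlock(16)
x = torch.randn(8, 16, 24, 24)   # batch of 8 feature maps, 16 channels, 24x24
logits = block(x)                # shape: (8, 751)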