Skip connections
Direct connections between non-adjacent layers in Neural Networks that bypass one or more intermediate layers.
This allows information to flow directly from inputs to outputs without being transformed or lost in between.
- Enables the network to retain information from the input layer instead of having it transformed many times and losing the original signal (see the sketch below)
- Facilitates information flow across many layers, mitigating the degradation and Vanishing gradients that appear in very deep models
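A minimal sketch of the idea (illustrative PyTorch only; the name SkipBlock, the single linear layer, and the dimension argument are assumptions, not part of the model further below):

import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        # The layer only has to learn a change relative to x;
        # the "+ x" identity path lets the original input pass through untouched.
        return torch.relu(self.fc(x)) + x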
When training a very deep model, the gradients for the parameters of the first layers end up close to zero (Vanishing gradients).
Reason: backpropagation applies the chain rule many times in a row; multiplying many partial derivatives with small values eventually drives the resulting gradient toward a value very close to zero.
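A quick way to see this numerically (a hypothetical toy stack, assuming sigmoid activations, whose derivative is at most 0.25; the depth and layer width are arbitrary):

import torch
import torch.nn as nn

# 30 stacked Linear + Sigmoid layers: each sigmoid contributes a factor of at most 0.25
# to the chain-rule product, so the first layer's gradient collapses toward zero.
layers = nn.Sequential(
    *[nn.Sequential(nn.Linear(16, 16), nn.Sigmoid()) for _ in range(30)]
)
x = torch.randn(8, 16)
layers(x).sum().backward()
# Mean absolute gradient of the first layer's weights: typically an extremely small value.
print(layers[0][0].weight.grad.abs().mean())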
Residual Block
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, n):
        super(ResidualBlock, self).__init__()
        # Conv layers: a 1x1 conv and a 3x3 conv (stride 1, padding 1); both keep
        # the spatial size and channel count, so the residual addition with x matches shapes.
        self.conv1 = nn.Conv2d(n, n, 1)
        self.conv2 = nn.Conv2d(n, n, 3, 1, 1)
        # Final linear layer (assumes 24x24 feature maps)
        self.classifier = nn.Linear(n * 24 * 24, 751)

    def forward(self, x):
        # First conv block (Conv2d + ReLU), no residual.
        out = F.relu(self.conv1(x))
        # Second conv block, add the input x as residual (the skip connection).
        out = F.relu(self.conv2(out)) + x
        # Flatten to (batch_size, n*24*24)
        out = out.view(out.size(0), -1)
        # Final layer
        out = self.classifier(out)
        return out
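A small usage check for the block above (the batch size 4 and channel count 16 are arbitrary; the 24x24 spatial size is what the classifier layer expects):

block = ResidualBlock(16)
x = torch.randn(4, 16, 24, 24)   # batch of 4, 16 channels, 24x24 feature maps
logits = block(x)
print(logits.shape)              # torch.Size([4, 751])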