Highway Layer

A variant of the fully connected Linear layer with an additional gated residual connection.

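A standard layer produces a mapping with a non-linear activation function $g$:

$$y = g(W_1 x + b_1)$$

The highway layer gates that mapping against its own input:

$$z = t \odot g(W_1 x + b_1) + (1 - t) \odot x, \qquad t = \sigma(W_2 x + b_2)$$

Here $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication. The transform gate $t$ lies in $(0, 1)$: as $t \to 1$ the output approaches the standard mapping, and as $t \to 0$ the input $x$ is carried through unchanged.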
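To make the role of the gate concrete, here is a minimal scalar sketch of the formula in plain Python. The helper names `sigmoid` and `highway_scalar`, the choice of `tanh` for the activation g, and the weight values are all illustrative assumptions rather than part of the layer itself.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic sigmoid, used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-a))


def highway_scalar(x, w1, b1, w2, b2, g=math.tanh):
    """Scalar version of z = t * g(w1*x + b1) + (1 - t) * x (hypothetical helper)."""
    t = sigmoid(w2 * x + b2)  # transform gate, strictly between 0 and 1
    return t * g(w1 * x + b1) + (1.0 - t) * x


x = 0.5

# Strongly negative gate bias: t is close to 0, so the input is carried through.
carried = highway_scalar(x, w1=2.0, b1=0.0, w2=0.0, b2=-20.0)

# Strongly positive gate bias: t is close to 1, so the output is close to g(w1*x + b1).
transformed = highway_scalar(x, w1=2.0, b1=0.0, w2=0.0, b2=20.0)
```

With the gate nearly closed, `carried` stays close to the input `x`; with the gate nearly open, `transformed` is close to `math.tanh(2.0 * x)`. Intermediate gate values blend the two.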

Examples:

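import torch
from torch import nn

class Highway(nn.Module):
    def __init__(self):
        super().__init__()
        # Blocks to be used for the highway connection
        self.highway = nn.Linear(128, 128)    # W1, b1
        self.transform = nn.Linear(128, 128)  # W2, b2 (transform gate)

    def forward(self, x):
        # h is the linear mapping W1 x + b1; apply an activation g here if desired
        h = self.highway(x)
        t_gate = torch.sigmoid(self.transform(x))  # t = sigmoid(W2 x + b2)
        c_gate = 1 - t_gate                        # carry gate, 1 - t
        return h * t_gate + x * c_gate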