Gated Recurrent Units

A simpler alternative to Long Short-Term Memory (LSTM) that combines the LSTM's forget and input gates into a single Update Gate, and merges the cell state and hidden state

Hidden State $h_t$ - the only state vector

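Gates (sigmoid layers):

Update Gate $z_t$ - controls how much of the candidate hidden state enters the new hidden state

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$

Reset Gate $r_t$ - controls how much of the previous hidden state is used when forming the candidate

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$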

Candidate Hidden State - could replace $h_{t-1}$ and become the next memory vector $h_t$

$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$

where $r_t = 0$ means the previous hidden state will be entirely discarded, and $r_t = 1$ means entirely kept
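
A minimal sketch of the candidate computation, to make the effect of the reset gate concrete. It uses plain Python lists instead of a tensor library; the helper names (`matvec`, `candidate_state`) and all numeric values are hypothetical, chosen only for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def candidate_state(x_t, h_prev, r_t, W_h, U_h, b_h):
    """Return tanh(W_h x_t + U_h (r_t * h_prev) + b_h), with * element-wise."""
    gated = [r * h for r, h in zip(r_t, h_prev)]  # reset gate scales h_prev
    wx = matvec(W_h, x_t)
    uh = matvec(U_h, gated)
    return [math.tanh(a + b + c) for a, b, c in zip(wx, uh, b_h)]

# Hypothetical toy values: 2 input features, 2 hidden units.
W_h = [[0.5, -0.3], [0.1, 0.8]]
U_h = [[0.4, 0.2], [-0.6, 0.7]]
b_h = [0.0, 0.1]
x_t = [1.0, 2.0]
h_prev = [0.9, -0.5]

# r_t = 0: the previous hidden state is discarded entirely.
forget_all = candidate_state(x_t, h_prev, [0.0, 0.0], W_h, U_h, b_h)
# r_t = 1: the previous hidden state is kept entirely.
keep_all = candidate_state(x_t, h_prev, [1.0, 1.0], W_h, U_h, b_h)
# With r_t = 0 the candidate depends on the current input alone.
input_only = [math.tanh(a + b) for a, b in zip(matvec(W_h, x_t), b_h)]
```

With the reset gate at 0, `forget_all` matches `input_only`, so the unit behaves as if it were reading the first element of a sequence.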

New Hidden State - replaces $h_{t-1}$ by mixing a proportion of $h_{t-1}$ with the candidate $\tilde{h}_t$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $\odot$ is element-wise multiplication, $z_t = 1$ means the new hidden state is based entirely on the candidate hidden vector $\tilde{h}_t$, and $z_t = 0$ means it is based entirely on the previous hidden state $h_{t-1}$
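
The two limiting cases can be checked directly. This is a small sketch with made-up vectors; `new_hidden_state` is a hypothetical helper name, not part of any library.

```python
def new_hidden_state(h_prev, h_cand, z_t):
    """Return (1 - z_t) * h_prev + z_t * h_cand, all element-wise."""
    return [(1.0 - z) * hp + z * hc for z, hp, hc in zip(z_t, h_prev, h_cand)]

h_prev = [0.9, -0.5]  # previous hidden state (illustrative values)
h_cand = [0.2, 0.7]   # candidate hidden state (illustrative values)

kept = new_hidden_state(h_prev, h_cand, [0.0, 0.0])      # [0.9, -0.5]
replaced = new_hidden_state(h_prev, h_cand, [1.0, 1.0])  # [0.2, 0.7]
# Gates act per unit: the first unit moves halfway toward its candidate,
# the second keeps its previous value.
mixed = new_hidden_state(h_prev, h_cand, [0.5, 0.0])
```

Because the gate is a vector, each hidden unit chooses its own mix, so some units can hold long-term information while others update at every step.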

Predicted output - defined as a simple linear operation with a weight matrix $V$

$$y_t = V h_t$$
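
Putting the pieces together, one possible sketch of a full GRU step run over a short sequence. The gates are computed in the usual GRU form, a sigmoid of an affine function of $x_t$ and $h_{t-1}$; the parameter names (`W_z`, `U_z`, `b_z`, `W_r`, `U_r`, `b_r`), the random initialisation, the layer sizes, and the input sequence are all assumptions made for illustration. It is written in dependency-free Python for readability, not speed.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, x, U, h, b):
    """Return W x + U h + b for list-based matrices and vectors."""
    return [p + q + c for p, q, c in zip(matvec(W, x), matvec(U, h), b)]

def gru_step(x_t, h_prev, p):
    """One GRU time step; returns the new hidden state and the output."""
    # Update and reset gates (sigmoid layers).
    z_t = [sigmoid(a) for a in affine(p["W_z"], x_t, p["U_z"], h_prev, p["b_z"])]
    r_t = [sigmoid(a) for a in affine(p["W_r"], x_t, p["U_r"], h_prev, p["b_r"])]
    # Candidate hidden state: the reset gate scales h_prev before U_h is applied.
    gated = [r * h for r, h in zip(r_t, h_prev)]
    h_cand = [math.tanh(a) for a in affine(p["W_h"], x_t, p["U_h"], gated, p["b_h"])]
    # New hidden state: per-unit mix of previous state and candidate.
    h_t = [(1.0 - z) * hp + z * hc for z, hp, hc in zip(z_t, h_prev, h_cand)]
    # Predicted output: linear readout of the hidden state.
    y_t = matvec(p["V"], h_t)
    return h_t, y_t

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases (an arbitrary choice for the demo)."""
    rng = random.Random(seed)
    p = {}
    for g in ("z", "r", "h"):
        p["W_" + g] = random_matrix(n_hidden, n_in, rng)
        p["U_" + g] = random_matrix(n_hidden, n_hidden, rng)
        p["b_" + g] = [0.0] * n_hidden
    p["V"] = random_matrix(n_out, n_hidden, rng)
    return p

params = init_params(n_in=3, n_hidden=4, n_out=2)
h = [0.0] * 4  # initial hidden state
sequence = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 2.0, 1.0]]
outputs = []
for x in sequence:
    h, y = gru_step(x, h, params)  # h carries memory to the next step
    outputs.append(y)
```

Since $h_t$ is a per-unit convex combination of $h_{t-1}$ and a `tanh` output, every hidden value stays within $[-1, 1]$ when the initial state does. Some libraries swap the roles of $z_t$ and $1 - z_t$ in the update, so the convention is worth checking before porting weights.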