Stochastic Gradient Descent
We compute predictions for a single sample, randomly drawn from the dataset, evaluate the loss function for the current model parameters, and then perform backpropagation.
import numpy as np

def train(self, inputs, outputs, N_max = 1000, alpha = 1e-5, beta1 = 0.9, beta2 = 0.999, \
          delta = 1e-5, display = True):
    # Get number of samples
    M = inputs.shape[0]
    # List of losses, starts with the current loss
    self.losses_list = [self.CE_loss(inputs, outputs)]
    # Initialize G_list (moment accumulators for each weight and bias)
    G_list = [0*self.W2, 0*self.W1, 0*self.b2, 0*self.b1, \
              0*self.W2, 0*self.W1, 0*self.b2, 0*self.b1]
    # Repeat iterations
    for iteration_number in range(1, N_max + 1):
        # Stochastic GD on one randomly chosen sample
        indexes = np.random.randint(0, M)
        inputs_sub = np.array([inputs[indexes, :]])
        outputs_sub = np.array([outputs[indexes, :]])
        # Backpropagate
        G_list, loss = self.backward(inputs_sub, outputs_sub, G_list, iteration_number, alpha, beta1, beta2)
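        # --- Assumed continuation: the original snippet is truncated here ---
        # Record the loss returned by backward() and optionally report progress
        self.losses_list.append(loss)
        if display and iteration_number % 100 == 0:
            print("Iteration", iteration_number, "- loss:", loss)
        # A stopping test on the tolerance delta could go here, e.g. break once
        # successive recorded losses differ by less than delta.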
Since most loss functions are mean error values computed over many samples (here, the cross-entropy loss averaged over the dataset), a single sample only gives a rough estimate of that mean. This method therefore does not lead to a good estimate of the true loss at each iteration, but it is much faster to train.
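To see how noisy this single-sample estimate is, the short sketch below (using toy labels and predicted probabilities, not the network above) compares single-sample cross-entropy values with the mean over a whole dataset.

import numpy as np

rng = np.random.default_rng(0)
M = 1000
targets = rng.integers(0, 2, size=M)                  # toy binary labels
probs = np.clip(rng.random(M), 1e-8, 1 - 1e-8)        # toy predicted probabilities

# Per-sample cross-entropy and its mean over the whole dataset
ce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
print("Full-dataset mean loss:", ce.mean())

# Single-sample estimates fluctuate strongly around that mean
for _ in range(5):
    i = rng.integers(0, M)
    print("Single-sample estimate:", ce[i])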
Mini-batch Gradient Descent
Gradient Descent with a subset of samples randomly drawn from the dataset.
from numpy.random import default_rng

def train(self, inputs, outputs, N_max = 1000, alpha = 1e-5, beta1 = 0.9, beta2 = 0.999, \
          delta = 1e-5, batch_size = 100, display = True):
    # Get number of samples
    M = inputs.shape[0]
    # List of losses, starts with the current loss
    self.losses_list = [self.CE_loss(inputs, outputs)]
    # Initialize G_list (moment accumulators for each weight and bias)
    G_list = [0*self.W2, 0*self.W1, 0*self.b2, 0*self.b1, \
              0*self.W2, 0*self.W1, 0*self.b2, 0*self.b1]
    # Define RNG for stochastic minibatches
    rng = default_rng()
    # Repeat iterations
    for iteration_number in range(1, N_max + 1):
        # Select a subset of inputs and outputs with given batch size (without replacement)
        shuffler = rng.choice(M, size = batch_size, replace = False)
        inputs_sub = inputs[shuffler, :]
        outputs_sub = outputs[shuffler, :]
        # Backpropagate
        G_list, loss = self.backward(inputs_sub, outputs_sub, G_list, iteration_number, alpha, beta1, beta2)
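For comparison, here is a self-contained sketch of the same mini-batch sampling pattern applied to a toy logistic-regression model with a plain gradient step. The data, model, and update rule are placeholders chosen for illustration, not the class and Adam-style backward() used above.

import numpy as np
from numpy.random import default_rng

rng = default_rng(0)
M, n_features = 1000, 5
X = rng.normal(size=(M, n_features))
true_w = rng.normal(size=n_features)
y = (X @ true_w > 0).astype(float)          # toy binary labels

def mean_ce(w):
    # Mean cross-entropy of the logistic model over the whole toy dataset
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-8, 1 - 1e-8)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w = np.zeros(n_features)
alpha, batch_size, N_max = 0.1, 100, 500
print("Loss before training:", mean_ce(w))

for iteration_number in range(1, N_max + 1):
    # Select a mini-batch without replacement, as in the train() method above
    shuffler = rng.choice(M, size=batch_size, replace=False)
    X_sub, y_sub = X[shuffler, :], y[shuffler]
    # Sigmoid predictions and gradient of the mean cross-entropy w.r.t. w
    p = 1.0 / (1.0 + np.exp(-X_sub @ w))
    grad = X_sub.T @ (p - y_sub) / batch_size
    # Plain gradient step (the class above uses an Adam-style update instead)
    w -= alpha * grad

print("Loss after training: ", mean_ce(w))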
Batch sizes
It is usually better to choose the batch size as a trade-off: a larger batch size means slower computation per iteration, but a more accurate estimate of the mean loss and therefore better training performance.
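As a rough illustration of this trade-off (using assumed toy per-sample losses rather than the network above), the sketch below shows how the fluctuation of the mini-batch loss estimate shrinks as the batch size grows, while each estimate touches more samples.

import numpy as np

rng = np.random.default_rng(0)
M = 10000
per_sample_loss = rng.exponential(scale=1.0, size=M)   # toy per-sample losses
true_mean = per_sample_loss.mean()

for batch_size in [1, 10, 100, 1000]:
    # Draw many mini-batches and measure how much their mean loss fluctuates
    estimates = [per_sample_loss[rng.choice(M, size=batch_size, replace=False)].mean()
                 for _ in range(200)]
    print(f"batch_size={batch_size:5d}  mean of estimates={np.mean(estimates):.3f}  "
          f"std of estimates={np.std(estimates):.3f}  (true mean={true_mean:.3f})")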