Train and test split

Divides the dataset into the training and test sets.

Typically, the train and test split divides 80% of the data for training and 20% for testing.

It is important that the train and test sets follow similar distributions.

When approximating a mean/average metric, like Mean Square Error, we need enough samples so that the empirical approximation matches closely the theoretical value (based on the Law of large numbers)

Train-test-validation split

Good numbers: 80% in training set, 10% in validation, 10% in test set