FastText

Each word is represented by a bag of character n-grams in addition to the word itself

We split each word into n-grams of size n, after adding boundary symbols < and > to mark the start and end of the word. For instance, with n=3, "computer" becomes <co, com, omp, mpu, put, ute, ter, er>. This captures sub-word information such as morphology (prefixes, suffixes, stems).
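The n-gram extraction above can be sketched in a few lines of Python (a minimal illustration, not the FastText implementation itself):

```python
def char_ngrams(word, n=3):
    """Extract character n-grams of size n, with < and > as boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("computer"))
# ['<co', 'com', 'omp', 'mpu', 'put', 'ute', 'ter', 'er>']
```

The boundary markers let the model distinguish a prefix like "<co" from the same trigram occurring in the middle of another word.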

Input: The word together with its n-grams; the word's vector is the sum of the vectors of these units
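A sketch of how the input representation is built, assuming a hashed embedding table (FastText hashes n-grams into a fixed number of buckets; the bucket count, dimension, and use of Python's built-in `hash` here are illustrative assumptions, not the library's actual choices):

```python
import numpy as np

# Hypothetical hashed n-gram embedding table (illustrative sizes).
rng = np.random.default_rng(0)
n_buckets, dim = 100_000, 100
ngram_vectors = rng.normal(size=(n_buckets, dim)).astype(np.float32)

def word_vector(word, n=3):
    """Word vector = sum of the vectors of its n-grams plus the whole word."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    grams.append(padded)  # the full word itself is also a feature
    idxs = [hash(g) % n_buckets for g in grams]  # real FastText uses an FNV hash
    return ngram_vectors[idxs].sum(axis=0)

v = word_vector("computer")
print(v.shape)  # (100,)
```

Because the vector is a sum over shared sub-word units, rare or unseen words still get sensible representations from the n-grams they share with known words.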

Model: Skip-gram architecture

Output: The 2k surrounding context words (k on each side of the center word), as in standard skip-gram
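The skip-gram training pairs implied above can be generated as follows (a minimal sketch; window handling at sentence boundaries simply truncates):

```python
def context_pairs(tokens, k=2):
    """Skip-gram pairs: each center word predicts up to 2*k context words."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(context_pairs(["the", "cat", "sat"], k=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```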