Universal Embeddings

Embeddings pre-trained on a large corpus that can be plugged into a variety of downstream task models to automatically improve their performance.

They could be the ultimate form of transfer learning for language: embeddings that anyone can use on any language-related task.

First developed in FastText, which includes character n-grams to address out-of-vocabulary words.
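
A minimal sketch of the character n-gram extraction FastText relies on (the word is wrapped in boundary markers `<` and `>` before slicing; the full word itself is also kept as a feature in the original model, omitted here for brevity):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams, FastText-style: the word is wrapped
    in boundary markers '<' and '>' before slicing into n-grams."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

# "where" yields the 3-grams '<wh', 'whe', 'her', 'ere', 're>'
print(char_ngrams("where", n_min=3, n_max=3))
```

The word's vector is then built from the vectors of its n-grams, so even a word never seen during training decomposes into n-grams that were.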

Out-of-Vocabulary words

What should the embedding function do if asked to encode a word not contained in the original dictionary V, such as a new word or a typo?

A good embedding function should still produce a useful vector, at least approximately, for out-of-vocabulary words.
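
For instance, a subword-aware model such as gensim's FastText implementation can return a vector for a misspelled or unseen word, because the vector is assembled from the word's character n-grams. A minimal usage sketch (assuming gensim 4.x; the toy corpus is only illustrative):

```python
from gensim.models import FastText

# Toy corpus; a real model would be trained on a large corpus.
sentences = [["universal", "embeddings", "improve", "downstream", "tasks"],
             ["character", "ngrams", "handle", "rare", "words"]]

model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=3, max_n=6, epochs=10)

# "embeddinngs" (a typo) is out of vocabulary, yet a vector is still
# composed from the character n-grams it shares with seen words.
vec = model.wv["embeddinngs"]
print(vec.shape)  # (50,)
```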