The latest breakthrough in Natural Language Processing (NLP) is owed to the development of the Transformer architecture.
Transformers were introduced in the paper "Attention Is All You Need" by Vaswani et al. (2017). The Transformer architecture provides an alternative to Recurrent Neural Networks (RNNs) for NLP. Whereas RNNs process words one after another, which makes them slow and hard to parallelize, Transformers process all the words of a sequence in parallel and use attention to relate each word to the others.
The position of a word and the order of words in a sentence are important to understand the meaning of a text. To include this information, without having to process text sequentially, Transformers use positional encoding.
Understanding positional encoding
Before Transformers, language models used word embeddings to encode text into vectors. A word embedding captures the meaning of a word, but not its position in the sentence. The Transformer therefore adds a positional vector to each word embedding: the input to the model is the sum of the word embedding vector and a positional encoding vector. The resulting vectors carry information about both the meaning of a word and its position in the sentence.
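As a concrete sketch, the original paper uses fixed sinusoidal positional encodings: PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), where pos is the word's position and d is the embedding dimension. The minimal pure-Python example below computes these vectors and adds them to a small set of hypothetical word embeddings (the embedding values are made up for illustration):

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need".

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Even and odd dimensions share the same frequency exponent.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

# Hypothetical word embeddings for a 3-word sentence, dimension 4.
embeddings = [[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 1.0, 1.1, 1.2]]

pe = positional_encoding(seq_len=3, d_model=4)

# The vectors fed into the Transformer are the element-wise sums,
# so each one carries both meaning and position information.
encoded = [[e + p for e, p in zip(emb_row, pe_row)]
           for emb_row, pe_row in zip(embeddings, pe)]
```

Note that at position 0 all angles are zero, so the positional vector is [0, 1, 0, 1, …] and the first word's embedding is shifted only in its odd dimensions; later positions get distinct sine/cosine patterns, which is what lets the model tell identical words at different positions apart.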