The latest breakthrough in Natural Language Processing (NLP) is owed to the development of the Transformer architecture.
Transformers were introduced in the paper "Attention Is All You Need" by Vaswani et al. (2017). The Transformer architecture provides an alternative to Recurrent Neural Networks (RNNs) for NLP. Whereas RNNs process words one after another, which makes them slow and hard to parallelize, Transformers process all the words of a sequence in parallel and use attention to relate each word to the others.
The position of a word and the order of words in a sentence are important to understand the meaning of a text. To include this information, without having to process text sequentially, Transformers use positional encoding.
Understanding positional encoding
Before Transformers, language models used word embeddings to encode text into vectors. A word embedding captures the meaning of a word, but not its position in the sentence. The Transformer therefore adds a positional vector to each word embedding: the input to the model is the sum of the word embedding vector and a positional encoding vector. The resulting vectors carry information about both the meaning of a word and its position in the sentence.
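As a concrete sketch, the original paper uses fixed sinusoidal positional encodings: PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), where pos is the word's position and d is the embedding dimension. The minimal pure-Python example below computes these vectors and adds them to a small set of hypothetical word embeddings (the embedding values are made up for illustration):

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need".

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Even and odd dimensions share the same frequency exponent.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

# Hypothetical word embeddings for a 3-word sentence, dimension 4.
embeddings = [[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 1.0, 1.1, 1.2]]

pe = positional_encoding(seq_len=3, d_model=4)

# The vectors fed into the Transformer are the element-wise sums,
# so each one carries both meaning and position information.
encoded = [[e + p for e, p in zip(emb_row, pe_row)]
           for emb_row, pe_row in zip(embeddings, pe)]
```

Note that at position 0 all angles are zero, so the positional vector is [0, 1, 0, 1, …] and the first word's embedding is shifted only in its odd dimensions; later positions get distinct sine/cosine patterns, which is what lets the model tell identical words at different positions apart.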