What Are Transformers?


In the last article, we explored NLP tasks and their significance in the growing AI landscape. Now, let's delve into how a machine learning model performs these tasks.

The answer lies in Transformers.

Transformers are deep learning models that use mathematical techniques to process sequential data, enabling them to learn context and meaning.

Transformers can be applied to any application involving sequential text, image, or video data.

Let's see this in code:

from transformers import pipeline

# Load a default text-generation pipeline (this downloads a pretrained model).
generator = pipeline("text-generation")
generator(
    "As a machine, it is difficult to understand"
)

Here, we give the transformer a "prompt". It then tries to generate text that completes the sentence.

Transformers are language models trained on vast amounts of raw text in a self-supervised fashion. Their objective is to develop a statistical understanding of language. However, such a pretrained model is not immediately effective for specific tasks. To enhance its performance, it undergoes a process called transfer learning, where the model is further trained on human-annotated datasets in a supervised manner.
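As a toy illustration of what "a statistical understanding of language" means, a self-supervised objective can be as simple as predicting the next word from counts. Real transformers learn far richer representations, but the sketch below (with a made-up corpus, purely for illustration) shows the key idea: the training labels come from the raw text itself, so no human annotation is needed.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus. The "labels" (next words) come from the text
# itself, which is what makes the objective self-supervised.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

predict_next("the")  # "cat" follows "the" most often in this corpus
```

A transformer replaces these raw counts with learned parameters, but the training signal is the same: predict missing or upcoming words from the surrounding text.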

General Architecture of Transformer

At a high level, the encoder accepts the input, acquires understanding from it, and develops an encoded representation. This representation can then be used by the decoder to generate output probabilities.

A key feature of the transformer is the "attention layer", which tells the model to pay specific attention to certain words in the input it is given.
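To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python, with hand-picked toy vectors (real models use learned, high-dimensional matrices): each key is scored against a query, the scores are softmaxed into weights, and the output is a weighted blend of the values.

```python
import math

def scaled_dot_product_attention(query, keys, values):
    """Toy scaled dot-product attention over plain Python lists."""
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns scores into attention weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Output: the values blended according to their attention weights.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]   # first key matches the query
values = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(query, keys, values)
# The first value dominates the output because its key aligns
# with the query, i.e. the model "pays attention" to it.
```

This is exactly the "pay specific attention to certain words" behaviour: positions whose keys align with the query contribute more to the output.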

A transformer can be an encoder-only, decoder-only, or encoder-decoder model.
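To ground the three families, here is a small summary with one well-known example of each and the kind of task it is typically used for (a rough guide, not a strict rule):

```python
# One well-known example per transformer family, with a typical use.
architecture_examples = {
    "encoder-only":    ("BERT",  "understanding tasks, e.g. classification"),
    "decoder-only":    ("GPT-2", "text generation"),
    "encoder-decoder": ("T5",    "sequence-to-sequence tasks, e.g. translation"),
}

for family, (model, use) in architecture_examples.items():
    print(f"{family}: {model} -> {use}")
```

The `pipeline` call in the earlier snippet typically loads a decoder-only model, since text generation is its natural task.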

I hope this helps you develop a general idea about Transformers. I will dive deeper into the technical aspects as we move forward.

A thoughtful question to exercise the brain: which type of transformer model are you consuming in your day-to-day work? 😎

Please like and share. Drop your questions and comments so we can learn together.
