2.0 Primer
Internal mechanisms of an LLM
The most frequently cited paper on LLM architecture is Vaswani et al. (2017), which introduced the Transformer: currently the state-of-the-art building block for sequence-to-sequence modelling, and the foundation of all modern LLMs (such as Meta’s Llama-4). The original Transformer was developed for translation tasks: the encoder processes the source language, and the decoder generates the target language, taking its own previously generated output as additional input. Nowadays, LLMs oriented around text generation (e.g. OpenAI’s GPT, Meta’s Llama) feature only the decoder.[1] However, the encoder remains valuable for tasks that extract meaning from text rather than generate it.
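To make the decoder-only generation loop concrete, the following is a minimal Python sketch. The names `model`, `prompt_ids`, and `eos_id` are hypothetical placeholders rather than part of any specific library; `model` is assumed to map a token sequence to next-token scores, which is the common shape of decoder-only LLMs such as GPT and Llama.

```python
# A minimal sketch of decoder-only (autoregressive) generation.
# `model` is a hypothetical callable mapping a list of token ids to a
# list of next-token scores over the vocabulary (one score per token id).

def generate(model, prompt_ids: list[int], max_new_tokens: int, eos_id: int) -> list[int]:
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)  # scores for the next token, given everything so far
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        ids.append(next_id)  # the new token becomes part of the next input
        if next_id == eos_id:  # stop once the model emits end-of-sequence
            break
    return ids
```

The loop highlights the defining property of decoder-only models: each generated token is appended to the input and fed back in, so the model conditions on its own prior output.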
Diagram 2.0.0: The Transformer, Vaswani et al. (2017)
The Transformer is made up of an encoder (left side of diagram 2.0.0) and a decoder (right side of diagram 2.0.0).
The encoder takes an input sequence and converts it into a contextual memory. The decoder takes both the contextual memory and the previously generated output, and from these generates the next element of the output sequence.
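This data flow can be sketched with PyTorch's nn.Transformer, whose encoder and decoder sub-modules mirror the two halves of diagram 2.0.0. The random tensors below are illustrative stand-ins for embedded token sequences; a real model would embed token ids and add positional information first.

```python
import torch
import torch.nn as nn

d_model = 512  # embedding width used throughout the model

# Encoder/decoder stack with the layer counts from Vaswani et al. (2017).
transformer = nn.Transformer(d_model=d_model, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 1, d_model)  # source sequence: 10 positions, batch of 1
tgt = torch.rand(7, 1, d_model)   # previously generated output: 7 positions

memory = transformer.encoder(src)       # encoder -> contextual memory
out = transformer.decoder(tgt, memory)  # decoder reads memory + prior output
print(out.shape)                        # torch.Size([7, 1, 512])
```

Note that the decoder's forward pass takes two inputs, matching the description above: the contextual memory from the encoder and the sequence generated so far.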