2. The Transformer
Diagram 2.0: The Transformer, Vaswani et al. (2017)
This chapter works through the layers of the Transformer from the bottom up. It starts with the lowest layer, shown in Diagram 2.0 above, explains how each successive layer works, and closes with an overview of the whole architecture.
Glossary
Large Language Model - a model typically built around a highly scaled Transformer
Transformer - a model built from a particular set of functions and parameters, designed for text processing (both generation and comprehension)
Model - the architecture of a trainable system: its functions and trainable parameters. The word is not used here to mean an architecture packaged with trained parameter values; that is described as a checkpoint
Weight - a parameter within a model: a numerical value that is stored and adjusted during training
Training - the initial stage of a model's lifetime, in which its parameters are assigned numerical values that are then adjusted as data is run through the model, so that the model's output moves closer to a target output
Loss function - a means of measuring the difference between the output of a model and the target output
Backpropagation - computing how each parameter's numerical value affects the output of the loss function; in other words, the gradient of the loss with respect to each parameter
Layer - a part of a Transformer, e.g. the multi-head attention layer; alternatively, a set of neurons within a neural network that all implement the same function
SLP - Single Layer Perceptron; can be represented as a very limited neural network, or simply as a matrix multiplication
SIMD processor - Single Instruction Multiple Data, a processor that performs the same operation, such as addition or multiplication, across all the data it holds at once
GPU - a common type of SIMD processor, originally oriented around graphics processing but equally useful for matrix operations in general
Tile - an m x n matrix of data; the smallest unit of data a GPU can operate on
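Several of the terms above (SLP, weight, loss function, training, backpropagation) can be illustrated together in a few lines of code. The sketch below is purely illustrative: all values are made up, and it estimates gradients numerically with finite differences rather than with true backpropagation, which computes the same gradients analytically and far more efficiently.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (lists of lists).
    A Single Layer Perceptron is exactly this: output = input @ weights."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def mse_loss(output, target):
    """Loss function: mean squared difference between output and target."""
    flat = [(o - t) ** 2
            for row_o, row_t in zip(output, target)
            for o, t in zip(row_o, row_t)]
    return sum(flat) / len(flat)

x = [[1.0, 2.0], [0.5, -1.0]]      # two input examples, two features each
target = [[1.0, 0.0], [0.0, 1.0]]  # the target outputs
W = [[0.1, 0.2], [0.3, 0.4]]       # the trainable weights

# Training: adjust each weight so the loss shrinks. Here each weight's
# gradient is estimated numerically (a crude stand-in for backpropagation).
eps = 1e-6
lr = 0.1
base = mse_loss(matmul(x, W), target)
grad = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        W[i][j] += eps
        grad[i][j] = (mse_loss(matmul(x, W), target) - base) / eps
        W[i][j] -= eps
for i in range(2):
    for j in range(2):
        W[i][j] -= lr * grad[i][j]  # step against the gradient

print(mse_loss(matmul(x, W), target) < base)  # True: one step lowered the loss
```

One gradient step with a small learning rate moves the output closer to the target, which is the entire training loop in miniature; real training repeats this over vast datasets, with backpropagation supplying the gradients.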
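The Tile entry can also be sketched in code. GPUs build a large matrix product out of many small tile-sized sub-products; the blocked multiplication below mimics that access pattern in plain Python. The tile size of 2 is an illustrative assumption, not a property of any real GPU.

```python
TILE = 2  # illustrative tile size; real GPU tile sizes are hardware-specific

def tiled_matmul(a, b):
    """Multiply square matrices (lists of lists) one tile at a time."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, TILE):          # tile row of the result
        for j0 in range(0, n, TILE):      # tile column of the result
            for k0 in range(0, n, TILE):  # accumulate over the shared dimension
                for i in range(i0, min(i0 + TILE, n)):
                    for j in range(j0, min(j0 + TILE, n)):
                        for k in range(k0, min(k0 + TILE, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c

# Sanity check: multiplying by the identity matrix returns the input.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
m = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
print(tiled_matmul(m, identity) == m)  # True
```

The arithmetic is identical to an ordinary triple loop; only the iteration order changes, so that each inner pass touches one small tile of each operand, which is what lets a SIMD processor apply one instruction to a whole tile at a time.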