1. Machine Learning basics

Those already familiar with the fundamentals of Machine Learning may prefer to skip this chapter.

This chapter introduces the minimum knowledge of Machine Learning (a field within Computer Science) needed for any reader with a baseline level of mathematical fluency to understand the general ideas behind how an LLM is created and how it functions.

This chapter is not a comprehensive introduction to Machine Learning; many details and important processes have been intentionally omitted, specifically:

- pre-processing of training data
- the myriad possible ways of customising a feed-forward neural network (e.g. partial connections, choices of activation functions)
- the specifics of how a weight within a feed-forward neural network is adjusted (gradient descent)
- the different methods of training a model (supervised, unsupervised)
- how the efficacy of a trained model is measured (test data, F1 scores)
- the handling of a model's hyperparameters

However, it presents the most popular general approaches, in the hope that the reader will then be able to understand ideas that build upon them.

A more formal and comprehensive look at Machine Learning is available via Harvard’s Undergraduate Fundamentals of Machine Learning, an open-source textbook.

Notes on the University of Manchester COMP24112 course are also available at: https://cs-notes.xza.fr