1.3 Implementation

Over the years, the technical knowledge required to set up and train a feed-forward neural network has decreased rapidly. Torch, a library written in Lua wrapping a C implementation of a data structure known as a tensor (a generalisation of a matrix), saw active development in the early 2010s. As corporate funding shifted towards more widely used programming languages, Google released TensorFlow publicly in 2015, followed by PyTorch, funded by Meta (then Facebook), in 2016.

Nowadays, it is possible to define a feed-forward neural network with only a few lines of code and to run the backpropagation algorithm automatically (a technique known as automatic differentiation). Automatic differentiation itself long predates modern deep learning libraries; a survey of its application to machine learning appeared in 2015.[1] PyTorch implements it by wrapping the data flowing through the network in an object known as a tensor, which records the operations performed on it; this recorded graph of operations can then be traversed backwards to compute gradients.
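The idea of "storing the operations performed on the data within the object" can be illustrated with a minimal sketch in pure Python. This is not the PyTorch API: the class name `Scalar` and its methods are illustrative assumptions, reducing the tensor to a single number, but the mechanism (each value remembering which operation produced it, then applying the chain rule in reverse) is the same one PyTorch's autograd uses.

```python
import math

class Scalar:
    """A single number that records the operation which produced it,
    so gradients can later be propagated backwards (reverse-mode AD)."""

    def __init__(self, value, parents=()):
        self.value = value            # the data itself
        self.grad = 0.0               # accumulated d(output)/d(self)
        self._parents = parents       # operands that produced this value
        self._backward_fn = None      # how to push gradients to parents

    def __add__(self, other):
        out = Scalar(self.value + other.value, (self, other))
        def backward_fn():            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Scalar(self.value * other.value, (self, other))
        def backward_fn():            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        out._backward_fn = backward_fn
        return out

    def tanh(self):
        t = math.tanh(self.value)
        out = Scalar(t, (self,))
        def backward_fn():            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1 - t * t) * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the recorded graph, then apply the chain
        # rule from the output back to every input.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn:
                node._backward_fn()

# A one-neuron "network": y = tanh(w*x + b)
w, x, b = Scalar(2.0), Scalar(0.5), Scalar(-1.0)
y = (w * x + b).tanh()
y.backward()
# w.grad now holds dy/dw = x * (1 - tanh(w*x + b)^2)
```

PyTorch does the same bookkeeping for multi-dimensional tensors: calling `.backward()` on a tensor walks the stored operation graph and fills in each parameter's `.grad`.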

Practice questions

1. Approximately how much energy is required to train an LLM, such as GPT-3, in terms of quantity of boiled kettles?

Answer

Allegedly, training GPT-3 consumed 1,287,000 kWh.[2] If boiling a kettle requires 0.15 kWh, this is equivalent to the energy required to boil 8,580,000 kettles (or the same kettle 8,580,000 times, presuming it would survive):

\frac{1287000}{0.15} = 8580000

These numbers exclude the manufacturing costs of the hardware (i.e. GPUs, kettles).

Consider that the largest hydroelectric dam in the world, the Three Gorges Dam (三峡大坝) near Yichang in Hubei province, produces ~95,000,000,000 kWh per year.

2. What is the electricity cost of one single LLM query (for the institution running the LLM)?

Answer

According to Sam Altman, the average ChatGPT query uses ~0.34 Wh. Assuming a UK electricity price of £0.25 per kWh, this works out to approximately £0.000085 per query:

\frac{0.34}{1000}\cdot 0.25 = 0.000085

References

[1] Automatic differentiation in machine learning: a survey

[2] The growing energy footprint of artificial intelligence