1.1 Backpropagation
Before discussing backpropagation (short for backward propagation), forward propagation needs to be understood.
Forward propagation means inputting data into a feed-forward neural network, running it through the functions that make up the network using the current values of the weights, and generating output.
Diagram 1.1.1: Example of a basic feed-forward neural network, with only 3 inputs, 1 hidden layer, and 1 output. A vector is input into the input layer (one dimension per neuron), run through the functions that make up the feed-forward neural network, and then a vector is output by the output layer (one dimension per neuron).
An example that could be associated with Diagram 1.1.1 is that the input layer could take, as input, 3 of the Big Five personality trait scores of a person, as a 3-dimensional vector, such as agreeableness, conscientiousness, and neuroticism. After forward propagation of the data through the functions that make up the feed-forward neural network, the output layer could output a prediction about their age, as a 1-dimensional vector. Note that psychological studies have already drawn general conclusions on how these specific traits usually change with age (agreeableness and conscientiousness increase with age, neuroticism decreases with age, and, as an aside, openness to new experiences/ideas also decreases).
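As a minimal sketch of forward propagation through a network shaped like the one in Diagram 1.1.1, the following Python code runs a 3-dimensional input vector through one hidden layer and one output neuron. The weights, biases, the hidden layer size of 4, and the sigmoid activation are illustrative assumptions, not values taken from the diagram.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # hidden layer: 4 neurons, each taking the 3 inputs
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))  # output layer: 1 neuron taking the 4 hidden values
b2 = np.zeros(1)

# e.g. agreeableness, conscientiousness, and neuroticism scores
x = np.array([0.8, 0.6, 0.3])
h = sigmoid(W1 @ x + b1)  # run the input through the hidden layer
y_hat = W2 @ h + b2       # the 1-dimensional output, e.g. a predicted age
print(y_hat)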
If the prediction that the feed-forward neural network makes is unexpected, for example if the feed-forward neural network predicts an age of 21 instead of 29, then it is possible, once the output has been derived, to find out how one specific weight within the feed-forward neural network affected the output. This process is known as backpropagation.
After selecting a function, $E$, that takes as input the expected output and the predicted output, and outputs a numerical score, finding out how a specific weight, $w$, affects the value of $E$ would be represented as follows:

$\frac{\partial E}{\partial w}$
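For instance, a common choice for such a function, shown in the Python sketch below, is the squared error; this choice is an assumption here, as all that is required is some function of the expected and predicted outputs that returns a numerical score.

# A minimal sketch of one possible score function E: the squared error.
def E(expected, predicted):
    return (predicted - expected) ** 2

print(E(29.0, 21.0))  # 64.0, the score for predicting an age of 21 instead of 29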
A partial derivative, unlike a total derivative, depicts how a function changes with respect to one specific variable while the other variables are held fixed, and is generally more suitable for functions of multiple variables.
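As a minimal sketch of that idea, a partial derivative can be approximated numerically by nudging one variable while holding the others fixed; the function g and the step size h below are illustrative assumptions.

def g(x, y):
    return x * x * y + y

def partial_g_wrt_x(x, y, h=1e-6):
    # Nudge only x and hold y fixed, as a partial derivative requires.
    return (g(x + h, y) - g(x, y)) / h

print(partial_g_wrt_x(3.0, 2.0))  # close to 12.0, since dg/dx = 2xy analytically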
By considering the calculus chain rule, it is possible to expand $\frac{\partial E}{\partial w}$ out. For example, if $E$ depends on the predicted output, $\hat{y}$, which in turn depends on the weight, $w$, then:

$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}$
Diagram 1.1.2: Duplication of Diagram 1.0.2, previously described.
Consider again the example of neuron $h_1$, in Diagram 1.1.2, and the feed-forward neural network in Diagram 1.1.1. It is wanted to find how the weight $w_1$ in $h_1$ affects the difference, $E$, between the predicted age, $\hat{y}$, and the expected age, $y$:

$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}$
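As a minimal sketch of this chain of factors in Python, the network below is reduced to a single hidden neuron $h_1$ so that each factor can be written out explicitly; the network shape, the squared-error score $E$, and all the numbers are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.8, 0.6, 0.3])   # the 3 inputs to h1
w = np.array([0.5, -0.2, 0.1])  # weights into h1; w[0] is the target weight w1
v, y = 30.0, 29.0               # output weight and expected age

def E_of(w):
    h1 = sigmoid(w @ x)          # neuron h1
    y_hat = v * h1               # predicted age
    return (y_hat - y) ** 2      # squared-error score E

# Chain rule: dE/dw1 = dE/dy_hat * dy_hat/dh1 * dh1/dw1
h1 = sigmoid(w @ x)
y_hat = v * h1
dE_dyhat = 2 * (y_hat - y)
dyhat_dh1 = v
dh1_dw1 = h1 * (1 - h1) * x[0]   # sigmoid derivative times the input on w1
grad_w1 = dE_dyhat * dyhat_dh1 * dh1_dw1

# Check against a finite-difference estimate of the same partial derivative.
step = 1e-6
w_plus = w.copy()
w_plus[0] += step
print(grad_w1, (E_of(w_plus) - E_of(w)) / step)  # the two values should closely agree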
On the general topic of backpropagation, it is necessary to know how each neuron changes with respect to its inputs from the previous layer, and finally how the functions within the neuron containing the target weight change as the target weight changes. This could be handled algebraically; in practice, however, it is not, as feed-forward neural networks can contain thousands of neurons.
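In practice, the same chain-rule factors are computed numerically, layer by layer, for every weight at once, which is what deep learning libraries automate. As a minimal sketch of that layer-by-layer backward pass, reusing the assumed 3-input, 4-hidden-neuron, 1-output shape and squared-error score from the earlier sketches:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer
x, y = np.array([0.8, 0.6, 0.3]), np.array([29.0])

# Forward pass, keeping intermediate values for the backward pass.
h = sigmoid(W1 @ x + b1)
y_hat = W2 @ h + b2

# Backward pass: one chain-rule step per layer, for every weight at once.
dE_dyhat = 2 * (y_hat - y)      # derivative of the squared-error score
dE_dW2 = np.outer(dE_dyhat, h)  # how E changes with each output-layer weight
dE_dh = W2.T @ dE_dyhat         # propagate the error back to the hidden layer
dE_dz1 = dE_dh * h * (1 - h)    # through the sigmoid: its derivative is h(1 - h)
dE_dW1 = np.outer(dE_dz1, x)    # how E changes with each hidden-layer weight
print(dE_dW1)
print(dE_dW2)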
Practice questions
1. Why might it be desirable to know how all the different weights within the neural network affect the difference between the predicted output of the neural network and the expected output?
2. Consider a function $f(w, x, y, z)$, where $w$, $x$, $y$, and $z$ are variables. Is a point at which $\frac{\partial f}{\partial w} = 0$ necessarily a turning point (i.e. a point where the gradient of $f$ goes from negative to positive, or vice versa)?