From 751c5002038cfb4dd3e4530d0942569d1579e060 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Tue, 9 Jul 2024 14:47:29 +0100 Subject: [PATCH 1/9] Additional helper comments and content throughout (cherry picked from commit 0f31115e1a48cf425445d0e789bfdcc794792074) --- exercises/01_penguin_classification.ipynb | 131 ++++++++++++++++++++-- 1 file changed, 119 insertions(+), 12 deletions(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index cf532cb..b0ee13c 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -311,8 +311,11 @@ " - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n", " - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n", " - The number of items we supply at once is called the batch size.\n", - " - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n", - " - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n", + " - Q. What number should we choose for the batch size?\n", + " - The ``DataLoader`` can also randomly shuffle the data each epoch (when training). This avoids accidental patterns in the data harming the fitting process. Consider providing lots of the positive class followed by the negative class,\n", + "the network will only learn by saying yes all the time. Therefore need to intersperse positives and negatives.\n", + "\n", + " - The ``DataLoader`` also allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n", "\n", "\n", "Note: we are going to use batch normalisation layers in our network, which don't work if the batch size is one. This can happen on the last batch, if we don't choose a batch size that evenly divides the number of items in the data set. To avoid this, we can set the ``drop_last`` argument to ``True``. The last batch, which will be of size ``len(data_set) % batch_size`` gets dropped, and the data are reshuffled. This is only relevant during the training process - validation will use population statistics." @@ -337,23 +340,55 @@ "\n", "Here we will create our neural network in PyTorch, and have a general discussion on clean and messy ways of going about it.\n", "\n", + "  The module `torch.nn` contains different classes that help you build neural network models. All models in PyTorch inherit from the subclass `nn.Module`, which has useful methods like `parameters()`, `__call__()` and others.\n", + "\n", + "  `torch.nn` also has various layers that you can use to build your neural network. For example, we will use `nn.Linear` in our code below, which constructs a fully connected layer. In particular, we will two `nn.Linear` layers as part of our network in the `__init__()` method. `torch.nn.Linear` is a subclass of `torch.nn.Module`. \n", + "\n", + "  What exactly is a \"layer\"? It is essentially a step in the neural network computation. i.e. The `nn.Linear` layer computes the linear transformation of the input vector `$x$`: `$y$ = $W^T x + b$`. Where `W` is the matrix of tunable parameters and `b` is a bias vector.\n", + "\n", + "We can also think of the ReLU activation as a \"layer\". 
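As a quick illustration of these two kinds of layer (a sketch only; the layer sizes here are arbitrary and not necessarily those used later in the exercise):

```
import torch
from torch.nn import Linear, ReLU

linear = Linear(in_features=4, out_features=8)  # y = W^T x + b, with W and b tunable
activation = ReLU()                             # max(0, y), applied element-wise

x = torch.rand(16, 4)                           # a mini-batch of 16 items with 4 features each
y = activation(linear(x))
print(y.shape)                                  # torch.Size([16, 8])
```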
However, there are no tunable parameters associated with the ReLU activation function.\n", + "\n", + "  The `__init__()` method is where we typically define the attributes of a class. In our case, all the \"sub-components\" of our model should be defined here.\n", + "\n", + "  The `forward` method is called when we use the neural network to make a prediction. Another term for \"making a prediction\" is running the forward pass, because information flows forward from the input through the hidden layers to the output. When we compute parameter updates, we run the backward pass by calling the function loss.backward(). During the backward pass, information about parameter changes flows backwards, from the output through the hidden layers to the input.\n", + "\n", + "  The `forward` method is called from the `__call__()` function of `nn.Module`, so that when we run `model(batch)`, the `forward` method is called. \n", "- First, we will create quite an ugly network to highlight how to make a neural network in PyTorch on a very basic level.\n", - "- We will then discuss a trick for making the print-out nicer.\n", + "- We will then utilise `torch.nn.Sequential` as a neater approach.\n", "- Finally, we will discuss how the best approach would be to write a class where various parameters (e.g. number of layers, dropout probabilities, etc.) are passed as arguments." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from torch.nn import Module\n", "from torch.nn import BatchNorm1d, Linear, ReLU, Dropout\n", + "from torch import Tensor\n", "\n", "\n", "class FCNet(Module):\n", - " \"\"\"Fully-connected neural network.\"\"\"" + " \"\"\"Fully-connected neural network.\"\"\"\n", + "\n", + " # define __init__ function - model defined here.\n", + " def __init__(self):\n", + " pass\n", + "\n", + " # define forward function which calls network\n", + " def forward(self, batch: Tensor) -> Tensor:\n", + " pass\n", + "\n", + "\n", + "# define a model and print and test (try with torch.rand() function)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that in a fully-connected feed-forward network, the number of units in each layer always decreases. The neural network is forced to condense information, step-by-step, until it computes the target output we desire. When solving prediction problems, we will rarely (if ever) have a later layer have more neurons than a previous layer." ] }, { @@ -384,7 +419,9 @@ "\n", "While we talked about stochastic gradient descent in the slides, most people use the so-called [Adam optimiser](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).\n", "\n", - "You can think of it as a more complex and improved implementation of SGD." + "You can think of it as a more complex and improved implementation of SGD.\n", + "\n", + "Here we will tell the optimiser what parameters to fit in order to minimise the loss. 
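For example, a minimal sketch (assuming `model` is an instance of the network defined above, and with an illustrative learning rate): `optimiser = Adam(model.parameters(), lr=1e-3)`.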
" ] }, { @@ -397,20 +434,59 @@ "from torch.optim import Adam" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Have a go at importing the model weights for a large model like ResNet50" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 9: Writing basic training and validation loops\n", "\n", - "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs.\n", + "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs. We chose not to include this in the network itself.\n", + " - We need to convert our model outputs into something that can be compared to our targets i.e. [0,0,1]\n", " - Here we are going to use the softmax activation function: see [the PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html).\n", - " - For those of you who've studied physics, you may be remininded of the partition function in thermodynamics.\n", - " - This activation function is good for classifcation when the result is one of ``A or B or C``.\n", - " - It's bad if you even want to assign two classification to one images—say a photo of a dog _and_ a cat.\n", + " - For those of you who've studied physics, you may be reminded of the partition function in thermodynamics.\n", + " - This activation function is good for classification when the result is one of ``A or B or C``.\n", + " - It's bad if you even want to assign two classification to a single image—say a photo of a dog _and_ a cat.\n", " - It turns the raw outputs, or logits, into \"psuedo probabilities\", and we take our prediction to be the most probable class.\n", "\n", - "- We will write the training loop together, then you can go ahead and write the (simpler) validation loop." + "- Have a go at writing these loops. Read the comments below for help.\n", + "\n", + "TIPS:\n", + "\n", + "- The model needs to be configured for training and validation.\n", + "- We need to tell the softmax function over what dimension we should sum the probabilities over in order to equal 1. This should be along the column axis. \n", + "- The automatic behaviour of the optimiser is to accumulate gradients during training.\n", + "- Utilise `@no_grad` where possible. \n", + "\n", + "- Extracting metrics: \n", + " - Define a dictionary `metrics = {\"loss\": [], \"accuracy\" : []}`\n", + " - Append the loss `loss.item()` which is a 1x1 tensor. We do not need gradients.\n", + " - Get the accuracy by writing a function `get_batch_accuracy(preds: Tensor, targets: Tensor)`.\n", + " - A decision can be computed as follows: `decision = preds.argmax(dim=1)`\n", + " - We need to supply the metrics as `means` over each epoch.\n", + " - The metrics should be a dictionary containing \"loss\" and \"accuracy\" as keys and lists as values which we append to each iteration. We can then use dictionary comprehension to get epoch statistics. \n", + " ```\n", + " metrics = {\"loss \" : [1.0, 2.0, 3.0], \"accuracy\" : [0.7, 0.8, 0.9]}\n", + " return {k : mean(v) for k, v in metrics.items() }\n", + " ```\n", + " - If the validation performance gets really poor this is a sign that we have possibly overfit. \n", + "\n", + "\n", + "\n", + "NOTE: In PyTorch, `requires_grad=True` is set automatically for the parameters of layers defined using `torch.nn.Module` subclasses. 
Examine the following example:\n", + "```\n", + "x = ones(10, requires_grad=True)\n", + "y = 2*x.exp()\n", + "print(y)\n", + "```\n", + "- Why use BCELoss?\n", + " - It may seem odd to be using BCELoss for a multi-class classification problem. In this case, BCELoss treats each element of the prediction vector as an independent binary classification problem. For each class, it compares the predicted probability against the target and computes the loss. It might be better to use `CrossEntropyLoss` instead (ground truth does not need to be one-hot encoded). `CrossEntropyLoss` combines softmax and negative log likelihood. \n" ] }, { @@ -448,6 +524,27 @@ "\n", " \"\"\"\n", "\n", + " # setup the model for training. IMPORTANT!\n", + "\n", + " # setup loss and accuracy metrics dictionary\n", + "\n", + " # iterate over the batch, targets in the train_loader\n", + " for batch, targets in train_loader:\n", + " pass\n", + "\n", + " # zero the gradients (otherwise gradients accumulate)\n", + "\n", + " # run forward model and compute proxy probabilities over dimension 1 (columns of tensor).\n", + "\n", + " # compute loss\n", + " # e.g. pred = [0.2, 0.7, 0.1] and target = [0, 1, 0]\n", + "\n", + " # compute gradients\n", + "\n", + " # nudge parameters in direction of steepest descent c\n", + "\n", + " # append metrics\n", + "\n", "\n", "def validate_one_epoch(\n", " model: Module,\n", @@ -470,7 +567,10 @@ " Dict[str, float]\n", " Metrics of interest.\n", "\n", - " \"\"\"" + " \"\"\"\n", + "\n", + " for batch, targets in valid_loader:\n", + " pass" ] }, { @@ -498,7 +598,14 @@ "source": [ "epochs = 3\n", "\n", + "# define train_metrics and valid_metrics lists. \n", + "\n", "for _ in range(epochs):\n", + "\n", + " # append output of train_one_epoch() to train_metrics\n", + "\n", + " # append output of valid_one_epoch() to valid_metrics\n", + "\n", " pass" ] }, From 83c0c84757d8e50198795f875f43748112907136 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Tue, 9 Jul 2024 14:58:05 +0100 Subject: [PATCH 2/9] Comment on the softmax function (cherry picked from commit 29d8b04bd4a62511a58baa74e0e5cc9fc2e6754c) --- exercises/01_penguin_classification.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index b0ee13c..999d7be 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -449,7 +449,8 @@ "\n", "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs. We chose not to include this in the network itself.\n", " - We need to convert our model outputs into something that can be compared to our targets i.e. [0,0,1]\n", - " - Here we are going to use the softmax activation function: see [the PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html).\n", + " - \n", + " - Here we are going to use the softmax activation function: see [the PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html). 
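As a sketch (assuming `from torch.nn.functional import softmax` and raw model outputs `logits` of shape `[batch_size, 3]`): `probs = softmax(logits, dim=1)` gives rows that each sum to one.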
It can be seen as a generalization of both the logits and sigmoid functions to handle multi-class classification tasks\n", " - For those of you who've studied physics, you may be reminded of the partition function in thermodynamics.\n", " - This activation function is good for classification when the result is one of ``A or B or C``.\n", " - It's bad if you even want to assign two classification to a single image—say a photo of a dog _and_ a cat.\n", From d71d1a3fbc8515b892ff4d194ad6b5d55de95b48 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Tue, 9 Jul 2024 14:58:48 +0100 Subject: [PATCH 3/9] Fix solution comment (cherry picked from commit cdb8e8040129b4c8ab5c3607997b50086fe7cdf1) --- worked-solutions/01_penguin_classification_solutions.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/worked-solutions/01_penguin_classification_solutions.ipynb b/worked-solutions/01_penguin_classification_solutions.ipynb index 25b6f49..04c51d2 100644 --- a/worked-solutions/01_penguin_classification_solutions.ipynb +++ b/worked-solutions/01_penguin_classification_solutions.ipynb @@ -810,7 +810,7 @@ " and to instead use the stats it has built up from the training set.\n", " The model should not \"remember\" anything from the validation set.\n", " - We also protect this function with ``torch.no_grad()``, because having\n", - " gradients enable while validating is a pointless waste of\n", + " gradients enabled while validating is a pointless waste of\n", " resources — they are only needed for training.\n", "\n", " \"\"\"\n", From a309ac008bf3a805fca906ae8cc9418d5b65e2d0 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Tue, 9 Jul 2024 15:11:07 +0100 Subject: [PATCH 4/9] More content --- exercises/01_penguin_classification.ipynb | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index 999d7be..7a59bc8 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -448,8 +448,7 @@ "### Task 9: Writing basic training and validation loops\n", "\n", "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs. We chose not to include this in the network itself.\n", - " - We need to convert our model outputs into something that can be compared to our targets i.e. [0,0,1]\n", - " - \n", + " - We need to convert our model outputs into something that can be compared to our targets i.e. `[0,0,1]`.\n", " - Here we are going to use the softmax activation function: see [the PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html). It can be seen as a generalization of both the logits and sigmoid functions to handle multi-class classification tasks\n", " - For those of you who've studied physics, you may be reminded of the partition function in thermodynamics.\n", " - This activation function is good for classification when the result is one of ``A or B or C``.\n", @@ -463,7 +462,6 @@ "- The model needs to be configured for training and validation.\n", "- We need to tell the softmax function over what dimension we should sum the probabilities over in order to equal 1. This should be along the column axis. \n", "- The automatic behaviour of the optimiser is to accumulate gradients during training.\n", - "- Utilise `@no_grad` where possible. 
\n", "\n", "- Extracting metrics: \n", " - Define a dictionary `metrics = {\"loss\": [], \"accuracy\" : []}`\n", @@ -478,6 +476,7 @@ " ```\n", " - If the validation performance gets really poor this is a sign that we have possibly overfit. \n", "\n", + "- Utilise `@no_grad` where possible. It temporarily disables gradient calculation, which is beneficial during evaluation phases when gradient updates are not required. \n", "\n", "\n", "NOTE: In PyTorch, `requires_grad=True` is set automatically for the parameters of layers defined using `torch.nn.Module` subclasses. Examine the following example:\n", From f9e53ab7cc74e4ff1afff7bc5641c517328cd000 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Tue, 9 Jul 2024 15:12:04 +0100 Subject: [PATCH 5/9] Fix formatting --- exercises/01_penguin_classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index 7a59bc8..2d740fd 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -448,7 +448,7 @@ "### Task 9: Writing basic training and validation loops\n", "\n", "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs. We chose not to include this in the network itself.\n", - " - We need to convert our model outputs into something that can be compared to our targets i.e. `[0,0,1]`.\n", + " - We need to convert our model outputs into something that can be compared to our targets i.e. `[0, 0, 1]`.\n", " - Here we are going to use the softmax activation function: see [the PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html). It can be seen as a generalization of both the logits and sigmoid functions to handle multi-class classification tasks\n", " - For those of you who've studied physics, you may be reminded of the partition function in thermodynamics.\n", " - This activation function is good for classification when the result is one of ``A or B or C``.\n", From 8216cf19976584755e2c9541917c4d747e201505 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Tue, 9 Jul 2024 15:16:07 +0100 Subject: [PATCH 6/9] Fix comment on output format --- exercises/01_penguin_classification.ipynb | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index 2d740fd..ab38d1e 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -447,8 +447,7 @@ "source": [ "### Task 9: Writing basic training and validation loops\n", "\n", - "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs. We chose not to include this in the network itself.\n", - " - We need to convert our model outputs into something that can be compared to our targets i.e. `[0, 0, 1]`.\n", + "- Before we jump in and write these loops, we must first choose an activation function to apply to the model's outputs so that they compared to our targets i.e. `[0, 0, 1]`. We chose not to include this in the network itself.\n", " - Here we are going to use the softmax activation function: see [the PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html). 
It can be seen as a generalization of both the logits and sigmoid functions to handle multi-class classification tasks\n", " - For those of you who've studied physics, you may be reminded of the partition function in thermodynamics.\n", " - This activation function is good for classification when the result is one of ``A or B or C``.\n", From b0f5416fa89b2a7714a4e70ddbf4fe0ea5a801a4 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Wed, 10 Jul 2024 13:49:43 +0100 Subject: [PATCH 7/9] Forward modification --- exercises/01_penguin_classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index ab38d1e..8a6eb99 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -350,7 +350,7 @@ "\n", "  The `__init__()` method is where we typically define the attributes of a class. In our case, all the \"sub-components\" of our model should be defined here.\n", "\n", - "  The `forward` method is called when we use the neural network to make a prediction. Another term for \"making a prediction\" is running the forward pass, because information flows forward from the input through the hidden layers to the output. When we compute parameter updates, we run the backward pass by calling the function loss.backward(). During the backward pass, information about parameter changes flows backwards, from the output through the hidden layers to the input.\n", + "  The `forward` method is called when we use the neural network to make a prediction. Another term for \"making a prediction\" is running the forward pass, because information flows forward from the input through the hidden layers to the output. When we compute parameter updates, we run the backward pass by calling the function `loss.backward()`. During the backward pass, information about parameter changes flows backwards, from the output through the hidden layers to the input.\n", "\n", "  The `forward` method is called from the `__call__()` function of `nn.Module`, so that when we run `model(batch)`, the `forward` method is called. \n", "- First, we will create quite an ugly network to highlight how to make a neural network in PyTorch on a very basic level.\n", From f2f098b1636b52d8cfc1062f896d85d7c0bdf1f0 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Wed, 10 Jul 2024 16:03:27 +0100 Subject: [PATCH 8/9] Updated autograd comment --- exercises/01_penguin_classification.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index 8a6eb99..a80035f 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -350,7 +350,7 @@ "\n", "  The `__init__()` method is where we typically define the attributes of a class. In our case, all the \"sub-components\" of our model should be defined here.\n", "\n", - "  The `forward` method is called when we use the neural network to make a prediction. Another term for \"making a prediction\" is running the forward pass, because information flows forward from the input through the hidden layers to the output. When we compute parameter updates, we run the backward pass by calling the function `loss.backward()`. 
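(In the training loop, this call is typically followed by `optimiser.step()` and `optimiser.zero_grad()`, matching the loop comments sketched earlier.)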
During the backward pass, information about parameter changes flows backwards, from the output through the hidden layers to the input.\n", + "  The `forward` method is called when we use the neural network to make a prediction. Another term for \"making a prediction\" is running the forward pass, because information flows forward from the input through the hidden layers to the output. This builds a computational graph. To compute parameter updates, we run the backward pass by calling the function `loss.backward()`. During the backward pass, `autograd` traverses this graph to compute the gradients, which are then used to update the model's parameters.\n", "\n", "  The `forward` method is called from the `__call__()` function of `nn.Module`, so that when we run `model(batch)`, the `forward` method is called. \n", "- First, we will create quite an ugly network to highlight how to make a neural network in PyTorch on a very basic level.\n", @@ -360,7 +360,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 14, "metadata": {}, "outputs": [], "source": [ From dbb93c3b55165861dc5b801fe3b215dc950a2967 Mon Sep 17 00:00:00 2001 From: Matt Archer Date: Wed, 10 Jul 2024 19:36:03 +0100 Subject: [PATCH 9/9] Small changes --- exercises/01_penguin_classification.ipynb | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index a80035f..05adb52 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -342,7 +342,7 @@ "\n", "  The module `torch.nn` contains different classes that help you build neural network models. All models in PyTorch inherit from the subclass `nn.Module`, which has useful methods like `parameters()`, `__call__()` and others.\n", "\n", - "  `torch.nn` also has various layers that you can use to build your neural network. For example, we will use `nn.Linear` in our code below, which constructs a fully connected layer. In particular, we will two `nn.Linear` layers as part of our network in the `__init__()` method. `torch.nn.Linear` is a subclass of `torch.nn.Module`. \n", + "  `torch.nn` also has various layers that you can use to build your neural network. For example, we will use `nn.Linear` in our code below, which constructs a fully connected layer. `torch.nn.Linear` is a subclass of `torch.nn.Module`. \n", "\n", "  What exactly is a \"layer\"? It is essentially a step in the neural network computation. i.e. The `nn.Linear` layer computes the linear transformation of the input vector `$x$`: `$y$ = $W^T x + b$`. Where `W` is the matrix of tunable parameters and `b` is a bias vector.\n", "\n", @@ -384,13 +384,6 @@ "# define a model and print and test (try with torch.rand() function)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that in a fully-connected feed-forward network, the number of units in each layer always decreases. The neural network is forced to condense information, step-by-step, until it computes the target output we desire. When solving prediction problems, we will rarely (if ever) have a later layer have more neurons than a previous layer." - ] - }, { "cell_type": "markdown", "metadata": {},