From 52c3b9aa16392a070ba99048ff089fecb7b90f92 Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Mon, 1 Jul 2024 11:08:15 +0100
Subject: [PATCH 01/10] add two new slides visualising the cost function

---
 slides/slides.qmd | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index 5e73326..486b555 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -249,6 +249,24 @@ $$c_{n + 1} = c_{n} - \frac{dL}{dc} \cdot l_{r}$$

 :::

+## Cost function #1
+
+![](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*fcNdB994NRWt_XZ2.gif){}
+
+::: {.attribution}
+Image source: [Coursera](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
+:::
+
+
+## Cost function #2
+
+![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*8Lp1VXMApbAJlfXy2zq9MA.gif){fig-align="center"}
+
+::: {.attribution}
+Image source: [Coursera](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
+:::
+
+
 ## Quick recap {.smaller}

 To fit a model we need:
@@ -285,7 +303,7 @@ $$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$

 :::
 ::::

-![](https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}
+![](https://web.archive.org/web/20240102183723/https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}

 ::: {.attribution}
 Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

From c3e1d11df3bb2990f998030a91d632974cd064c3 Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Mon, 1 Jul 2024 13:48:11 +0100
Subject: [PATCH 02/10] fix figure link

---
 slides/slides.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index 486b555..576850e 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -303,7 +303,7 @@ $$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$

 :::
 ::::

-![](https://web.archive.org/web/20240102183723/https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}
+![](https://web.archive.org/web/20240102183723if_/https://3b1b-posts.us-east-1.linodeobjects.com/images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}

 ::: {.attribution}
 Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

From 9d85111e69f4ec2ec56c5c2a023c472b9a482abd Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Mon, 1 Jul 2024 14:32:41 +0100
Subject: [PATCH 03/10] a slide on additional resources

---
 slides/slides.qmd | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index 5e73326..4cc40cd 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -313,6 +313,12 @@ Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

 - See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)

+# Resources
+
+- [coursera.org/machine-learning-introduction](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
+- [uvadlc](https://uvadlc-notebooks.readthedocs.io/en/latest/)
+- [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
+
 # Exercises

From aa9ac1106e77b37f19671a22d9c976f9647444b6 Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Tue, 2 Jul 2024 13:35:04 +0100
Subject: [PATCH 04/10] rework slides so that we do not start with optimiser
 (SGD) straight away

---
 slides/slides.qmd | 44 +++++++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index 5e73326..0c9e714 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -113,14 +113,29 @@ Helping Today:

 # Part 1: Neural-network basics -- and fun applications.

+## Fitting a straight line I {.smaller}

-## Stochastic gradient descent (SGD)
+- Consider the data:
+
+| $x_{i}$ | $y_{i}$ |
+|:--------:|:-------:|
+| 1.0 | 2.1 |
+| 2.0 | 3.9 |
+| 3.0 | 6.2 |
+
+- Wish to fit a function to the above data.
+$$f(x) = mx + c$$
+
+- When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.

-- Generally speaking, most neural networks are fit/trained using SGD (or some variant of it).
+## Fitting a straight line II - SGD
+
+- Simple problems like the previous can be solved analytically.
+- Generally speaking, most neural networks are fit/trained using Stochastic Gradient Descent (SGD) - or some variant of it.
 - To understand how one might fit a function with SGD, let's start with a straight line:
 $$y=mx+c$$

-## Fitting a straight line with SGD I {.smaller}
+## Fitting a straight line III - SGD {.smaller}

 - **Question**---when we a differentiate a function, what do we get?
@@ -137,7 +152,7 @@ $$\frac{dy}{dx} = m$$

 :::

-## Fitting a straight line with SGD II {.smaller}
+## Fitting a straight line IV - SGD {.smaller}

 - **Answer**---a function's derivative gives a _vector_ which points in the direction of _steepest ascent_.
@@ -164,10 +179,9 @@ $$-\frac{dy}{dx}$$

 :::

-## Fitting a straight line with SGD III {.smaller}
+## Fitting a straight line V - Cost fn {.smaller}

-- When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.
-- We therefore need a way of measuring how well a model's predictions match our observations.
+- We need a way of measuring how well a model's predictions match our observations.
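The MSE loss that the reworked slides above build up to ($L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}$ with $f(x) = mx + c$) can be written out directly. A minimal plain-Python sketch, using the example data from the new "Fitting a straight line" slides; the function name and the trial parameters are illustrative:

```python
def mse(params, xs, ys):
    """Mean-squared-error loss L = (1/n) * sum((y_i - f(x_i))^2) for f(x) = m*x + c."""
    m, c = params
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n

# The example data from the slides.
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]

print(mse((2.0, 0.0), xs, ys))  # loss for the trial line y = 2x, ~0.02
```

Lower values mean the line passes closer to the observations, which is exactly the quantity the gradient-descent update then pushes downhill.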
 ::: {.fragment .fade-in}
@@ -201,7 +215,7 @@ $$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$

 :::

-## Fitting a straight line with SGD IV {.smaller}
+## Fitting a straight line VI {.smaller}

 :::: {.columns}
 ::: {.column width="45%"}
@@ -210,18 +224,18 @@

 - Data: \ $\{x_{i}, y_{i}\}$

-- Loss: \ $\frac{1}{n}\sum_{i=1}^{n}(y_{i} - x_{i})^{2}$
-
-:::
-::: {.column width="55%"}
-
-$$
+- Loss fn:
+- $$
 \begin{align}
 L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
 &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} + c)^{2}
 \end{align}
 $$
+
+:::
+::: {.column width="55%"}
+![](https://images.squarespace-cdn.com/content/v1/5acbdd3a25bf024c12f4c8b4/1600368657769-5BJU5FK86VZ6UXZGRC1M/Mean+Squared+Error.png?format=2500w){width=65%}
 :::
 ::::
@@ -233,7 +247,7 @@

 :::: {#placeholder}
 ::::

-$$m_{n + 1} = m_{n} - \frac{dL}{dm} \cdot l_{r}$$
+$$m_{t + 1} = m_{t} - \frac{dL}{dm} \cdot l_{r}$$

 :::: {#placeholder}
 ::::

From 2fc010d7c0baddc98ab3b822dfaded0af39c9419 Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Tue, 2 Jul 2024 13:48:50 +0100
Subject: [PATCH 05/10] a slide on learning objectives

---
 slides/slides.qmd | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index 5e73326..6028363 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -74,6 +74,14 @@ Based on the workshop developed by [Jack Atkinson](https://orcid.org/0000-0001-5

 V1.0 released and JOSE paper accepted:

 - [@atkinson2024practical]
+
+## Learning objectives
+- provide an understanding of the structure of a PyTorch model and ML pipeline,
+- introduce the different functionalities PyTorch might provide,
+- encourage good research software engineering (RSE) practice, and
+- exercise careful consideration and understanding of data used for training ML models.
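The update rule above, $m_{t + 1} = m_{t} - \frac{dL}{dm} \cdot l_{r}$, applies to both parameters of the straight line once the gradients of $L_{\text{MSE}}$ are written out. A minimal sketch of the full loop on the slides' example data; the learning rate, step count, and starting point are illustrative choices:

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Gradient descent on L_MSE = (1/n) * sum((y_i - (m*x_i + c))^2)."""
    m, c = 0.0, 0.0  # arbitrary starting guess
    n = len(xs)
    for _ in range(steps):
        # Residuals r_i = y_i - (m*x_i + c).
        r = [y - (m * x + c) for x, y in zip(xs, ys)]
        dm = -2.0 / n * sum(x * ri for x, ri in zip(xs, r))  # dL/dm
        dc = -2.0 / n * sum(r)                               # dL/dc
        m, c = m - lr * dm, c - lr * dc  # m_{t+1} = m_t - dL/dm * l_r (same for c)
    return m, c

m, c = fit_line([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
print(round(m, 3), round(c, 3))
```

For this convex loss the loop converges to the least-squares solution ($m = 2.05$, $c = -1/30$); strictly this is full-batch gradient descent, with the "stochastic" part of SGD arriving only once mini-batches are sampled.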
+
+
+- treatment of tabular data

From ac6fbe6ca4eb1a25c1d75271723ca0c319168acc Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Tue, 2 Jul 2024 17:20:21 +0100
Subject: [PATCH 08/10] add more to learning objectives

---
 slides/slides.qmd | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index a713290..28d45bb 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -76,8 +76,10 @@ V1.0 released and JOSE paper accepted:

 - [@atkinson2024practical]

 ## Learning objectives {.smaller}
+The key learning objective from this workshop could be simply summarised as:
+*Provide the ability to develop ML models in PyTorch.*

-We hope to demonstrate sound engineering principles:
+Specifically:

 - provide an understanding of the structure of a PyTorch model and ML pipeline,
 - introduce the different functionalities PyTorch might provide,

From b934549c473f988c005782b2c02514edff9d71eb Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Thu, 4 Jul 2024 09:46:48 +0100
Subject: [PATCH 09/10] fix y=mx + c substitution

---
 slides/slides.qmd | 153 +++++++++++++++++++++++-----------------------
 1 file changed, 78 insertions(+), 75 deletions(-)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index 2947452..a547f6b 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -248,7 +248,7 @@ $$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
 - $$
 \begin{align}
 L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
-&= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} + c)^{2}
+&= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} - c)^{2}
 \end{align}
 $$
@@ -365,7 +365,7 @@ Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

 - See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)

-# Resources
+# Other resources

 - [coursera.org/machine-learning-introduction](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
 - [uvadlc](https://uvadlc-notebooks.readthedocs.io/en/latest/)
 - [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
@@ -395,124 +395,127 @@ Image source: [Palmer Penguins by Alison Horst](https://allisonhorst.github.io/p

 - [https://github.com/allisonhorst/palmerpenguins](https://github.com/allisonhorst/palmerpenguins)

-# Part 2: Fun with CNNs
-## Convolutional neural networks (CNNs): why? {.smaller}
-Advantages over simple ANNs:
+
-- They require far fewer parameters per layer.
- - The forward pass of a conv layer involves running a filter of fixed size over the inputs.
- - The number of parameters per layer _does not_ depend on the input size.
-- They are a much more natural choice of function for *image-like* data:
+
-:::: {.columns}
-::: {.column width=10%}
-:::
-::: {.column width=35%}
+
-![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Plot-of-the-First-Nine-Photos-of-Dogs-in-the-Dogs-vs-Cats-Dataset.png)
+
-:::
-::: {.column width=10%}
-:::
-::: {.column width=35%}
+
+
+
+
-![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Plot-of-the-First-Nine-Photos-of-Cats-in-the-Dogs-vs-Cats-Dataset.png)
+
+
+
+
-:::
-::::
+
-::: {.attribution}
-Image source: [Machine Learning Mastery](https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-photos-of-dogs-and-cats/)
-:::
+
+
+
+
+
-## Convolutional neural networks (CNNs): why? {.smaller}
+
+
-Some other points:
+
+
+
-- Convolutional layers are translationally invariant:
- - i.e. they don't care _where_ the "dog" is in the image.
-- Convolutional layers are _not_ rotationally invariant.
- - e.g. a model trained to detect correctly-oriented human faces will likely fail on upside-down images
- - We can address this with data augmentation (explored in exercises).
+
-## What is a (1D) convolutional layer? {.smaller}
+
-![](1d-conv.png)
+
+
+
+
+
-See the [`torch.nn.Conv1d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html)
+
-## 2D convolutional layer {.smaller}
+
-- Same idea as in on dimension, but in two (funnily enough).
+
-![](2d-conv.png)
-- Everthing else proceeds in the same way as with the 1D case.
-- See the [`torch.nn.Conv2d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).
-- As with Linear layers, Conv2d layers also have non-linear activations applied to them.
+
+
-## Typical CNN overview {.smaller}
+
-::: {layout="[ 0.5, 0.5 ]"}
+
+
+
-![](https://miro.medium.com/v2/resize:fit:1162/format:webp/1*tvwYybdIwvoOs0DuUEJJTg.png)
-- Series of conv layers extract features from the inputs.
- - Often called an encoder.
-- Adaptive pooling layer:
- - Image-like objects $\to$ vectors.
- - Standardises size.
- - [``torch.nn.AdaptiveAvgPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html)
- - [``torch.nn.AdaptiveMaxPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html)
-- Classification (or regression) head.
+
-:::
+
-- For common CNN architectures see [``torchvision.models`` docs](https://pytorch.org/vision/stable/models.html).
+
-::: {.attribution}
-Image source: [medium.com - binary image classifier cnn using tensorflow](https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697)
-:::
+
+
+
+
+
+
+
+
+
-# Exercises
+
-## Exercise 1 -- classification
+
+
+
-### MNIST hand-written digits.
-::: {layout="[ 0.5, 0.5 ]"}
+
-![](https://i.ytimg.com/vi/0QI3xgXuB-Q/hqdefault.jpg)
+
-- In this exercise we'll train a CNN to classify hand-written digits in the MNIST dataset.
-- See the [MNIST database wiki](https://en.wikipedia.org/wiki/MNIST_database) for more details.
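The 1D convolutional layer described in the removed slides above runs a filter of fixed size along the input, which is why its parameter count does not depend on the input size. A stdlib-only sketch of a single-filter, stride-1, no-padding pass; this is an illustration of the operation, not the `torch.nn.Conv1d` API:

```python
def conv1d(xs, weights, bias=0.0):
    """Valid (no-padding), stride-1 1D convolution with a single filter.

    The layer has len(weights) + 1 parameters (weights plus bias) regardless
    of how long `xs` is -- the point made on the CNN slides.
    """
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, xs[i:i + k])) + bias
        for i in range(len(xs) - k + 1)
    ]

# A width-3 difference filter slid over a length-4 input gives 4 - 3 + 1 = 2 outputs.
print(conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]))  # → [-2.0, -2.0]
```

The 2D case proceeds identically with a filter patch slid over both image axes.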
+
-
-:::
-
-::: {.attribution}
-Image source: [npmjs.com](https://www.npmjs.com/package/mnist)
-:::
+
+
+
+
-## Exercise 2---regression
-### Random ellipse problem
+
+
+
-- In this exercise, we'll train a CNN to estimate the centre $(x_{\text{c}}, y_{\text{c}})$ and the $x$ and $y$ radii of an ellipse defined by
-$$
-\frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1
-$$
-- The ellipse, and its background, will have random colours chosen uniformly on $\left[0,\ 255\right]^{3}$.
-- In short, the model must learn to estimate $x_{\text{c}}$, $y_{\text{c}}$, $r_{x}$ and $r_{y}$.
+
+
+
+
+
+
+
+
+
+
+

From 9bb87195de1dd91621bca1fec3763944d45c3c4e Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Wed, 10 Jul 2024 21:10:33 +0100
Subject: [PATCH 10/10] Small changes

---
 slides/slides.qmd | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/slides/slides.qmd b/slides/slides.qmd
index a547f6b..8b47ee3 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -42,18 +42,14 @@ revealjs-plugins:
 * 10:30-11:00 - Coffee
 * 11:00-12:00 - Teaching/Code-along

-Lunch
+Lunch @ Churchill college

 * 12:00 - 13:30

-::: {style="color: turquoise;"}
-Helping Today:
-
-* Person 1 - Cambridge RSE
-:::

 :::
 ::::

+
 ## Material {.smaller}

 These slides can be viewed at:
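The random-ellipse regression exercise removed in [PATCH 09/10] asks the model to recover $(x_{\text{c}}, y_{\text{c}}, r_{x}, r_{y})$ from the equation $\frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1$. A minimal sketch of a target generator and the membership test that equation implies; the image size and sampling ranges below are illustrative assumptions, not the exercise's actual values:

```python
import random

def random_ellipse_targets(width=64, height=64, rng=None):
    """Sample regression targets (x_c, y_c, r_x, r_y) for one training image."""
    rng = rng or random.Random()
    r_x = rng.uniform(4.0, width / 4)
    r_y = rng.uniform(4.0, height / 4)
    # Keep the whole ellipse inside the image.
    x_c = rng.uniform(r_x, width - r_x)
    y_c = rng.uniform(r_y, height - r_y)
    return x_c, y_c, r_x, r_y

def inside(x, y, x_c, y_c, r_x, r_y):
    """True when (x, y) lies on or inside the ellipse
    ((x - x_c)/r_x)^2 + ((y - y_c)/r_y)^2 = 1."""
    return ((x - x_c) / r_x) ** 2 + ((y - y_c) / r_y) ** 2 <= 1.0
```

Rasterising `inside` over a pixel grid (and filling the two regions with random colours, as the exercise specifies) yields the images; the sampled tuple is the four-value regression target the CNN must learn.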