Draft2024slides #77

Closed · wants to merge 13 commits into from

249 changes: 153 additions & 96 deletions slides/slides.qmd
revealjs-plugins:
* 10:30-11:00 - Coffee
* 11:00-12:00 - Teaching/Code-along

Lunch @ Churchill College

* 12:00 - 13:30

::: {style="color: turquoise;"}
Helping Today:

* Person 1 - Cambridge RSE
:::
:::
::::


## Material {.smaller}

These slides can be viewed at:
Based on the workshop developed by [Jack Atkinson](https://orcid.org/0000-0001-5
V1.0 released and JOSE paper accepted:

- [@atkinson2024practical]

## Learning objectives {.smaller}
The key learning objective of this workshop can be summarised simply as:
*Provide the ability to develop ML models in PyTorch.*

Specifically:

- provide an understanding of the structure of a PyTorch model and ML pipeline,
- introduce the different functionalities PyTorch might provide,
- encourage good research software engineering (RSE) practice, and
- exercise careful consideration and understanding of data used for training ML models.

\
\
With regards to specific ML content, we cover:

- using ML for both classification and regression,
- artificial neural networks (ANNs), <!-- and convolutional neural networks (CNNs) -->
- treatment of tabular data. <!-- and image data -->

<!--
## NCAS School (rough) Schedule {.smaller}

Helping Today:

# Part 1: Neural-network basics -- and fun applications.

## Fitting a straight line I {.smaller}

- Consider the data:

| $x_{i}$ | $y_{i}$ |
|:--------:|:-------:|
| 1.0 | 2.1 |
| 2.0 | 3.9 |
| 3.0 | 6.2 |

- Wish to fit a function to the above data.
$$f(x) = mx + c$$

- When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.

## Fitting a straight line II - SGD

- Simple problems like the previous one can be solved analytically.
- Generally speaking, most neural networks are fit/trained using Stochastic Gradient Descent (SGD), or some variant of it.
- To understand how one might fit a function with SGD, let's start with a straight line: $$y=mx+c$$


## Fitting a straight line III - SGD {.smaller}

- **Question**---when we differentiate a function, what do we get?

$$\frac{dy}{dx} = m$$
:::


## Fitting a straight line IV - SGD {.smaller}

- **Answer**---a function's derivative gives a _vector_ which points in the direction of _steepest ascent_.

$$-\frac{dy}{dx}$$
:::


## Fitting a straight line V - Cost fn {.smaller}

- We need a way of measuring how well a model's predictions match our observations.


::: {.fragment .fade-in}
$$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
:::
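As a concrete illustration, here is a minimal NumPy sketch of this loss, applied to the toy data from the earlier table (illustrative only, not code from the workshop materials):

```python
# Minimal NumPy sketch of the MSE loss above -- illustrative only,
# not code from the workshop materials.
import numpy as np

def mse(y: np.ndarray, f_x: np.ndarray) -> float:
    """Mean squared error between observations y and model predictions f(x)."""
    return float(np.mean((y - f_x) ** 2))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
print(mse(y, 2.0 * x))  # loss for a candidate model f(x) = 2x: 0.02
```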


## Fitting a straight line VI {.smaller}

:::: {.columns}
::: {.column width="45%"}

- Data: \ $\{x_{i}, y_{i}\}$

- Loss fn:
$$
\begin{align}
L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
&= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} - c)^{2}
\end{align}
$$
<!-- - Loss: \ $\frac{1}{n}\sum_{i=1}^{n}(y_{i} - x_{i})^{2}$ -->

:::
::: {.column width="55%"}
![](https://images.squarespace-cdn.com/content/v1/5acbdd3a25bf024c12f4c8b4/1600368657769-5BJU5FK86VZ6UXZGRC1M/Mean+Squared+Error.png?format=2500w){width=65%}
:::
::::
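For reference, differentiating this loss with respect to $m$ and $c$ (a worked step, spelled out here from the definitions above) gives the gradients used in the update rule on the next slide:

$$
\frac{\partial L_{\text{MSE}}}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n}x_{i}\left(y_{i} - mx_{i} - c\right), \qquad
\frac{\partial L_{\text{MSE}}}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_{i} - mx_{i} - c\right)
$$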

:::: {#placeholder}
::::

$$m_{t + 1} = m_{t} - \frac{dL}{dm} \cdot l_{r}$$

:::: {#placeholder}
::::
$$c_{t + 1} = c_{t} - \frac{dL}{dc} \cdot l_{r}$$
:::
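Putting the loss and the update rules together, a minimal NumPy sketch of this gradient-descent loop, fitting the toy data from earlier (the learning rate and iteration count are illustrative assumptions, not values from the slides):

```python
# Gradient descent on f(x) = mx + c with the MSE loss -- illustrative
# sketch only; the learning rate and step count are assumed values.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])

m, c = 0.0, 0.0  # initial parameter guesses
lr = 0.01        # learning rate, l_r

for _ in range(5_000):
    residual = y - (m * x + c)            # y_i - f(x_i)
    dL_dm = -2.0 * np.mean(x * residual)  # dL/dm
    dL_dc = -2.0 * np.mean(residual)      # dL/dc
    m -= lr * dL_dm                       # m_{t+1} = m_t - dL/dm * l_r
    c -= lr * dL_dc                       # c_{t+1} = c_t - dL/dc * l_r

print(m, c)  # converges towards roughly m = 2.05, c = -0.03
```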


## Cost function #1

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*fcNdB994NRWt_XZ2.gif){}

::: {.attribution}
Image source: [Coursera](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
:::


## Cost function #2

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*8Lp1VXMApbAJlfXy2zq9MA.gif){fig-align="center"}

::: {.attribution}
Image source: [Coursera](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
:::


## Quick recap {.smaller}

To fit a model we need:
Expand Down Expand Up @@ -285,7 +333,7 @@ $$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$
:::
::::

![](https://web.archive.org/web/20240102183723if_/https://3b1b-posts.us-east-1.linodeobjects.com/images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}

::: {.attribution}
Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
- See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)
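Tying this back to the layer equation $a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$, a minimal PyTorch sketch (the layer sizes and the choice of $\sigma$ are illustrative assumptions, not from the slides):

```python
# One network layer, a_{l+1} = sigma(W_l a_l + b_l), in PyTorch.
# The sizes (4 -> 3) and the ReLU activation are illustrative choices.
import torch

layer = torch.nn.Linear(in_features=4, out_features=3)  # holds W_l and b_l
sigma = torch.nn.ReLU()                                 # one common choice of activation

a_l = torch.randn(4)        # activations entering layer l
a_next = sigma(layer(a_l))  # a_{l+1}
print(a_next.shape)         # torch.Size([3])
```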


# Other resources

- [coursera.org/machine-learning-introduction](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
- [UvA Deep Learning tutorial notebooks (uvadlc)](https://uvadlc-notebooks.readthedocs.io/en/latest/)
- [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

# Exercises


Image source: [Palmer Penguins by Alison Horst](https://allisonhorst.github.io/p
- [https://github.com/allisonhorst/palmerpenguins](https://github.com/allisonhorst/palmerpenguins)
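For orientation, one way to peek at this dataset is via seaborn's packaged copy (an assumption of convenience; the exercise materials define their own data handling):

```python
# Illustrative sketch only -- the exercise defines its own data handling.
# seaborn ships a copy of the Palmer Penguins dataset.
import seaborn as sns

penguins = sns.load_dataset("penguins").dropna()  # drop rows with missing values
print(penguins["species"].nunique())  # 3 species to classify
print(penguins.head())                # bill/flipper measurements, body mass, etc.
```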


<!-- # Part 2: Fun with CNNs -->


<!-- ## Convolutional neural networks (CNNs): why? {.smaller} -->

<!-- Advantages over simple ANNs: -->

<!-- - They require far fewer parameters per layer. -->
<!-- - The forward pass of a conv layer involves running a filter of fixed size over the inputs. -->
<!-- - The number of parameters per layer _does not_ depend on the input size. -->
<!-- - They are a much more natural choice of function for *image-like* data: -->

<!-- :::: {.columns} -->
<!-- ::: {.column width=10%} -->
<!-- ::: -->
<!-- ::: {.column width=35%} -->

<!-- ![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Plot-of-the-First-Nine-Photos-of-Dogs-in-the-Dogs-vs-Cats-Dataset.png) -->

<!-- ::: -->
<!-- ::: {.column width=10%} -->
<!-- ::: -->
<!-- ::: {.column width=35%} -->

<!-- ![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Plot-of-the-First-Nine-Photos-of-Cats-in-the-Dogs-vs-Cats-Dataset.png) -->

<!-- ::: -->
<!-- :::: -->

<!-- ::: {.attribution} -->
<!-- Image source: [Machine Learning Mastery](https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-photos-of-dogs-and-cats/) -->
<!-- ::: -->


<!-- ## Convolutional neural networks (CNNs): why? {.smaller} -->

<!-- Some other points: -->

<!-- - Convolutional layers are translationally invariant: -->
<!-- - i.e. they don't care _where_ the "dog" is in the image. -->
<!-- - Convolutional layers are _not_ rotationally invariant. -->
<!-- - e.g. a model trained to detect correctly-oriented human faces will likely fail on upside-down images. -->
<!-- - We can address this with data augmentation (explored in exercises). -->


<!-- ## What is a (1D) convolutional layer? {.smaller} -->

<!-- ![](1d-conv.png) -->

<!-- See the [`torch.nn.Conv1d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html) -->


<!-- ## 2D convolutional layer {.smaller} -->

<!-- - Same idea as in one dimension, but in two (funnily enough). -->

<!-- ![](2d-conv.png) -->

<!-- - Everything else proceeds in the same way as with the 1D case. -->
<!-- - See the [`torch.nn.Conv2d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html). -->
<!-- - As with Linear layers, Conv2d layers also have non-linear activations applied to them. -->


<!-- ## Typical CNN overview {.smaller} -->

<!-- ::: {layout="[ 0.5, 0.5 ]"} -->

<!-- ![](https://miro.medium.com/v2/resize:fit:1162/format:webp/1*tvwYybdIwvoOs0DuUEJJTg.png) -->

<!-- - Series of conv layers extract features from the inputs. -->
<!-- - Often called an encoder. -->
<!-- - Adaptive pooling layer: -->
<!-- - Image-like objects $\to$ vectors. -->
<!-- - Standardises size. -->
<!-- - [``torch.nn.AdaptiveAvgPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html) -->
<!-- - [``torch.nn.AdaptiveMaxPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html) -->
<!-- - Classification (or regression) head. -->

<!-- ::: -->

<!-- - For common CNN architectures see [``torchvision.models`` docs](https://pytorch.org/vision/stable/models.html). -->

<!-- ::: {.attribution} -->
<!-- Image source: [medium.com - binary image classifier cnn using tensorflow](https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697) -->
<!-- ::: -->


<!-- # Exercises -->

<!-- ## Exercise 1 -- classification -->

<!-- ### MNIST hand-written digits. -->

<!-- ::: {layout="[ 0.5, 0.5 ]"} -->

<!-- ![](https://i.ytimg.com/vi/0QI3xgXuB-Q/hqdefault.jpg) -->

<!-- - In this exercise we'll train a CNN to classify hand-written digits in the MNIST dataset. -->
<!-- - See the [MNIST database wiki](https://en.wikipedia.org/wiki/MNIST_database) for more details. -->

<!-- ::: -->

<!-- ::: {.attribution} -->
<!-- Image source: [npmjs.com](https://www.npmjs.com/package/mnist) -->
<!-- ::: -->


<!-- ## Exercise 2---regression -->
<!-- ### Random ellipse problem -->

<!-- - In this exercise, we'll train a CNN to estimate the centre $(x_{\text{c}}, y_{\text{c}})$ and the $x$ and $y$ radii of an ellipse defined by -->
<!-- $$ -->
<!-- \frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1 -->
<!-- $$ -->

<!-- - The ellipse, and its background, will have random colours chosen uniformly on $\left[0,\ 255\right]^{3}$. -->
<!-- - In short, the model must learn to estimate $x_{\text{c}}$, $y_{\text{c}}$, $r_{x}$ and $r_{y}$. -->

<!-- # Further information -->