Draft2024slides #77

Closed · wants to merge 13 commits into from

249 changes: 153 additions & 96 deletions slides/slides.qmd
revealjs-plugins:
* 10:30-11:00 - Coffee
* 11:00-12:00 - Teaching/Code-along

Lunch @ Churchill College

* 12:00 - 13:30

::: {style="color: turquoise;"}
Helping Today:

* Person 1 - Cambridge RSE
:::
:::
::::


## Material {.smaller}

These slides can be viewed at:
Based on the workshop developed by [Jack Atkinson](https://orcid.org/0000-0001-5
V1.0 released and JOSE paper accepted:

- [@atkinson2024practical]

## Learning objectives {.smaller}
The key learning objective of this workshop can be summarised simply as:
*Provide the ability to develop ML models in PyTorch.*

Specifically:

- provide an understanding of the structure of a PyTorch model and ML pipeline,
- introduce the different functionalities PyTorch might provide,
- encourage good research software engineering (RSE) practice, and
- exercise careful consideration and understanding of data used for training ML models.

\
\
With regards to specific ML content, we cover:

- using ML for both classification and regression,
- artificial neural networks (ANNs), <!-- and convolutional neural networks (CNNs) -->
- treatment of tabular data. <!-- and image data -->

<!--
## NCAS School (rough) Schedule {.smaller}

Helping Today:

# Part 1: Neural-network basics -- and fun applications.

## Fitting a straight line I {.smaller}

- Consider the data:

| $x_{i}$ | $y_{i}$ |
|:--------:|:-------:|
| 1.0 | 2.1 |
| 2.0 | 3.9 |
| 3.0 | 6.2 |

- Wish to fit a function to the above data.
$$f(x) = mx + c$$

- When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.

## Fitting a straight line II - SGD

- Simple problems like the previous one can be solved analytically.
- Generally speaking, most neural networks are fit/trained using Stochastic Gradient Descent (SGD), or some variant of it.
- To understand how one might fit a function with SGD, let's start with a straight line: $$y=mx+c$$


## Fitting a straight line III - SGD {.smaller}

- **Question**---when we differentiate a function, what do we get?

$$\frac{dy}{dx} = m$$
:::


## Fitting a straight line IV - SGD {.smaller}

- **Answer**---a function's derivative gives a _vector_ which points in the direction of _steepest ascent_.

$$-\frac{dy}{dx}$$
:::


## Fitting a straight line V - Cost fn {.smaller}

- We need a way of measuring how well a model's predictions match our observations.


::: {.fragment .fade-in}
$$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
:::
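As a concrete illustration, here is a minimal NumPy sketch of this loss, applied to the toy data from the earlier table (illustrative only, not code from the workshop materials):

```python
# Minimal NumPy sketch of the MSE loss above -- illustrative only,
# not code from the workshop materials.
import numpy as np

def mse(y: np.ndarray, f_x: np.ndarray) -> float:
    """Mean squared error between observations y and model predictions f(x)."""
    return float(np.mean((y - f_x) ** 2))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
print(mse(y, 2.0 * x))  # loss for a candidate model f(x) = 2x: 0.02
```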


## Fitting a straight line VI {.smaller}

:::: {.columns}
::: {.column width="45%"}

- Data: \ $\{x_{i}, y_{i}\}$

- Loss fn:
$$
\begin{align}
L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
&= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} - c)^{2}
\end{align}
$$
<!-- - Loss: \ $\frac{1}{n}\sum_{i=1}^{n}(y_{i} - x_{i})^{2}$ -->

:::
::: {.column width="55%"}
![](https://images.squarespace-cdn.com/content/v1/5acbdd3a25bf024c12f4c8b4/1600368657769-5BJU5FK86VZ6UXZGRC1M/Mean+Squared+Error.png?format=2500w){width=65%}
:::
::::
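For reference, differentiating this loss with respect to $m$ and $c$ (a worked step, spelled out here from the definitions above) gives the gradients used in the update rule on the next slide:

$$
\frac{\partial L_{\text{MSE}}}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n}x_{i}\left(y_{i} - mx_{i} - c\right), \qquad
\frac{\partial L_{\text{MSE}}}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_{i} - mx_{i} - c\right)
$$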

:::: {#placeholder}
::::

$$m_{t + 1} = m_{t} - \frac{dL}{dm} \cdot l_{r}$$

:::: {#placeholder}
::::
$$c_{t + 1} = c_{t} - \frac{dL}{dc} \cdot l_{r}$$
:::
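Putting the loss and the update rules together, a minimal NumPy sketch of this gradient-descent loop, fitting the toy data from earlier (the learning rate and iteration count are illustrative assumptions, not values from the slides):

```python
# Gradient descent on f(x) = mx + c with the MSE loss -- illustrative
# sketch only; the learning rate and step count are assumed values.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])

m, c = 0.0, 0.0  # initial parameter guesses
lr = 0.01        # learning rate, l_r

for _ in range(5_000):
    residual = y - (m * x + c)            # y_i - f(x_i)
    dL_dm = -2.0 * np.mean(x * residual)  # dL/dm
    dL_dc = -2.0 * np.mean(residual)      # dL/dc
    m -= lr * dL_dm                       # m_{t+1} = m_t - dL/dm * l_r
    c -= lr * dL_dc                       # c_{t+1} = c_t - dL/dc * l_r

print(m, c)  # converges towards roughly m = 2.05, c = -0.03
```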


## Cost function #1

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*fcNdB994NRWt_XZ2.gif){}

::: {.attribution}
Image source: [Coursera](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
:::


## Cost function #2

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*8Lp1VXMApbAJlfXy2zq9MA.gif){fig-align="center"}

::: {.attribution}
Image source: [Coursera](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
:::


## Quick recap {.smaller}

To fit a model we need:
Expand Down Expand Up @@ -285,7 +333,7 @@ $$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$
:::
::::

![](https://web.archive.org/web/20240102183723if_/https://3b1b-posts.us-east-1.linodeobjects.com/images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}

::: {.attribution}
Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
- See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)
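Tying this back to the layer equation $a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$, a minimal PyTorch sketch (the layer sizes and the choice of $\sigma$ are illustrative assumptions, not from the slides):

```python
# One network layer, a_{l+1} = sigma(W_l a_l + b_l), in PyTorch.
# The sizes (4 -> 3) and the ReLU activation are illustrative choices.
import torch

layer = torch.nn.Linear(in_features=4, out_features=3)  # holds W_l and b_l
sigma = torch.nn.ReLU()                                 # one common choice of activation

a_l = torch.randn(4)        # activations entering layer l
a_next = sigma(layer(a_l))  # a_{l+1}
print(a_next.shape)         # torch.Size([3])
```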


# Other resources

- [coursera.org/machine-learning-introduction](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
- [UvA Deep Learning tutorial notebooks (uvadlc)](https://uvadlc-notebooks.readthedocs.io/en/latest/)
- [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

# Exercises


Image source: [Palmer Penguins by Alison Horst](https://allisonhorst.github.io/p
- [https://github.com/allisonhorst/palmerpenguins](https://github.com/allisonhorst/palmerpenguins)
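For orientation, one way to peek at this dataset is via seaborn's packaged copy (an assumption of convenience; the exercise materials define their own data handling):

```python
# Illustrative sketch only -- the exercise defines its own data handling.
# seaborn ships a copy of the Palmer Penguins dataset.
import seaborn as sns

penguins = sns.load_dataset("penguins").dropna()  # drop rows with missing values
print(penguins["species"].nunique())  # 3 species to classify
print(penguins.head())                # bill/flipper measurements, body mass, etc.
```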


<!-- # Part 2: Fun with CNNs -->


<!-- ## Convolutional neural networks (CNNs): why? {.smaller} -->

<!-- Advantages over simple ANNs: -->

<!-- - They require far fewer parameters per layer. -->
<!-- - The forward pass of a conv layer involves running a filter of fixed size over the inputs. -->
<!-- - The number of parameters per layer _does not_ depend on the input size. -->
<!-- - They are a much more natural choice of function for *image-like* data: -->

<!-- :::: {.columns} -->
<!-- ::: {.column width=10%} -->
<!-- ::: -->
<!-- ::: {.column width=35%} -->

<!-- ![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Plot-of-the-First-Nine-Photos-of-Dogs-in-the-Dogs-vs-Cats-Dataset.png) -->

<!-- ::: -->
<!-- ::: {.column width=10%} -->
<!-- ::: -->
<!-- ::: {.column width=35%} -->

<!-- ![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Plot-of-the-First-Nine-Photos-of-Cats-in-the-Dogs-vs-Cats-Dataset.png) -->

<!-- ::: -->
<!-- :::: -->

<!-- ::: {.attribution} -->
<!-- Image source: [Machine Learning Mastery](https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-photos-of-dogs-and-cats/) -->
<!-- ::: -->


<!-- ## Convolutional neural networks (CNNs): why? {.smaller} -->

<!-- Some other points: -->

<!-- - Convolutional layers are translationally invariant: -->
<!-- - i.e. they don't care _where_ the "dog" is in the image. -->
<!-- - Convolutional layers are _not_ rotationally invariant. -->
<!-- - e.g. a model trained to detect correctly-oriented human faces will likely fail on upside-down images. -->
<!-- - We can address this with data augmentation (explored in exercises). -->


<!-- ## What is a (1D) convolutional layer? {.smaller} -->

<!-- ![](1d-conv.png) -->

<!-- See the [`torch.nn.Conv1d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html) -->


<!-- ## 2D convolutional layer {.smaller} -->

<!-- - Same idea as in one dimension, but in two (funnily enough). -->

<!-- ![](2d-conv.png) -->

<!-- - Everything else proceeds in the same way as with the 1D case. -->
<!-- - See the [`torch.nn.Conv2d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html). -->
<!-- - As with Linear layers, Conv2d layers also have non-linear activations applied to them. -->


<!-- ## Typical CNN overview {.smaller} -->

<!-- ::: {layout="[ 0.5, 0.5 ]"} -->

<!-- ![](https://miro.medium.com/v2/resize:fit:1162/format:webp/1*tvwYybdIwvoOs0DuUEJJTg.png) -->

<!-- - Series of conv layers extract features from the inputs. -->
<!-- - Often called an encoder. -->
<!-- - Adaptive pooling layer: -->
<!-- - Image-like objects $\to$ vectors. -->
<!-- - Standardises size. -->
<!-- - [``torch.nn.AdaptiveAvgPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html) -->
<!-- - [``torch.nn.AdaptiveMaxPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html) -->
<!-- - Classification (or regression) head. -->

<!-- ::: -->

<!-- - For common CNN architectures see [``torchvision.models`` docs](https://pytorch.org/vision/stable/models.html). -->

<!-- ::: {.attribution} -->
<!-- Image source: [medium.com - binary image classifier cnn using tensorflow](https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697) -->
<!-- ::: -->


<!-- # Exercises -->

<!-- ## Exercise 1 -- classification -->

<!-- ### MNIST hand-written digits. -->

<!-- ::: {layout="[ 0.5, 0.5 ]"} -->

<!-- ![](https://i.ytimg.com/vi/0QI3xgXuB-Q/hqdefault.jpg) -->

<!-- - In this exercise we'll train a CNN to classify hand-written digits in the MNIST dataset. -->
<!-- - See the [MNIST database wiki](https://en.wikipedia.org/wiki/MNIST_database) for more details. -->

<!-- ::: -->

<!-- ::: {.attribution} -->
<!-- Image source: [npmjs.com](https://www.npmjs.com/package/mnist) -->
<!-- ::: -->


<!-- ## Exercise 2---regression -->
<!-- ### Random ellipse problem -->

<!-- - In this exercise, we'll train a CNN to estimate the centre $(x_{\text{c}}, y_{\text{c}})$ and the $x$ and $y$ radii of an ellipse defined by -->
<!-- $$ -->
<!-- \frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1 -->
<!-- $$ -->

<!-- - The ellipse, and its background, will have random colours chosen uniformly on $\left[0,\ 255\right]^{3}$. -->
<!-- - In short, the model must learn to estimate $x_{\text{c}}$, $y_{\text{c}}$, $r_{x}$ and $r_{y}$. -->

<!-- # Further information -->