diff --git a/README.md b/README.md index 0abb846..52e590b 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,7 @@  [![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa] +[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Cambridge-ICCS/ml-training-material/main) This repository contains documentation, resources, and code for the Introduction to Machine Learning with PyTorch session designed and delivered by [Jack Atkinson](https://jackatkinson.net/) ([**@jatkinson1000**](https://github.com/jatkinson1000)) @@ -23,6 +24,7 @@ A website for this workshop can be found at [https://cambridge-iccs.github.io/ml - [Preparation and prerequisites](#preparation-and-prerequisites) - [Installation and setup](#installation-and-setup) - [License information](#license) +- [Contribution Guidelines and Support](#contribution-guidelines-and-support) ## Learning Objectives @@ -70,13 +72,6 @@ These are for recapping after the course in case you missed anything, and contai [linted](https://docs.pylint.org/intro.html), and conforming to the [black](https://black.readthedocs.io/en/stable/) code style. -If you were working on Colab you can open the worked solutions using the following links: - -* [Exercise 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/01_penguin_classification_solutions.ipynb) -* [Exercise 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/02_penguin_regression_solutions.ipynb) -* [Exercise 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/03_mnist_classification_solutions.ipynb) -* [Exercise 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/04_ellipse_regression_solutions.ipynb) - ## Preparation and prerequisites @@ -136,17 +131,18 @@ us before a training session. ## Installation and setup -There are two options for participating in this workshop for which instructions are provided below: +There are three options for participating in this workshop for which instructions are provided below: * via a [local install](#local-install) * on [Google Colab](#google-colab) +* on [binder](#binder) We recommend the [local install](#local-install) approach, especially if you forked the repository, as it is the easiest way to keep a copy of your work and push back to GitHub. However, if you experience issues with the installation process or are unfamiliar with the terminal/installation process there is the option to run the notebooks in -[Google Colab](#google-colab). +[Google Colab](#google-colab) or on [binder](#binder). ### Local Install @@ -219,18 +215,31 @@ python -m ipykernel install --user --name=MLvenv ### Google Colab -To run the notebooks in Google Colab click the following links for each of the exercises: +Running on Colab is useful as it allows you to access GPU resources. 
+To launch the notebooks in Google Colab click the following links for each of the exercises: -* [Exercise 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/01_penguin_classification.ipynb) -* [Exercise 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/02_penguin_regression.ipynb) -* [Exercise 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/03_mnist_classification.ipynb) -* [Exercise 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/04_ellipse_regression.ipynb) +* [Exercise 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/01_penguin_classification.ipynb) - [Worked Solution 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/01_penguin_classification_solutions.ipynb) +* [Exercise 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/02_penguin_regression.ipynb) - [Worked Solution 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/02_penguin_regression_solutions.ipynb) +* [Exercise 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/03_mnist_classification.ipynb) - [Worked Solution 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/03_mnist_classification_solutions.ipynb) +* [Exercise 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/04_ellipse_regression.ipynb) - [Worked Solution 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/04_ellipse_regression_solutions.ipynb) _Notes:_ * _Running in Google Colab requires you to have a Google account._ * _If you leave a Colab session your work will be lost, so be careful to save any work you want to keep._ +### binder + +If you cannot use a local install and do not wish to sign up for a Google account, +the repository can be launched +[on binder](https://mybinder.org/v2/gh/Cambridge-ICCS/ml-training-material/main). + +_Notes:_ +* _If you leave a binder session your work will be lost, so be careful to save any work + you want to keep._ +* _Due to the limited resources provided by binder, you will struggle to run training in + exercises 3 and 4._ + ## License @@ -244,3 +253,24 @@ The teaching materials are licensed under a [cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg [![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa] + +## Contribution Guidelines and Support + +If you spot an issue with the materials, please let us know by +[opening an issue](https://github.com/Cambridge-ICCS/ml-training-material/issues/new/choose) +here on GitHub, clearly describing the problem. + +If you are able to fix an issue that you spot, or an +[existing open issue](https://github.com/Cambridge-ICCS/ml-training-material/issues), +please get in touch by commenting on the issue thread. + +Contributions from the community are welcome. 
+To contribute back to the repository, please first +[fork it](https://github.com/Cambridge-ICCS/ml-training-material/fork), +make the necessary changes to fix the problem, and then open a pull request back to +this repository clearly describing the changes you have made. +We will then perform a review and merge once ready. + +If you would like support using these materials, adapting them to your needs, or +delivering them, please get in touch either via GitHub or via +[ICCS](https://github.com/Cambridge-ICCS). diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb index c807dab..4d8b931 100644 --- a/exercises/01_penguin_classification.ipynb +++ b/exercises/01_penguin_classification.ipynb @@ -105,7 +105,9 @@ " train=True,\n", ")\n", "\n", + "\n", "for features, target in data_set:\n", + " # print the features and targets here\n", " pass" ] }, @@ -124,7 +126,7 @@ "source": [ "### Task 4: Applying transforms to the data\n", "\n", - "A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.\n", + "A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The [``Compose``](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html) object takes a list of callable objects (i.e., functions) and applies them to the incoming data.\n", "\n", "These transforms can be very useful for mapping between file paths and tensors of images, etc.\n", "\n", @@ -141,8 +143,12 @@ "outputs": [], "source": [ "from torchvision.transforms import Compose\n", + "# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n", + "# where `tensor` and `eye` are used for constructing tensors,\n", + "# and the lower-precision float32 dtype (rather than float64) is advised for performance\n", + "from torch import tensor, eye, float32\n", "\n", - "# Apply the transforms we need to the PenguinDataset to get out inputs\n", + "# Apply the transforms we need to the PenguinDataset to get our inputs and\n", "# targets as Tensors." ] }, @@ -154,7 +160,7 @@ "\n", "- Once we have created a ``Dataset`` object, we wrap it in a ``DataLoader``.\n", " - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n", - " - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once.\n", + " - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n", " - The number of items we supply at once is called the batch size.\n", " - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n", " - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n", diff --git a/setup.py b/setup.py new file mode 100644 index 0000000..93c246e --- /dev/null +++ b/setup.py @@ -0,0 +1,6 @@ +#!/usr/bin/env python + +from setuptools import setup + +if __name__ == "__main__": + setup() diff --git a/slides/index.html b/slides/index.html index f516c17..e4f46fb 100644 --- a/slides/index.html +++ b/slides/index.html @@ -86,10 +86,12 @@
We recommend the local install approach, especially if you forked the repository, as it is the easiest way to keep a copy of your work and push back to github.
@@ -183,7 +186,8 @@python -m ipykernel install --user --name=MLvenv
To run the notebooks in Google Colab click the following links for each of the exercises:
+Running on Colab is useful as it allows you to access GPU resources.
+ To launch the notebooks in Google Colab click the following links for each of the exercises:
To run the notebooks in binder click the following link:
+ + +Notes:
+
Worked solutions for all of the exercises can be found in the
Unless participating via Colab you will be expected to know how to:
+Unless participating via Colab or binder, you will be expected to know how to:
The teaching materials are licensed under CC BY-NC-SA 4.0.
If you spot an issue with the materials, please let us know by opening an issue on GitHub, clearly describing the problem.
+If you are able to fix an issue that you spot, or an existing open issue, please get in touch by commenting on the issue thread.
+Contributions from the community are welcome. To contribute back to the repository, please first fork it, make the necessary changes to fix the problem, and then open a pull request back to this repository clearly describing the changes you have made. We will then perform a review and merge once ready.
+If you would like support using these materials, adapting them to your needs, or delivering them, please get in touch either via GitHub or via ICCS.
+ diff --git a/src/ml_workshop/_penguins.py b/src/ml_workshop/_penguins.py index a3a5d79..d0fbd0f 100644 --- a/src/ml_workshop/_penguins.py +++ b/src/ml_workshop/_penguins.py @@ -1,4 +1,5 @@ """Penguins dataset.""" + from typing import Optional, List, Dict, Tuple, Any from torch.utils.data import Dataset @@ -17,9 +18,9 @@ class PenguinDataset(Dataset): Parameters ---------- - input_keys : Sequence[str] + input_keys : List[str] The column titles to use in the input feature vectors. - target_keys : Sequnce[str] + target_keys : List[str] The column titles to use in the target feature vectors. train : bool If ``True``, this object will serve as the training set, and if @@ -39,7 +40,7 @@ class PenguinDataset(Dataset): def __init__( self, input_keys: List[str], - target_keys: str, + target_keys: List[str], train: bool, x_tfms: Optional[Compose] = None, y_tfms: Optional[Compose] = None, @@ -109,6 +110,7 @@ def _load_penguin_data() -> DataFrame: .sort_values(by=sorted(data.keys())) .reset_index(drop=True) ) + # Transform the sex field into a float, with male represented by 1.0 and female by 0.0. data.sex = (data.sex == "male").astype(float) return data diff --git a/worked-solutions/01_penguin_classification_solutions.ipynb b/worked-solutions/01_penguin_classification_solutions.ipynb index ef9a308..c8666eb 100644 --- a/worked-solutions/01_penguin_classification_solutions.ipynb +++ b/worked-solutions/01_penguin_classification_solutions.ipynb @@ -214,7 +214,7 @@ "source": [ "### Task 4: Applying transforms to the data\n", "\n", - "A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.\n", + "A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. 
The [``Compose``](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html) object takes a list of callable objects (i.e., functions) and applies them to the incoming data.\n", "\n", "These transforms can be very useful for mapping between file paths and tensors of images, etc.\n", "\n", @@ -242,11 +242,14 @@ } ], "source": [ - "from torch import tensor, float32, eye\n", "from torchvision.transforms import Compose\n", + "# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n", + "# where `tensor` and `eye` are used for constructing tensors,\n", + "# and the lower-precision float32 dtype (rather than float64) is advised for performance\n", + "from torch import tensor, float32, eye\n", "\n", "\n", - "# Apply the transforms we need to the PenguinDataset to get out inputs\n", + "# Apply the transforms we need to the PenguinDataset to get our inputs and\n", "# targets as Tensors.\n", "\n", "\n", @@ -321,7 +324,7 @@ "\n", "- Once we have created a ``Dataset`` object, we wrap it in a ``DataLoader``.\n", " - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n", - " - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once.\n", + " - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n", " - The number of items we supply at once is called the batch size.\n", " - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n", " - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n",
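
A minimal sketch of the ``Compose`` pattern referenced in the Task 4 hints above. The four-value feature tuple and the three-class one-hot size mirror the penguins set-up, but the lambdas and example values here are illustrative assumptions, not the repository's worked solution:

```python
from torchvision.transforms import Compose
from torch import tensor, eye, float32

# Compose chains callables in order: Compose([f, g])(x) is g(f(x)).
# Turn a tuple of raw penguin measurements into a float32 tensor.
x_tfms = Compose([lambda xs: tensor(xs, dtype=float32)])

# Turn an integer class index into a one-hot float32 vector:
# row i of the identity matrix eye(3) is the one-hot encoding of class i.
y_tfms = Compose([lambda idx: eye(3, dtype=float32)[idx]])

print(x_tfms((39.1, 18.7, 181.0, 3750.0)))  # a float32 tensor of shape (4,)
print(y_tfms(1))  # tensor([0., 1., 0.])
```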
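The mini-batching notes added to Task 5 can be made concrete with a short sketch. The toy ``TensorDataset`` below stands in for the ``PenguinDataset``, and the sizes are illustrative assumptions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset of 100 (input, target) pairs stands in for the PenguinDataset.
inputs = torch.randn(100, 4)
targets = torch.randint(0, 3, (100,))
data_set = TensorDataset(inputs, targets)

# batch_size sets how many pairs the loader yields at once (a small power
# of 2, like 16 or 32, is typical); shuffle=True re-orders the data at the
# start of every epoch, which is what you want when training.
train_loader = DataLoader(data_set, batch_size=16, shuffle=True)

for batch_inputs, batch_targets in train_loader:
    print(batch_inputs.shape, batch_targets.shape)
    # torch.Size([16, 4]) torch.Size([16])
    break
```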
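The comment added in ``_load_penguin_data`` describes a standard pandas idiom. A self-contained sketch of the same step, using made-up rows rather than the real penguins table:

```python
import pandas as pd

# The comparison yields a boolean Series (True for "male"), and
# .astype(float) converts it so male -> 1.0 and female -> 0.0.
data = pd.DataFrame({"sex": ["male", "female", "male"]})
data.sex = (data.sex == "male").astype(float)
print(data.sex.tolist())  # [1.0, 0.0, 1.0]
```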