Update markdown instructions for Penguin classification exercise (1) #66

Open
wants to merge 19 commits into
base: main
Changes from 3 commits
21 changes: 12 additions & 9 deletions exercises/01_penguin_classification.ipynb
@@ -23,20 +23,23 @@
  "### Task 1: look at the data\n",
  "In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.\n",
  "\n",
- "- Call this function, which returns a single object, and assign it to the variable ``data``.\n",
- " - Print ``data`` and recognise that ``load_penguins`` has returned a ``pandas.DataFrame``.\n",
- "- Consider which features it might make sense to use in order to classify the species of the penguins.\n",
- " - You can print the column titles using ``pd.DataFrame.keys()``\n",
- " - You can also obtain useful information using ``pd.DataFrame.Series.describe()``"
+ "- Call this function, which returns a single object in the form of a ``pandas.DataFrame``, and assign it to the variable ``data``.\n",
+ " - Print ``data`` and recognise that ``load_penguins`` has returned the dataframe.\n",
+ "- Analyse which features it might make sense to use in order to classify the species of the penguins.\n",
+ " - You can print the column names using ``pd.DataFrame.keys()``\n",
+ " - You can also obtain useful statical information on the dataset using ``pd.DataFrame.Series.describe()``"
Member

statistical

Member

I think 'Consider' is probably fine here, but either is fine.

  ]
  },
  {
  "cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
  "metadata": {},
  "outputs": [],
  "source": [
- "from palmerpenguins import load_penguins"
+ "from palmerpenguins import load_penguins\n",
+ "\n",
+ "# Load the penguin data\n",
+ "penguins = load_penguins()\n"
  ]
  },
  {
@@ -402,7 +405,7 @@
  ],
  "metadata": {
  "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": ".venv",
  "language": "python",
  "name": "python3"
  },
@@ -416,7 +419,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.11.4"
+ "version": "3.12.4"
  }
  },
  "nbformat": 4,
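
For reference, the Task 1 steps in the notebook text above (print the dataframe, list its column names, summarise it) can be sketched roughly as follows. This is only an illustration under stated assumptions: the small dataframe is a hand-made stand-in for the object returned by ``load_penguins()``, its values are invented placeholders, and only three of the dataset's columns are included. The sketch calls ``describe()`` directly on the dataframe; the same method also exists on a single column (a ``pandas.Series``).

```python
import pandas as pd

# Stand-in for ``data = load_penguins()``. The rows below are made-up
# placeholder values, not real measurements from the dataset.
data = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "bill_length_mm": [39.0, 47.5, 49.0],
        "body_mass_g": [3700, 5000, 3800],
    }
)

# Printing the object shows that it is a pandas.DataFrame.
print(type(data))
print(data)

# ``keys()`` returns the column labels (the same labels as ``data.columns``).
column_names = list(data.keys())
print(column_names)

# ``describe()`` summarises the numeric columns by default
# (count, mean, std, min, quartiles, max).
summary = data.describe()
print(summary)
```

In the notebook itself, ``data`` would come from ``load_penguins()`` rather than from a hand-written dictionary.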
28 changes: 15 additions & 13 deletions worked-solutions/01_penguin_classification_solutions.ipynb
@@ -23,11 +23,11 @@
  "### Task 1: look at the data\n",
  "In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.\n",
  "\n",
- "- Call this function, which returns a single object, and assign it to the variable ``data``.\n",
- " - Print ``data`` and recognise that ``load_penguins`` has returned a ``pandas.DataFrame``.\n",
- "- Consider which features it might make sense to use in order to classify the species of the penguins.\n",
- " - You can print the column titles using ``pd.DataFrame.keys()``\n",
- " - You can also obtain useful information using ``pd.DataFrame.Series.describe()``"
+ "- Call this function, which returns a single object in the form of a ``pandas.DataFrame``, and assign it to the variable ``data``.\n",
+ " - Print ``data`` and recognise that ``load_penguins`` has returned the dataframe.\n",
+ "- Analyse which features it might make sense to use in order to classify the species of the penguins.\n",
+ " - You can print the column names using ``pd.DataFrame.keys()``\n",
+ " - You can also obtain useful statistical information on the dataset using ``pd.DataFrame.Series.describe()``"
  ]
  },
  {
@@ -108,23 +108,25 @@
  "source": [
  "### Task 2: creating a ``torch.utils.data.Dataset``\n",
  "\n",
- "All PyTorch dataset objects are subclasses of the ``torch.utils.data.Dataset`` class. To make a custom dataset, create a class which inherits from the ``Dataset`` class, implement some methods (the Python magic (or dunder) methods ``__len__`` and ``__getitem__``) and supply some data.\n",
+ "To be able to use Pytorch functionalities, we need to make the dataset compatible with Pytorch. We do it using PyTorch's Dataset class called ``torch.utils.data.Dataset``. \n",
Member

@ma595 ma595 Jul 1, 2024

PyTorch

Member

@ma595 ma595 Jul 5, 2024

I still think we should keep the bit on torch.utils.data.Dataset. I really like the new comment however. Perhaps this should be the first sentence:

"To be able to use Pytorch functionalities, we need to make the dataset compatible with Pytorch. We do it using PyTorch's Dataset class called ``torch.utils.data.Dataset``. \n",

Followed by:
"All PyTorch dataset objects are subclasses of the torch.utils.data.Dataset class. To make a custom dataset, create a class which inherits from the Dataset class, implement some methods (the Python magic (or dunder) methods __len__ and __getitem__) and supply some data.\n",

Sorry, I realise this is fine. It's just been restructured.

  "\n",
- "Spoiler alert: we've done this for you already in ``src/ml_workshop/_penguins.py``.\n",
+ "To make a custom dataset, create a new class which inherits from the ``Dataset`` class, implement some methods (the Python magic (or dunder) like ``__len__`` and ``__getitem__``) and supply data.\n",
  "\n",
- "- Open the file ``src/ml_workshop/_penguins.py``.\n",
+ "Spoiler alert: we've done this for you already in ``worked-solutions/01_penguin_classification_solutions.ipynb``.\n",
+ "\n",
+ "- Open the above mentioned file.\n",
  "- Let's examine, and discuss, each of the methods together.\n",
  " - ``__len__``\n",
  " - What does the ``__len__`` method do?\n",
- " - The ``__len__`` method is a so-called \"magic method\", which tells python to do if the ``len`` function is called on the object containing it.\n",
+ " - The ``__len__`` method is a so-called \"magic method\" in python, that defines what happens when the ``len`` function is called on an object.\n",
  " - ``__getitem__``\n",
  " - What does the ``__getitem__`` method do?\n",
  " - The ``__getitem__`` method is another magic method which tells python what to do if we try and index the object containing it (i.e. ``my_object[idx]``).\n",
  "- Review and discuss the class arguments.\n",
- " - ``input_keys``— A sequence of strings telling the data set which objects to return as inputs to the model.\n",
- " - ``target_keys``— Same as ``input_keys`` but specifying the targets.\n",
+ " - ``input_keys``— A sequence of strings telling the data set which objects to return as inputs to the model. These are basically the input column names.\n",
+ " - ``target_keys``— Same as ``input_keys`` but specifying the targets columns.\n",
  " - ``train``— A boolean variable determining if the model returns the training or validation split (``True`` for training).\n",
- " - ``x_tfms``— A ``Compose`` object with functions which will convert the raw input to a tensor. This argument is _optional_.\n",
+ " - ``x_tfms``— A ``Compose`` object with functions which will convert the raw input to a tensor. This argument is _optional_. Remember Pytorch deals with tensors only.\n",
Member

How about: "Recall that PyTorch deals with ``torch.Tensor``s only."

  " - ``y_tfms``— A ``Compose`` object with functions which will convert the raw target to a tensor. This argument is _optional_."
  ]
  },
@@ -900,7 +902,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.11.4"
+ "version": "3.12.4"
  }
  },
  "nbformat": 4,
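
To make the Task 2 discussion of ``__len__`` and ``__getitem__`` concrete, here is a minimal sketch of the two magic methods. It is not the workshop's dataset class: ``ToyDataset`` and the rows are hypothetical, the values are invented placeholders, and the class deliberately avoids importing ``torch`` so it runs anywhere. A real dataset would inherit from ``torch.utils.data.Dataset`` and return tensors (for example via the ``x_tfms`` and ``y_tfms`` transforms) rather than plain lists.

```python
from typing import Any, Dict, List, Sequence, Tuple


class ToyDataset:
    """Hypothetical stand-in; a real one would subclass torch.utils.data.Dataset."""

    def __init__(
        self,
        rows: List[Dict[str, Any]],
        input_keys: Sequence[str],
        target_keys: Sequence[str],
    ) -> None:
        self.rows = rows
        self.input_keys = list(input_keys)    # input column names
        self.target_keys = list(target_keys)  # target column names

    def __len__(self) -> int:
        # Called when ``len(dataset)`` is evaluated.
        return len(self.rows)

    def __getitem__(self, idx: int) -> Tuple[List[Any], List[Any]]:
        # Called when the object is indexed, i.e. ``dataset[idx]``.
        row = self.rows[idx]
        inputs = [row[key] for key in self.input_keys]
        targets = [row[key] for key in self.target_keys]
        return inputs, targets


# Made-up placeholder rows, for illustration only.
rows = [
    {"bill_length_mm": 39.0, "body_mass_g": 3700.0, "species": "Adelie"},
    {"bill_length_mm": 47.5, "body_mass_g": 5000.0, "species": "Gentoo"},
]

dataset = ToyDataset(
    rows,
    input_keys=["bill_length_mm", "body_mass_g"],
    target_keys=["species"],
)

print(len(dataset))  # dispatches to ToyDataset.__len__
print(dataset[1])    # dispatches to ToyDataset.__getitem__
```

The ``train`` argument and the tensor transforms are left out here to keep the focus on the two methods under discussion.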