Cambridge-ICCS · surbhigoel77 · Jul 1, 2024
diff --git a/exercises/01_penguin_classification.ipynb b/exercises/01_penguin_classification.ipynb
@@ -23,20 +23,23 @@
     "### Task 1: look at the data\n",
     "In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.\n",
     "\n",
-    "- Call this function, which returns a single object, and assign it to the variable ``data``.\n",
-    "  - Print ``data`` and recognise that ``load_penguins`` has returned a ``pandas.DataFrame``.\n",
-    "- Consider which features it might make sense to use in order to classify the species of the penguins.\n",
-    "  - You can print the column titles using ``pd.DataFrame.keys()``\n",
-    "  - You can also obtain useful information using ``pd.DataFrame.Series.describe()``"
+    "- Call this function, which returns a single ``pandas.DataFrame``, and assign it to the variable ``data``.\n",
+    "  - Print ``data`` and recognise that ``load_penguins`` has returned the dataframe.\n",
+    "- Analyse which features it might make sense to use in order to classify the species of the penguins.\n",
+    "  - You can print the column names using ``pd.DataFrame.keys()``\n",
+    "  - You can also obtain useful statistical information on the dataset using ``pd.DataFrame.describe()``"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
-    "from palmerpenguins import load_penguins"
+    "from palmerpenguins import load_penguins\n",
+    "\n",
+    "# Load the penguin data\n",
+    "data = load_penguins()\n"
    ]
   },
   {
@@ -402,7 +405,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": ".venv",
    "language": "python",
    "name": "python3"
   },
@@ -416,7 +419,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.12.4"
   }
  },
  "nbformat": 4,

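The inspection steps named in Task 1 (``keys()`` and ``describe()``) can be sketched as follows. This is a minimal illustration, not the notebook's own code: the three rows are hypothetical stand-in values so the snippet runs without ``palmerpenguins`` installed, and only the column names are modelled on the penguin dataset.

```python
import pandas as pd

# Hypothetical stand-in rows; in the notebook, `data` comes from load_penguins().
data = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "bill_length_mm": [39.0, 46.0, 50.0],
        "body_mass_g": [3700.0, 5000.0, 3800.0],
    }
)

# keys() returns the column labels of the dataframe.
column_names = list(data.keys())

# describe() summarises the numeric columns (count, mean, std, quartiles, ...).
summary = data.describe()

print(column_names)
print(summary)
```

By default ``describe()`` skips non-numeric columns such as ``species``, which is a useful hint that the species column is the classification target rather than an input feature.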
diff --git a/worked-solutions/01_penguin_classification_solutions.ipynb b/worked-solutions/01_penguin_classification_solutions.ipynb
@@ -23,11 +23,11 @@
     "### Task 1: look at the data\n",
     "In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.\n",
     "\n",
-    "- Call this function, which returns a single object, and assign it to the variable ``data``.\n",
-    "  - Print ``data`` and recognise that ``load_penguins`` has returned a ``pandas.DataFrame``.\n",
-    "- Consider which features it might make sense to use in order to classify the species of the penguins.\n",
-    "  - You can print the column titles using ``pd.DataFrame.keys()``\n",
-    "  - You can also obtain useful information using ``pd.DataFrame.Series.describe()``"
+    "- Call this function, which returns a single ``pandas.DataFrame``, and assign it to the variable ``data``.\n",
+    "  - Print ``data`` and recognise that ``load_penguins`` has returned the dataframe.\n",
+    "- Analyse which features it might make sense to use in order to classify the species of the penguins.\n",
+    "  - You can print the column names using ``pd.DataFrame.keys()``\n",
+    "  - You can also obtain useful statistical information on the dataset using ``pd.DataFrame.describe()``"
    ]
   },
   {
@@ -108,23 +108,25 @@
    "source": [
     "### Task 2: creating a ``torch.utils.data.Dataset``\n",
     "\n",
-    "All PyTorch dataset objects are subclasses of the ``torch.utils.data.Dataset`` class. To make a custom dataset, create a class which inherits from the ``Dataset`` class, implement some methods (the Python magic (or dunder) methods ``__len__`` and ``__getitem__``) and supply some data.\n",
+    "To use PyTorch's data-loading functionality, we need to make the dataset compatible with PyTorch. We do this using PyTorch's dataset class, ``torch.utils.data.Dataset``.\n",
     "\n",
-    "Spoiler alert: we've done this for you already in ``src/ml_workshop/_penguins.py``.\n",
+    "To make a custom dataset, create a new class which inherits from the ``Dataset`` class, implement some methods (the Python magic (or dunder) methods ``__len__`` and ``__getitem__``) and supply some data.\n",
     "\n",
-    "- Open the file ``src/ml_workshop/_penguins.py``.\n",
+    "Spoiler alert: we've done this for you already in ``worked-solutions/01_penguin_classification_solutions.ipynb``.\n",
+    "\n",
+    "- Open the above-mentioned file.\n",
     "- Let's examine, and discuss, each of the methods together.\n",
     "  - ``__len__``\n",
     "    - What does the ``__len__`` method do?\n",
-    "      - The ``__len__`` method is a so-called \"magic method\", which tells python to do if the ``len`` function is called on the object containing it.\n",
+    "      - The ``__len__`` method is a so-called \"magic method\" in Python, which defines what happens when the ``len`` function is called on an object.\n",
     "  - ``__getitem__``\n",
     "    - What does the ``__getitem__`` method do?\n",
     "      - The ``__getitem__`` method is another magic method which tells python what to do if we try and index the object containing it (i.e. ``my_object[idx]``).\n",
     "- Review and discuss the class arguments.\n",
-    "  - ``input_keys``— A sequence of strings telling the data set which objects to return as inputs to the model.\n",
-    "  - ``target_keys``— Same as ``input_keys`` but specifying the targets.\n",
+    "  - ``input_keys``— A sequence of strings telling the data set which objects to return as inputs to the model. These are the names of the input columns.\n",
+    "  - ``target_keys``— Same as ``input_keys`` but specifying the target columns.\n",
     "  - ``train``— A boolean variable determining if the model returns the training or validation split (``True`` for training).\n",
-    "  - ``x_tfms``— A ``Compose`` object with functions which will convert the raw input to a tensor. This argument is _optional_.\n",
+    "  - ``x_tfms``— A ``Compose`` object with functions which will convert the raw input to a tensor. This argument is _optional_. Remember that PyTorch models operate on tensors.\n",
     "  - ``y_tfms``— A ``Compose`` object with functions which will convert the raw target to a tensor. This argument is _optional_."
    ]
   },
@@ -900,7 +902,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.12.4"
   }
  },
  "nbformat": 4,
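
The two magic methods discussed in Task 2 can be illustrated with a small, dependency-free sketch. ``TinyDataset`` and its sample values are hypothetical and are not the class used in the workshop. A real PyTorch dataset would additionally inherit from ``torch.utils.data.Dataset`` and would typically return tensors, which this sketch deliberately leaves out so that it runs without PyTorch installed.

```python
class TinyDataset:
    """Hypothetical map-style dataset holding (features, label) pairs."""

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels

    def __len__(self):
        # Python calls this when len(dataset) is evaluated.
        return len(self.features)

    def __getitem__(self, idx):
        # Python calls this when dataset[idx] is evaluated.
        return self.features[idx], self.labels[idx]


# Made-up measurements, used only to exercise the two methods.
dataset = TinyDataset([[39.1, 18.7], [46.5, 14.8]], ["Adelie", "Gentoo"])

n_items = len(dataset)   # dispatches to TinyDataset.__len__
first_item = dataset[0]  # dispatches to TinyDataset.__getitem__

print(n_items, first_item)
```

Because a data loader only needs to know how many items exist and how to fetch item ``idx``, these two methods are the whole interface a map-style dataset has to provide.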