astrojuanlu
diff --git a/‎img/red-green-refactor.png
34.8 KB b/‎img/red-green-refactor.png
34.8 KB
diff --git a/‎sessions/02_layout-unit-tests.ipynb
Lines changed: 171 additions & 15 deletions b/‎sessions/02_layout-unit-tests.ipynb
Lines changed: 171 additions & 15 deletions
@@ -163,36 +163,192 @@
    "source": [
     "### Exercise\n",
     "\n",
-    "1. Create a directory called `test-package`\n",
+    "1. Create a directory called `nlp-utils`\n",
     "2. `git init` inside it\n",
-    "3. Create a basic `src` layout in it, with `name = test-package` and the source in `src/test_package`\n",
+    "3. Create a basic `src` layout in it, with `name = nlp-utils` and the source in `src/nlp_utils`\n",
     "4. Create a `src/test_package/__init__.py` with a `print(\"Hello, world!\")`\n",
-    "5. Install it in editable mode using `pip` and test that `>>> import test_package` prints `Hello, world!`\n",
+    "5. Install it in editable mode using `pip` and test that `>>> import nlp_utils` prints `Hello, world!`\n",
     "6. Include a `README.txt` and an appropriate `.gitignore` from http://gitignore.io/\n",
     "7. Commit the changes\n",
     "8. Create a new GitHub project and push the repository there\n",
     "\n",
     "🎉"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Testing\n",
+    "\n",
+    "Testing is **essential**. Many developers get along without testing their software, but as common wisdom says:\n",
+    "\n",
+    "<blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">If you use software that lacks automated tests, you are the tests.</p>&mdash; Jenny Bryan (@JennyBryan) <a href=\"https://twitter.com/JennyBryan/status/1043307291909316609?ref_src=twsrc%5Etfw\">September 22, 2018</a></blockquote> <script async src=\"https://platform.twitter.com/widgets.js\" charset=\"utf-8\"></script>\n",
+    "\n",
+    "Computers excel at doing repetitive tasks: they basically never make mistakes (the mistake might be in what we told the computer to do). Humans, on the other hand, fail more often, especially under pressure, or on Friday afternoons and Monday mornings. Therefore, instead of letting the humans be the tests, we will use the computer to **frequently verify that our software works as specified**.\n",
+    "\n",
+    "### References\n",
+    "\n",
+    "* pytest documentation https://docs.pytest.org\n",
+    "\n",
+    "### Further reading\n",
+    "\n",
+    "* Extreme Programming https://www.wikiwand.com/en/Extreme_programming\n",
+    "* Obey the Testing Goat! http://www.obeythetestinggoat.com/pages/book.html#toc\n",
+    "* (Shameless self-plug) Testing and validation approaches for scientific software https://nbviewer.jupyter.org/format/slides/github/poliastro/oscw2018-talk/blob/master/Talk.ipynb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Test-Driven Development\n",
+    "\n",
+    "> Make it work. Make it right. Make it fast.\n",
+    "\n",
+    "Test-Driven Development shifts the focus of software development to writing tests. The \"practice of test-first development, planning and writing tests before each micro-increment\" is not new: it was in use at NASA in the early 1960s ([source](https://www.wikiwand.com/en/Extreme_programming)). In the 1990s, Extreme Programming took this concept to the extreme by the use of **small, automated** tests.\n",
+    "\n",
+    "The \"test-driven development mantra\" is <span style=\"color: red\">**Red**</span> - <span style=\"color: green\">**Green**</span> - **Refactor**:\n",
+    "\n",
+    "![The mantra](../img/red-green-refactor.png)\n",
+    "\n",
+    "1. Write a test. <span style=\"color: red\">**Watch it fail**</span>.\n",
+    "2. Write just enough code to <span style=\"color: green\">**pass the test**</span>.\n",
+    "3. Improve the code without breaking the test.\n",
+    "4. Repeat."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Testing in Python\n",
+    "\n",
+    "Summary: **use pytest**. Everybody does. It rocks.\n",
+    "\n",
+    "[pytest](https://docs.pytest.org/) is a testing framework for Python that makes writing tests extremely easy. It is much more powerful than the standard library equivalent, `unittest`. To use it, you need to install it first:\n",
+    "\n",
+    "```\n",
+    "$ pip install pytest\n",
+    "```\n",
+    "\n",
+    "The simplest test is **a function with an `assert`**. The `assert` statement just fails if the contents are not `True`, and else it does nothing. *It should only be used for testing*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "assert True  # Does nothing"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "AssertionError",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mAssertionError\u001b[0m                            Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-2-40f67ddecc26>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0;32mFalse\u001b[0m  \u001b[0;31m# Fails!\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;31mAssertionError\u001b[0m: "
+     ]
+    }
+   ],
+   "source": [
+    "assert False  # Fails!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "AssertionError",
+     "evalue": "Math is wrong",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mAssertionError\u001b[0m                            Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-3-c2bde6a3219c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"Math is wrong\"\u001b[0m  \u001b[0;31m# Fails with a message\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;31mAssertionError\u001b[0m: Math is wrong"
+     ]
+    }
+   ],
+   "source": [
+    "assert 2 + 2 == 5, \"Math is wrong\"  # Fails with a message"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example\n",
+    "\n",
+    "> Write a function that **tokenizes a sentence** (i.e. splits it into a list of words)\n",
+    "\n",
+    "First, we write a (failing) test:\n",
+    "\n",
+    "```python\n",
+    "# tests/test_tokenize.py\n",
+    "from nlp_utils import tokenize  # This will fail right away!\n",
+    "\n",
+    "def test_tokenize_returns_expected_list():\n",
+    "    sentence = \"This is a sentence\"\n",
+    "    expected_tokens = [\"This\", \"is\", \"a\", \"sentence\"]\n",
+    "\n",
+    "    tokens = tokenize(sentence)\n",
+    "\n",
+    "    assert tokens == expected_tokens\n",
+    "```\n",
+    "\n",
+    "and we run it from the command line:\n",
+    "\n",
+    "```\n",
+    "$ pytest\n",
+    "...\n",
+    "```\n",
+    "\n",
+    "Then we fix the test in the simplest way:\n",
+    "\n",
+    "```python\n",
+    "# src/nlp_utils/__init__.py\n",
+    "def tokenize(sentence):\n",
+    "    return sentence.split()\n",
+    "```\n",
+    "\n",
+    "And we watch it pass!\n",
+    "\n",
+    "```\n",
+    "$ pytest\n",
+    "...\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Exercise\n",
+    "\n",
+    "1. Add a new test that checks that `tokenize(sentence, lower=True)` returns a list of *lowercase* tokens.\n",
+    "2. Fix the test *in a way the first one doesn't break*.\n",
+    "3. *Extra*: Use `@pytest.mark.parametrize` to pass two different sentences to the new test https://docs.pytest.org/en/latest/example/parametrize.html"
+   ]
   }
  ],
  "metadata": {
   "kernelspec": {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.1"
   }
  },
  "nbformat": 4,