Commit df4402e

Course restructure
1 parent 3ee32f8 commit df4402e

12 files changed: 326 additions, 155 deletions

Lesson 1-1 - Structuring and wrangling messy data using a spreadsheet editor.ipynb renamed to Module 1 - Lesson 1 - Structuring and wrangling messy data using a spreadsheet editor.ipynb

Lines changed: 8 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# 1. Structuring and wrangling messy data using a spreadsheet editor\n",
7+
"# _Module 1 lesson 1_: Structuring & wrangling messy data in a spreadsheet\n",
88
"\n",
99
"<div class=\"alert alert-block alert-warning\">\n",
1010
" <b>Learning outcomes:</b>\n",
@@ -563,9 +563,11 @@
563563
"source": [
564564
"## 1.5 Lesson tutorial\n",
565565
"\n",
566-
"Pick a spreadsheet from [training data](data/lesson-spreadsheet/) and restructure it according to the techniques and requirements presented in this lesson.\n",
567-
"\n",
568-
"The data in this folder are also sourced from the World Bank, but from a long time ago, before the World Bank knew what open or machine-readable data was. It contains some of the worst examples of data-mangling you will ever see.\n",
566+
"<div class=\"alert alert-block alert-success\">\n",
567+
" <p><b>Tutorial:</b></p>\n",
568+
" <p>Pick a spreadsheet from <a href=\"data/lesson-spreadsheet/\">training data</a> and restructure it according to the techniques and requirements presented in this lesson.</p>\n",
569+
" <p>The data in this folder are also sourced from the World Bank, but from a long time ago, before the World Bank knew what open or machine-readable data was. It contains some of the worst examples of data-mangling you will ever see.</p>\n",
570+
"</div>\n",
569571
"\n",
570572
"Please complete this tutorial before beginning the next lesson."
571573
]
@@ -587,10 +589,11 @@
587589
"name": "python",
588590
"nbconvert_exporter": "python",
589591
"pygments_lexer": "ipython3",
590-
"version": "3.6.3"
592+
"version": "3.8.5"
591593
},
592594
"latex_envs": {
593595
"LaTeX_envs_menu_present": true,
596+
"autoclose": false,
594597
"autocomplete": true,
595598
"bibliofile": "biblio.bib",
596599
"cite_by": "apalike",

Lesson 2-1 - Validating restructured data against a schema using a spreadsheet.ipynb renamed to Module 1 - Lesson 2 - Validating restructured data against a schema using a spreadsheet.ipynb

Lines changed: 36 additions & 20 deletions
@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# 1. Validating restructured data against a schema using a spreadsheet\n",
7+
"# _Module 1 lesson 2_: Validating restructured data against a schema using a spreadsheet\n",
88
"\n",
99
"<div class=\"alert alert-block alert-warning\">\n",
1010
" <b>Learning outcomes:</b>\n",
@@ -23,7 +23,7 @@
2323
"cell_type": "markdown",
2424
"metadata": {},
2525
"source": [
26-
"## 1.1 Creating a JSON schema\n",
26+
"## 2.1 Creating a JSON schema\n",
2727
"\n",
2828
"When you produced your machine-readable file in Lesson 1, you came up with your own approach to how to structure the header row and the data. You named the columns yourself, and decided on how many, and what data should be in them. You did so by reviewing the source data.\n",
2929
"\n",
@@ -45,7 +45,7 @@
4545
"\n",
4646
"In simple terms, you need to specify the columns which an input CSV or Excel-file will be restructured into. The new columns are defined by the fields in your schema. These target fields are likely to be those in your database, or in your analytical software. Until your input data conform to this structure, your data will not validate.\n",
4747
"\n",
48-
"### 1.1.1 Minimum valid requirements\n",
48+
"### 2.1.1 Minimum valid requirements\n",
4949
"\n",
5050
"A minimum valid schema requires a `name` to identify the schema, and a single, minimally-valid `field` containing a `name` and `type`:\n",
5151
"\n",
@@ -68,7 +68,7 @@
6868
"\n",
6969
"The `fields` value is a list, or - in JSON terminology - an `array` of dictionary `objects`. Each field, unsurprisingly, has a `name`, `title` and `description`, of which only the `name` is required. \n",
7070
"\n",
71-
"### 1.1.2 Types\n",
71+
"### 2.1.2 Types\n",
7272
"\n",
7373
"Fields also have a `type`. This describes the data expected and limits the actions which can be performed during the wrangling process:\n",
7474
"\n",
@@ -84,7 +84,7 @@
8484
"\n",
8585
"There are more [types and formats](https://specs.frictionlessdata.io/table-schema/#types-and-formats), such as `geojson`, `geopoint` and variations on dates.\n",
8686
"\n",
87-
"### 1.1.3 Constraints\n",
87+
"### 2.1.3 Constraints\n",
8888
"\n",
8989
"In addition, these data can be `constrained`:\n",
9090
"\n",
@@ -111,21 +111,19 @@
111111
"\n",
112112
"Again, there are other [constraints](https://specs.frictionlessdata.io/table-schema/#constraints), such as `pattern`, `maxLength` and `minLength`, that you can use as well.\n",
113113
"\n",
114-
"### 1.1.4 Other properties\n",
114+
"### 2.1.4 Other properties\n",
115115
"\n",
116116
"There are also special properties you can add to your schema that are not part of the `fields` definitions:\n",
117117
"\n",
118118
"* `missingValues`: defines which terms in your data should be treated as missing values, e.g. `-`, `NaN`, `..`, etc. This must be presented as a list, with terms defined as strings, e.g. `[\"NaN\", \"..\"]`\n",
119119
"\n",
120-
"### 1.1.5 Example schema\n",
120+
"### 2.1.5 Example schema\n",
121121
"\n",
122122
"As an example, let's imagine we want our destination data to conform to the following structure:\n",
123123
"\n",
124-
" ========= ============ ============= ======== ================ ===================== ============= ========================\n",
125-
" la_code ba_ref occupant_name postcode occupation_state occupation_state_date prop_ba_rates occupation_state_reliefs\n",
126-
" ========= ============ ============= ======== ================ ===================== ============= ========================\n",
127-
" E06000044 177500080710 A company PO5 2SE True 2019-04-01 98530 [small_business, retail]\n",
128-
" ========= ============ ============= ======== ================ ===================== ============= ========================\n",
124+
"| la_code | ba_ref | occupant_name | postcode | occupation_state | occupation_state_date | prop_ba_rates | occupation_state_reliefs |\n",
125+
"|---------|--------|---------------|----------|------------------|-----------------------|---------------|-------------------------|\n",
126+
"| E06000044 | 177500080710 | A company | PO5 2SE | True | 2019-04-01 | 98530 | [small_business, retail] |\n",
129127
"\n",
130128
"The complete schema for this example is then:"
131129
]
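
The schema rules described in sections 2.1.1 and 2.1.4 can be sketched in Python. This is a minimal sketch under stated assumptions, not the lesson's complete schema: the schema name `example_schema`, the single `la_code` field, and the helper functions `is_minimally_valid` and `clean_cell` are all hypothetical. The check follows the minimum described above (a schema `name`, plus at least one field with a `name` and a `type`), and `missingValues` is given as a list of strings.

```python
import json

# Hypothetical schema for illustration only; not the lesson's complete schema.
schema = {
    "name": "example_schema",
    "missingValues": ["NaN", "..", "-"],
    "fields": [
        {"name": "la_code", "type": "string"},
    ],
}


def is_minimally_valid(candidate):
    """Check only the minimum requirements: a schema name and
    at least one field that has both a name and a type."""
    if not isinstance(candidate, dict) or not candidate.get("name"):
        return False
    fields = candidate.get("fields")
    if not isinstance(fields, list) or len(fields) == 0:
        return False
    return all(
        isinstance(f, dict) and bool(f.get("name")) and bool(f.get("type"))
        for f in fields
    )


def clean_cell(value, missing):
    """Return None when the cell holds a missing-value term, else the stripped text."""
    text = value.strip()
    return None if text in missing else text


print(json.dumps(schema, indent=2))
print(is_minimally_valid(schema))

# Apply the missing-value terms to one illustrative row of raw cell text.
row = ["E06000044", "..", " 98530 ", "-"]
cleaned = [clean_cell(cell, schema["missingValues"]) for cell in row]
print(cleaned)
```

Keeping `missingValues` at the top level of the schema, outside `fields`, mirrors the description of it as a special property that is not part of the field definitions.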
@@ -243,15 +241,15 @@
243241
"cell_type": "markdown",
244242
"metadata": {},
245243
"source": [
246-
"## 1.2 Apply data validation to cells in a spreadsheet\n",
244+
"## 2.2 Apply data validation to cells in a spreadsheet\n",
247245
"\n",
248246
"Your `types` - at this stage - are only a guide. You will have no feedback, or error messages like you get when running Python code, if any of the data types in your field columns are wrong. There are a few ways to get that feedback so you can correct things, but we'll start with data validation in spreadsheet cells.\n",
249247
"\n",
250248
"The following is adapted from a [Microsoft Office tutorial](https://support.office.com/en-gb/article/apply-data-validation-to-cells-29fecbcc-d1b9-42c1-9d76-eff3ce5f7249). This approach will work in OpenOffice as well as Google Sheets, although the specific steps are different.\n",
251249
"\n",
252250
"Microsoft has an example file you can [download](http://download.microsoft.com/download/9/6/8/968A9140-2E13-4FDC-B62C-C1D98D2B0FE6/Data%20Validation%20Examples.xlsx).\n",
253251
"\n",
254-
"### 1.2.1 Specify validation for data types\n",
252+
"### 2.2.1 Specify validation for data types\n",
255253
"\n",
256254
"The process is straightforward:\n",
257255
"\n",
@@ -289,7 +287,7 @@
289287
"\n",
290288
"Now - only for new data - if a user tries to enter a value that is not valid, a pop-up appears with the message, \"This value doesn’t match the data validation restrictions for this cell.\" We'll run validation on your existing data shortly, but first a detour into `lists`.\n",
291289
"\n",
292-
"### 1.2.2 Lists are a special type\n",
290+
"### 2.2.2 Lists are a special type\n",
293291
"\n",
294292
"Before you can validate a `list` type, you need to specify valid terms. In Excel, this requires an [extra set of steps](https://support.office.com/en-us/article/create-a-drop-down-list-7693307a-59ef-400a-b769-c5402dce407b).\n",
295293
"\n",
@@ -308,7 +306,7 @@
308306
" - Convert your list to a table with __Ctrl+T__, then from the __Table Design__ tab give your table a name, permitting you to reference the table name and column (e.g. `=CityTable[City]`)\n",
309307
" - From the __Formulas__ tab select __Name Manager__, create a __New__ item with an appropriate name (e.g. `CityList`), and reference the cells (e.g. `=Sheet1!A4:A10`), which then lets you reference your list anywhere (e.g. `=CityList`)\n",
310308
"\n",
311-
"### 1.2.3 Validate and get error messages for your existing data\n",
309+
"### 2.2.3 Validate and get error messages for your existing data\n",
312310
"\n",
313311
"After you've specified validation rules on your existing data, you might be disappointed. Excel does not automatically notify you whether these cells contain invalid data. Here's a quick way to [highlight existing invalid cells](https://support.office.com/en-us/article/more-on-data-validation-f38dee73-9900-4ca6-9301-8a5f6e1f0c4c) by circling the values:\n",
314312
"\n",
@@ -350,7 +348,7 @@
350348
"cell_type": "markdown",
351349
"metadata": {},
352350
"source": [
353-
"## 1.3 Saving your validated file as a comma-separated-value\n",
351+
"## 2.3 Saving your validated file as a comma-separated-value\n",
354352
"\n",
355353
"Comma separated value files (`.csv`) are text files in which the comma character `,` separates each field of text. Where a comma appears in the value - whether a `string` or `number` - the value is then surrounded by quotation marks, e.g. `100, 200, \"20,000\"` indicates three values in three separate fields.\n",
356354
"\n",
@@ -374,7 +372,7 @@
374372
"cell_type": "markdown",
375373
"metadata": {},
376374
"source": [
377-
"## 1.4 Validating your data and JSON schema using CSVLint\n",
375+
"## 2.4 Validating your data and JSON schema using CSVLint\n",
378376
"\n",
379377
"In the next lesson, we'll learn how to validate your data using Python directly in a Jupyter Notebook. For now, we'll use an online resource provided by the Open Data Institute called [CSVLint](https://csvlint.io/).\n",
380378
"\n",
@@ -418,7 +416,7 @@
418416
"cell_type": "markdown",
419417
"metadata": {},
420418
"source": [
421-
"## 1.5 Lesson tutorial\n",
419+
"## 2.5 Lesson tutorial\n",
422420
"\n",
423421
"<div class=\"alert alert-block alert-success\">\n",
424422
" <p><b>Tutorial:</b></p>\n",
@@ -451,7 +449,25 @@
451449
"name": "python",
452450
"nbconvert_exporter": "python",
453451
"pygments_lexer": "ipython3",
454-
"version": "3.7.7"
452+
"version": "3.8.5"
453+
},
454+
"latex_envs": {
455+
"LaTeX_envs_menu_present": true,
456+
"autoclose": false,
457+
"autocomplete": true,
458+
"bibliofile": "biblio.bib",
459+
"cite_by": "apalike",
460+
"current_citInitial": 1,
461+
"eqLabelWithNumbers": true,
462+
"eqNumInitial": 1,
463+
"hotkeys": {
464+
"equation": "Ctrl-E",
465+
"itemize": "Ctrl-I"
466+
},
467+
"labels_anchors": false,
468+
"latex_user_defs": false,
469+
"report_style_numbering": false,
470+
"user_envs_cfg": false
455471
}
456472
},
457473
"nbformat": 4,

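The quoting rule described in section 2.3 of the notebook above (a value containing a comma is surrounded by quotation marks) can be seen with Python's standard `csv` module. This is a small sketch for illustration; the three values are taken from the lesson's own example, and the in-memory buffer stands in for a real `.csv` file.

```python
import csv
import io

# Write one row; the default dialect quotes only values that need it.
buffer = io.StringIO()
csv.writer(buffer).writerow(["100", "200", "20,000"])
line = buffer.getvalue()
print(line.strip())  # 100,200,"20,000"

# Read the line back: the quoted value stays a single field.
fields = next(csv.reader(io.StringIO(line)))
print(fields)  # ['100', '200', '20,000']
```

Only the third value is quoted on output, because it is the only one that contains the delimiter.
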
Leçon 1-1 - Structurer et organiser des données désordonnées à l'aide d'un tableur.ipynb renamed to Module 1 - Leçon 1 - Structurer et organiser des données désordonnées à l'aide d'un tableur.ipynb

Lines changed: 8 additions & 5 deletions
@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# 1. Structuring and organizing messy data using a spreadsheet\n",
7+
"# _Module 1 lesson 1_: Structuring and organizing messy data using a spreadsheet\n",
88
"\n",
99
"<div class=\"alert alert-block alert-warning\">\n",
1010
" <b>By the end of the training, you will be able to:</b>\n",
@@ -548,9 +548,11 @@
548548
"source": [
549549
"## 1.5 Tutoriel de la leçon\n",
550550
"\n",
551-
"Choisissez une feuille de calcul dans [données de formation](data/lesson-spreadsheet/) et restructurez-la selon les techniques et les exigences présentées dans cette leçon.\n",
552-
"\n",
553-
"Les données de ce dossier proviennent également de la Banque mondiale, mais elles datent d'une époque bien antérieure, avant que la Banque mondiale ne sache ce qu'étaient des données ouvertes ou exploitables par une machine. Il contient certains des pires exemples de manipulation de données que vous n'aurez jamais vus.\n",
551+
"<div class=\"alert alert-block alert-success\">\n",
552+
" <p><b>Tutoriel:</b></p>\n",
553+
" <p>Choisissez une feuille de calcul dans <a href=\"data/lesson-spreadsheet/\">données de formation</a> et restructurez-la selon les techniques et les exigences présentées dans cette leçon.</p>\n",
554+
" <p>Les données de ce dossier proviennent également de la Banque mondiale, mais elles datent d'une époque bien antérieure, avant que la Banque mondiale ne sache ce qu'étaient des données ouvertes ou exploitables par une machine. Il contient certains des pires exemples de manipulation de données que vous n'aurez jamais vus.</p>\n",
555+
"</div>\n",
554556
"\n",
555557
"Veuillez compléter ce tutoriel avant de commencer la prochaine leçon."
556558
]
@@ -572,10 +574,11 @@
572574
"name": "python",
573575
"nbconvert_exporter": "python",
574576
"pygments_lexer": "ipython3",
575-
"version": "3.6.3"
577+
"version": "3.8.5"
576578
},
577579
"latex_envs": {
578580
"LaTeX_envs_menu_present": true,
581+
"autoclose": false,
579582
"autocomplete": true,
580583
"bibliofile": "biblio.bib",
581584
"cite_by": "apalike",
