
Commit 8c5892d

Merge branch 'main' into exercise-timer
2 parents: 0db54ca + 0e9d5bc

9 files changed: 470 additions & 84 deletions

.github/workflows/env-checks.yml

Lines changed: 4 additions & 0 deletions
@@ -21,6 +21,10 @@ on:
   schedule:
     - cron: "44 22 11 * *"

+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
 # A workflow run is made up of one or more jobs that can run sequentially or in parallel
 jobs:
   # This workflow contains a single job called "build"
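The new `concurrency` block groups runs by the workflow name plus either the pull request's head branch or, when there is no head branch, the unique run ID. With `cancel-in-progress: true`, a newer run in the same group cancels the older one. The Python sketch below only mimics how that group key is formed. It assumes the workflow is also triggered by `pull_request` events, since the triggers above line 21 are not part of this diff. The workflow name `env-checks` and the run IDs are hypothetical.

```python
from typing import Optional


def concurrency_group(workflow: str, head_ref: Optional[str], run_id: int) -> str:
    """Mimic `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`.

    `github.head_ref` is only set for pull request events, so for other events
    (push, schedule) the `||` falls back to the run ID, which is unique per run.
    Python's `or` gives the same fallback for an empty or missing head_ref.
    """
    return f"{workflow}-{head_ref or run_id}"


# Two runs for the same pull request branch share one group, so the newer
# run cancels the older one while it is still in progress.
pr_first = concurrency_group("env-checks", "exercise-timer", 101)
pr_second = concurrency_group("env-checks", "exercise-timer", 102)
print(pr_first, pr_first == pr_second)  # env-checks-exercise-timer True

# Scheduled runs have an empty head_ref, so each one gets its own group.
cron_first = concurrency_group("env-checks", "", 201)
cron_second = concurrency_group("env-checks", "", 202)
print(cron_first, cron_first == cron_second)  # env-checks-201 False
```

Because scheduled runs never share a group, the monthly cron run is never cancelled by this setting. Only superseded pull request runs are.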

.github/workflows/stale.yml

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
+#
+# You can adjust the behavior by modifying this file.
+# For more information, see:
+# https://github.com/actions/stale
+name: Mark stale issues and pull requests
+
+on:
+  schedule:
+  - cron: '27 20 * * *'
+
+jobs:
+  stale:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/stale@v4
+      with:
+        days-before-stale: 30
+        days-before-close: 7
+        stale-issue-message: 'This issue has been marked as stale due to lack of recent activity. It will be closed if no further activity occurs.'
+        stale-pr-message: ''
+        stale-issue-label: 'stale'
+        stale-pr-label: 'stale'
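The `days-before-stale` and `days-before-close` settings combine into a simple timeline for an inactive issue. The Python sketch below is a simplified model, not the action's implementation. It assumes the scheduled job fires every day and that the issue sees no further activity. It also ignores details such as how `actions/stale` treats pull requests when `stale-pr-message` is empty. The date used is hypothetical.

```python
from datetime import date, timedelta

DAYS_BEFORE_STALE = 30  # mirrors days-before-stale in stale.yml
DAYS_BEFORE_CLOSE = 7   # mirrors days-before-close in stale.yml


def stale_status(last_activity: date, today: date) -> str:
    """Simplified model of the stale-then-close timeline.

    Assumes the job runs daily and the issue has no activity after
    `last_activity`; real runs only act when the cron schedule fires.
    """
    idle_days = (today - last_activity).days
    if idle_days < DAYS_BEFORE_STALE:
        return "active"
    if idle_days < DAYS_BEFORE_STALE + DAYS_BEFORE_CLOSE:
        return "stale"  # labelled 'stale' and waiting out the close period
    return "closed"


last_activity = date(2023, 1, 1)  # hypothetical last-activity date
for idle in (29, 30, 36, 37):
    print(idle, stale_status(last_activity, last_activity + timedelta(days=idle)))
# 29 active
# 30 stale
# 36 stale
# 37 closed
```

Under these assumptions, an untouched issue is closed 37 days after its last activity: 30 days to become stale plus 7 more days to close.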

notebooks/1-getting_started_with_pandas.ipynb

Lines changed: 68 additions & 20 deletions
@@ -2057,7 +2057,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1dd8c102-fe33-4376-98f8-0b8c9c5b8384",
+   "id": "ef8f5c35-57e4-4347-845e-39fa4de28075",
    "metadata": {
     "slideshow": {
      "slide_type": "subslide"
@@ -2071,13 +2071,13 @@
   {
    "cell_type": "code",
    "execution_count": 25,
-   "id": "3d666cd2-eca0-4634-86a5-aebbcda99522",
+   "id": "b998653e-02dd-4540-ba66-dbc4b458b558",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "32.6"
+       "13278.078548601512"
       ]
      },
      "execution_count": 25,
@@ -2086,33 +2086,47 @@
     }
    ],
    "source": [
-    "meteorites['mass (g)'].median()"
+    "meteorites['mass (g)'].mean()"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "f322c0f3-a057-4193-9f7f-78c9828d6197",
+   "id": "a398ecbe-10cc-4498-a7f1-91ea0bc736d2",
    "metadata": {
     "slideshow": {
      "slide_type": "fragment"
     },
     "tags": []
    },
    "source": [
-    "We can take this a step further and look at quantiles:"
+    "**Important**: The mean isn't always the best measure of central tendency. If there are outliers in the distribution, the mean will be skewed. Here, the mean is being pulled higher by some very heavy meteorites – the distribution is [right-skewed](https://www.analyticsvidhya.com/blog/2020/07/what-is-skewness-statistics/)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b0162c6-f48f-4687-9902-72325ebecc0d",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "subslide"
+    },
+    "tags": []
+   },
+   "source": [
+    "Taking a look at some quantiles at the extremes of the distribution shows that the mean is between the 95th and 99th percentile of the distribution, so it isn't a good measure of central tendency here:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 26,
-   "id": "5d97fd11-12eb-4970-b042-6cbbd35a3a23",
+   "id": "b7379492-da17-4358-b357-2ae6e1a26e67",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "0.01 0.44\n",
        "0.05 1.10\n",
+       "0.50 32.60\n",
        "0.95 4000.00\n",
        "0.99 50600.00\n",
        "Name: mass (g), dtype: float64"
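The new markdown cells argue that a few very heavy meteorites pull the mean far above most observations. The following pandas sketch reproduces the same effect on a made-up, right-skewed sample (these values are not from the meteorite data) by checking what share of the values falls below the mean:

```python
import pandas as pd

# Hypothetical right-skewed masses in grams: mostly small, one very heavy outlier.
masses = pd.Series([2.0, 5.0, 8.0, 12.0, 20.0, 35.0, 60.0, 90.0, 150.0, 60000.0])

mean = masses.mean()
print(masses.quantile([0.01, 0.05, 0.5, 0.95, 0.99]))  # same quantiles as the notebook

# Share of observations strictly below the mean: a high value means the mean
# sits out in the right tail rather than in the middle of the data.
share_below_mean = (masses < mean).mean()
print(mean, share_below_mean)  # 6038.2 0.9
```

In this toy sample the mean is larger than 9 of the 10 values. This is a smaller-scale version of the notebook's finding that the real mean lands between the 95th and 99th percentiles.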
@@ -2124,7 +2138,41 @@
     }
    ],
    "source": [
-    "meteorites['mass (g)'].quantile([0.01, 0.05, 0.95, 0.99])"
+    "meteorites['mass (g)'].quantile([0.01, 0.05, 0.5, 0.95, 0.99])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2ca1c739-cf2b-4000-bedb-b66a3d11f071",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    },
+    "tags": []
+   },
+   "source": [
+    "A better measure in this case is the median (50th percentile), since it is robust to outliers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "bc2e62f3-899d-4a50-a2f4-8b2e73e1bc2f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "32.6"
+      ]
+     },
+     "execution_count": 27,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "meteorites['mass (g)'].median()"
    ]
   },
   {
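You can check the claim that the median is robust to outliers by removing the single largest value from a small, hypothetical sample and comparing how far each statistic moves. The following is a sketch using pandas; the values are illustrative, not from the meteorite data:

```python
import pandas as pd

# Hypothetical right-skewed masses in grams (not the meteorite data).
masses = pd.Series([2.0, 5.0, 8.0, 12.0, 20.0, 35.0, 60.0, 90.0, 150.0, 60000.0])

# Drop the single heaviest observation to see how sensitive each statistic is.
without_outlier = masses.drop(masses.idxmax())

print(masses.median(), without_outlier.median())  # 27.5 20.0
print(round(masses.mean(), 1), round(without_outlier.mean(), 1))  # 6038.2 42.4
```

Removing one extreme value shifts the median by a few grams but moves the mean by thousands of grams, which is why the notebook switches to the median for this column.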
@@ -2142,7 +2190,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 28,
    "id": "585af605-e601-49b6-bd1f-4838ab993302",
    "metadata": {},
    "outputs": [
@@ -2152,7 +2200,7 @@
        "60000000.0"
       ]
      },
-     "execution_count": 27,
+     "execution_count": 28,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2176,7 +2224,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 29,
    "id": "29720ccc-3855-42f7-a0d0-e41a83cf1bef",
    "metadata": {},
    "outputs": [
@@ -2196,7 +2244,7 @@
        "Name: 16392, dtype: object"
       ]
      },
-     "execution_count": 28,
+     "execution_count": 29,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2220,7 +2268,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 30,
    "id": "79c2a1db-0eeb-4173-964a-a38741c059ba",
    "metadata": {},
    "outputs": [
@@ -2230,7 +2278,7 @@
        "466"
       ]
      },
-     "execution_count": 29,
+     "execution_count": 30,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2254,7 +2302,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 31,
    "id": "3ac57de5-7734-478a-9772-feb82890d5ef",
    "metadata": {},
    "outputs": [
@@ -2266,7 +2314,7 @@
        " dtype=object)"
       ]
      },
-     "execution_count": 30,
+     "execution_count": 31,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2299,7 +2347,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 32,
    "id": "f0297d45-1d86-411f-ad8e-74cfaa3b2389",
    "metadata": {},
    "outputs": [
@@ -2512,7 +2560,7 @@
        "max NaN 81.166670 354.473330 NaN "
       ]
      },
-     "execution_count": 31,
+     "execution_count": 32,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2557,7 +2605,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 33,
    "id": "876cafcb-00ab-4f5a-8b3c-bfead4f0b14c",
    "metadata": {},
    "outputs": [],
@@ -2578,7 +2626,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 34,
    "id": "6402bb24-3da9-48e5-bde1-0b8a9576f00d",
    "metadata": {},
    "outputs": [],

notebooks/3-data_visualization.ipynb

Lines changed: 20 additions & 0 deletions
@@ -32,6 +32,26 @@
     "The human brain excels at finding patterns in visual representations of the data; so in this section, we will learn how to visualize data using pandas along with the Matplotlib and Seaborn libraries for additional features. We will create a variety of visualizations that will help us better understand our data."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "267d9762-d012-43d1-82b0-02b37e110de8",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    },
+    "tags": []
+   },
+   "source": [
+    "## Why is data visualization necessary?\n",
+    "\n",
+    "So far, we have focused a lot on summarizing the data using statistics. However, summary statistics are not enough to understand the distribution – there are many possible distributions for a given set of summary statistics. Data visualization is necessary to truly understand the distribution:\n",
+    "\n",
+    "<div style=\"text-align: center; margin-top: -10px;\">\n",
+    "<img width=\"50%\" src=\"https://raw.githubusercontent.com/stefmolin/data-morph/main/docs/_static/panda-to-star-eased.gif\" alt=\"Data Morph: panda to star\" style=\"min-width: 300px; margin-bottom: -10px;\"/>\n",
+    "<div style=\"margin: auto 26%;\"><small><em>A set of points forming a panda can also form a star without any significant changes to the summary statistics displayed above. (source: <a href=\"https://github.com/stefmolin/data-morph\">Data Morph</a>)</em></small></div>\n",
+    "</div>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "e58aeca3-c71e-4b42-9ece-4eaa30ea0382",
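The new cell's point that many distributions share the same summary statistics is easy to check on a tiny scale. This minimal pandas sketch uses two hypothetical samples (not Data Morph output) that were constructed to have the same mean, median, and standard deviation:

```python
import pandas as pd

# Two hypothetical samples: both have mean 0, median 0, and a sum of
# squared deviations of 100, so their standard deviations also match.
clustered = pd.Series([-5, -5, 5, 5])  # two tight groups, nothing near the center
spread = pd.Series([-7, -1, 1, 7])     # values spread across the range

for name, s in [("clustered", clustered), ("spread", spread)]:
    print(name, s.mean(), s.median(), round(s.std(), 3))

# Same summaries, different shapes: count the values within 2 of the center.
print((clustered.abs() <= 2).sum(), (spread.abs() <= 2).sum())  # 0 2
```

A histogram of each series (for example, `spread.plot(kind='hist')`, which requires Matplotlib) makes the difference visible at a glance. The printed statistics hide it.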
