
Commit 310a8ef

authored
Merge pull request #7 from stefmolin/updates
Updates to section 1 for mean/median and skew. skip-checks: true
2 parents 181300f + 83f20aa commit 310a8ef
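The commit message refers to mean/median and skew. The sketch below uses made-up toy data (assumed values, not from the notebook) to show how pandas' `Series.skew()` can quantify the kind of right skew the updated notebook text describes. Skewness is positive when a few large values stretch the right tail, and in that situation the mean usually lands above the median.

```python
import pandas as pd

# Toy data, invented for illustration (not the meteorite dataset).
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10_000.0])

for label, s in [('symmetric', symmetric), ('right-skewed', right_skewed)]:
    # Series.skew() returns the bias-adjusted sample skewness:
    # > 0 suggests a long right tail, < 0 a long left tail, near 0 rough symmetry.
    print(f'{label}: skew={s.skew():.3f}, mean={s.mean():.1f}, median={s.median():.1f}')
```

For the symmetric series the skewness is zero and the mean equals the median; for the series with one heavy value the skewness is positive and the mean sits well above the median.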


4 files changed: +304 additions, -78 deletions

notebooks/1-getting_started_with_pandas.ipynb

Lines changed: 68 additions & 20 deletions
@@ -2057,7 +2057,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "1dd8c102-fe33-4376-98f8-0b8c9c5b8384",
+"id": "ef8f5c35-57e4-4347-845e-39fa4de28075",
 "metadata": {
 "slideshow": {
 "slide_type": "subslide"
@@ -2071,13 +2071,13 @@
 {
 "cell_type": "code",
 "execution_count": 25,
-"id": "3d666cd2-eca0-4634-86a5-aebbcda99522",
+"id": "b998653e-02dd-4540-ba66-dbc4b458b558",
 "metadata": {},
 "outputs": [
 {
 "data": {
 "text/plain": [
-"32.6"
+"13278.078548601512"
 ]
 },
 "execution_count": 25,
@@ -2086,33 +2086,47 @@
 }
 ],
 "source": [
-"meteorites['mass (g)'].median()"
+"meteorites['mass (g)'].mean()"
 ]
 },
 {
 "cell_type": "markdown",
-"id": "f322c0f3-a057-4193-9f7f-78c9828d6197",
+"id": "a398ecbe-10cc-4498-a7f1-91ea0bc736d2",
 "metadata": {
 "slideshow": {
 "slide_type": "fragment"
 },
 "tags": []
 },
 "source": [
-"We can take this a step further and look at quantiles:"
+"**Important**: The mean isn't always the best measure of central tendency. If there are outliers in the distribution, the mean will be distorted by them. Here, the mean is being pulled higher by some very heavy meteorites; the distribution is [right-skewed](https://www.analyticsvidhya.com/blog/2020/07/what-is-skewness-statistics/)."
+]
+},
+{
+"cell_type": "markdown",
+"id": "7b0162c6-f48f-4687-9902-72325ebecc0d",
+"metadata": {
+"slideshow": {
+"slide_type": "subslide"
+},
+"tags": []
+},
+"source": [
+"Taking a look at some quantiles at the extremes of the distribution shows that the mean is between the 95th and 99th percentiles of the distribution, so it isn't a good measure of central tendency here:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 26,
-"id": "5d97fd11-12eb-4970-b042-6cbbd35a3a23",
+"id": "b7379492-da17-4358-b357-2ae6e1a26e67",
 "metadata": {},
 "outputs": [
 {
 "data": {
 "text/plain": [
 "0.01 0.44\n",
 "0.05 1.10\n",
+"0.50 32.60\n",
 "0.95 4000.00\n",
 "0.99 50600.00\n",
 "Name: mass (g), dtype: float64"
@@ -2124,7 +2138,41 @@
 }
 ],
 "source": [
-"meteorites['mass (g)'].quantile([0.01, 0.05, 0.95, 0.99])"
+"meteorites['mass (g)'].quantile([0.01, 0.05, 0.5, 0.95, 0.99])"
+]
+},
+{
+"cell_type": "markdown",
+"id": "2ca1c739-cf2b-4000-bedb-b66a3d11f071",
+"metadata": {
+"slideshow": {
+"slide_type": "fragment"
+},
+"tags": []
+},
+"source": [
+"A better measure in this case is the median (50th percentile), since it is robust to outliers:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 27,
+"id": "bc2e62f3-899d-4a50-a2f4-8b2e73e1bc2f",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"32.6"
+]
+},
+"execution_count": 27,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"meteorites['mass (g)'].median()"
 ]
 },
 {
@@ -2142,7 +2190,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": 28,
 "id": "585af605-e601-49b6-bd1f-4838ab993302",
 "metadata": {},
 "outputs": [
@@ -2152,7 +2200,7 @@
 "60000000.0"
 ]
 },
-"execution_count": 27,
+"execution_count": 28,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -2176,7 +2224,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 28,
+"execution_count": 29,
 "id": "29720ccc-3855-42f7-a0d0-e41a83cf1bef",
 "metadata": {},
 "outputs": [
@@ -2196,7 +2244,7 @@
 "Name: 16392, dtype: object"
 ]
 },
-"execution_count": 28,
+"execution_count": 29,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -2220,7 +2268,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 29,
+"execution_count": 30,
 "id": "79c2a1db-0eeb-4173-964a-a38741c059ba",
 "metadata": {},
 "outputs": [
@@ -2230,7 +2278,7 @@
 "466"
 ]
 },
-"execution_count": 29,
+"execution_count": 30,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -2254,7 +2302,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 30,
+"execution_count": 31,
 "id": "3ac57de5-7734-478a-9772-feb82890d5ef",
 "metadata": {},
 "outputs": [
@@ -2266,7 +2314,7 @@
 " dtype=object)"
 ]
 },
-"execution_count": 30,
+"execution_count": 31,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -2299,7 +2347,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 31,
+"execution_count": 32,
 "id": "f0297d45-1d86-411f-ad8e-74cfaa3b2389",
 "metadata": {},
 "outputs": [
@@ -2512,7 +2560,7 @@
 "max NaN 81.166670 354.473330 NaN "
 ]
 },
-"execution_count": 31,
+"execution_count": 32,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -2557,7 +2605,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 32,
+"execution_count": 33,
 "id": "876cafcb-00ab-4f5a-8b3c-bfead4f0b14c",
 "metadata": {},
 "outputs": [],
@@ -2578,7 +2626,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 33,
+"execution_count": 34,
 "id": "6402bb24-3da9-48e5-bde1-0b8a9576f00d",
 "metadata": {},
 "outputs": [],
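The new notebook cells compare `mean()`, `median()` and `quantile()` on the meteorite masses. The following minimal sketch repeats that reasoning on a small, hypothetical series (the values are invented, not taken from the dataset). A single very heavy value drags the mean far above the median, most observations end up below the mean, and making the outlier even heavier moves the mean but leaves the median unchanged, which is what "robust to outliers" means here.

```python
import pandas as pd

# Hypothetical masses in grams: nine small values and one very heavy outlier.
# These numbers are invented for illustration; they are not the meteorite data.
mass = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10_000.0], name='mass (g)')

mean = mass.mean()      # dragged upward by the outlier
median = mass.median()  # average of the two middle values; ignores how big the outlier is

# Share of observations strictly below the mean, a rough percentile rank for it
share_below_mean = (mass < mean).mean()

print(f'mean={mean}, median={median}, share below mean={share_below_mean}')
print(mass.quantile([0.05, 0.5, 0.95]))

# Robustness check: make the outlier 100x heavier.
mass_extreme = mass.replace(10_000.0, 1_000_000.0)
print(f'extreme mean={mass_extreme.mean()}, extreme median={mass_extreme.median()}')
```

With these values the mean is 1004.5 while the median is 5.5, and 90% of the observations fall below the mean. Replacing the outlier with a heavier value raises the mean, but the median stays at 5.5.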
