
Commit 7bc32f9

edit some notebooks
1 parent 7d57881 commit 7bc32f9

38 files changed

+3070 -288 lines changed

Chapter1/good_practices.ipynb

Lines changed: 117 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1519,12 +1519,16 @@
15191519
"id": "9d24d6af",
15201520
"metadata": {},
15211521
"source": [
1522-
"To simplify checking if a Python object is of different types, you can group those types into a tuple within an instance call."
1522+
"The `isinstance()` function in Python is used to check if an object is an instance of a specified type or class. When checking for multiple types, we can optimize our code by using a tuple of types instead of multiple `isinstance()` calls or conditions.\n",
1523+
"\n",
1524+
"Let's break it down:\n",
1525+
"\n",
1526+
"1. Traditional approach (less efficient):"
15231527
]
15241528
},
15251529
{
15261530
"cell_type": "code",
1527-
"execution_count": 6,
1531+
"execution_count": 1,
15281532
"id": "90c6d002",
15291533
"metadata": {},
15301534
"outputs": [
@@ -1533,21 +1537,31 @@
15331537
"output_type": "stream",
15341538
"text": [
15351539
"True\n",
1536-
"True\n"
1540+
"True\n",
1541+
"False\n"
15371542
]
15381543
}
15391544
],
15401545
"source": [
15411546
"def is_number(num):\n",
15421547
" return isinstance(num, int) or isinstance(num, float)\n",
15431548
"\n",
1544-
"print(is_number(2))\n",
1545-
"print(is_number(1.5))"
1549+
"print(is_number(2)) # True\n",
1550+
"print(is_number(1.5)) # True\n",
1551+
"print(is_number(\"2\")) # False"
1552+
]
1553+
},
1554+
{
1555+
"cell_type": "markdown",
1556+
"id": "57d2acb4",
1557+
"metadata": {},
1558+
"source": [
1559+
"2. Optimized approach using a tuple:"
15461560
]
15471561
},
15481562
{
15491563
"cell_type": "code",
1550-
"execution_count": 9,
1564+
"execution_count": 2,
15511565
"id": "f29bba13",
15521566
"metadata": {},
15531567
"outputs": [
@@ -1556,16 +1570,110 @@
15561570
"output_type": "stream",
15571571
"text": [
15581572
"True\n",
1559-
"True\n"
1573+
"True\n",
1574+
"False\n"
15601575
]
15611576
}
15621577
],
15631578
"source": [
15641579
"def is_number(num):\n",
15651580
" return isinstance(num, (int, float))\n",
15661581
"\n",
1567-
"print(is_number(2))\n",
1568-
"print(is_number(1.5))"
1582+
"print(is_number(2)) # True\n",
1583+
"print(is_number(1.5)) # True\n",
1584+
"print(is_number(\"2\")) # False"
1585+
]
1586+
},
1587+
{
1588+
"cell_type": "markdown",
1589+
"id": "d9b34fe6",
1590+
"metadata": {},
1591+
"source": [
1592+
"Benefits of using a tuple:\n",
1593+
"\n",
1594+
"1. Conciseness: The code is more readable and compact.\n",
1595+
"2. Performance: A single `isinstance()` call replaces several calls joined by `or`, which can be slightly faster when checking against many types.\n",
1596+
"3. Maintainability: Easier to add or remove types to check against."
1597+
]
1598+
},
1599+
{
1600+
"cell_type": "markdown",
1601+
"id": "80fb0047",
1602+
"metadata": {},
1603+
"source": [
1604+
"You can extend this concept to check for more types:"
1605+
]
1606+
},
1607+
{
1608+
"cell_type": "code",
1609+
"execution_count": 7,
1610+
"id": "1371324d",
1611+
"metadata": {},
1612+
"outputs": [
1613+
{
1614+
"name": "stdout",
1615+
"output_type": "stream",
1616+
"text": [
1617+
"True\n",
1618+
"True\n",
1619+
"True\n",
1620+
"False\n"
1621+
]
1622+
}
1623+
],
1624+
"source": [
1625+
"def is_sequence(obj):\n",
1626+
" return isinstance(obj, (list, tuple, str))\n",
1627+
"\n",
1628+
"print(is_sequence([1, 2, 3])) # True\n",
1629+
"print(is_sequence((1, 2, 3))) # True\n",
1630+
"print(is_sequence(\"123\")) # True\n",
1631+
"print(is_sequence(123)) # False"
1632+
]
1633+
},
1634+
{
1635+
"cell_type": "markdown",
1636+
"id": "7020a0d1",
1637+
"metadata": {},
1638+
"source": [
1639+
"For broader type checking, use Python's abstract base classes:"
1640+
]
1641+
},
1642+
{
1643+
"cell_type": "code",
1644+
"execution_count": 8,
1645+
"id": "7036c0f7",
1646+
"metadata": {},
1647+
"outputs": [
1648+
{
1649+
"name": "stdout",
1650+
"output_type": "stream",
1651+
"text": [
1652+
"True\n",
1653+
"True\n",
1654+
"True\n",
1655+
"False\n"
1656+
]
1657+
}
1658+
],
1659+
"source": [
1660+
"from collections.abc import Sequence\n",
1661+
"\n",
1662+
"def is_sequence(obj):\n",
1663+
" return isinstance(obj, Sequence)\n",
1664+
"\n",
1665+
"print(is_sequence([1, 2, 3])) # True\n",
1666+
"print(is_sequence((1, 2, 3))) # True\n",
1667+
"print(is_sequence(\"123\")) # True\n",
1668+
"print(is_sequence(123)) # False"
1669+
]
1670+
},
1671+
{
1672+
"cell_type": "markdown",
1673+
"id": "b941c3df",
1674+
"metadata": {},
1675+
"source": [
1676+
"In this case, we're checking if an object is a Sequence (like lists, tuples, strings). To also accept a Mapping (like dictionaries), pass a tuple of both classes to `isinstance()`."
15691677
]
15701678
},
15711679
{
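
The closing note in the `good_practices.ipynb` hunk mentions both Sequence and Mapping, while the cell shown checks only `Sequence`. A minimal sketch of how the tuple form could combine the two abstract base classes is given here. The helper name `is_sequence_or_mapping` is hypothetical and is not part of the notebook.

```python
from collections.abc import Mapping, Sequence


def is_sequence_or_mapping(obj):
    # Hypothetical helper (not in the notebook): a single isinstance() call
    # with a tuple of abstract base classes.
    return isinstance(obj, (Sequence, Mapping))


print(is_sequence_or_mapping([1, 2, 3]))  # True: list is a Sequence
print(is_sequence_or_mapping({"a": 1}))   # True: dict is a Mapping
print(is_sequence_or_mapping(123))        # False: int is neither
```

Adding or removing an accepted kind of object then only means editing the tuple, which is the maintainability benefit the notebook text describes.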

Chapter5/SQL.ipynb

Lines changed: 1 addition & 1 deletion
@@ -1140,7 +1140,7 @@
11401140
],
11411141
"metadata": {
11421142
"kernelspec": {
1143-
"display_name": "venv",
1143+
"display_name": "Python 3 (ipykernel)",
11441144
"language": "python",
11451145
"name": "python3"
11461146
},

Chapter5/best_python_practice_tools.ipynb

Lines changed: 1 addition & 1 deletion
@@ -376,7 +376,7 @@
376376
"hash": "484329849bb907480cd798e750759bc6f1d66c93f9e78e7055aa0a2c2de6b47b"
377377
},
378378
"kernelspec": {
379-
"display_name": "Python 3.8.9 ('venv': venv)",
379+
"display_name": "Python 3 (ipykernel)",
380380
"language": "python",
381381
"name": "python3"
382382
},

Chapter5/better_pandas.ipynb

Lines changed: 2 additions & 1 deletion
@@ -1834,6 +1834,7 @@
18341834
{
18351835
"cell_type": "code",
18361836
"execution_count": 38,
1837+
"id": "2314aa9f",
18371838
"metadata": {},
18381839
"outputs": [
18391840
{
@@ -4416,7 +4417,7 @@
44164417
"metadata": {
44174418
"celltoolbar": "Tags",
44184419
"kernelspec": {
4419-
"display_name": "venv",
4420+
"display_name": "Python 3 (ipykernel)",
44204421
"language": "python",
44214422
"name": "python3"
44224423
},

Chapter5/feature_engineer.ipynb

Lines changed: 237 additions & 16 deletions
Large diffs are not rendered by default.

Chapter5/feature_extraction.ipynb

Lines changed: 111 additions & 9 deletions
@@ -113,7 +113,24 @@
113113
},
114114
{
115115
"data": {
116-
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 30;\n var nbb_unformatted_code = \"import numpy as np\\nfrom distfit import distfit\\n\\nX = np.random.normal(0, 3, 1000)\\n\\n# Initialize model\\ndist = distfit()\\n\\n# Find best theoretical distribution for empirical data X\\ndistribution = dist.fit_transform(X)\\ndist.plot()\";\n var nbb_formatted_code = \"import numpy as np\\nfrom distfit import distfit\\n\\nX = np.random.normal(0, 3, 1000)\\n\\n# Initialize model\\ndist = distfit()\\n\\n# Find best theoretical distribution for empirical data X\\ndistribution = dist.fit_transform(X)\\ndist.plot()\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
116+
"application/javascript": [
117+
"\n",
118+
" setTimeout(function() {\n",
119+
" var nbb_cell_id = 30;\n",
120+
" var nbb_unformatted_code = \"import numpy as np\\nfrom distfit import distfit\\n\\nX = np.random.normal(0, 3, 1000)\\n\\n# Initialize model\\ndist = distfit()\\n\\n# Find best theoretical distribution for empirical data X\\ndistribution = dist.fit_transform(X)\\ndist.plot()\";\n",
121+
" var nbb_formatted_code = \"import numpy as np\\nfrom distfit import distfit\\n\\nX = np.random.normal(0, 3, 1000)\\n\\n# Initialize model\\ndist = distfit()\\n\\n# Find best theoretical distribution for empirical data X\\ndistribution = dist.fit_transform(X)\\ndist.plot()\";\n",
122+
" var nbb_cells = Jupyter.notebook.get_cells();\n",
123+
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
124+
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
125+
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
126+
" nbb_cells[i].set_text(nbb_formatted_code);\n",
127+
" }\n",
128+
" break;\n",
129+
" }\n",
130+
" }\n",
131+
" }, 500);\n",
132+
" "
133+
],
117134
"text/plain": [
118135
"<IPython.core.display.Javascript object>"
119136
]
@@ -359,7 +376,24 @@
359376
},
360377
{
361378
"data": {
362-
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 6;\n var nbb_unformatted_code = \"import pandas as pd\\nfrom fastai.tabular.core import cont_cat_split\\n\\ndf = pd.DataFrame(\\n {\\n \\\"col1\\\": [1, 2, 3, 4, 5],\\n \\\"col2\\\": [\\\"a\\\", \\\"b\\\", \\\"c\\\", \\\"d\\\", \\\"e\\\"],\\n \\\"col3\\\": [1.0, 2.0, 3.0, 4.0, 5.0],\\n }\\n)\\n\\ncont_names, cat_names = cont_cat_split(df)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n var nbb_formatted_code = \"import pandas as pd\\nfrom fastai.tabular.core import cont_cat_split\\n\\ndf = pd.DataFrame(\\n {\\n \\\"col1\\\": [1, 2, 3, 4, 5],\\n \\\"col2\\\": [\\\"a\\\", \\\"b\\\", \\\"c\\\", \\\"d\\\", \\\"e\\\"],\\n \\\"col3\\\": [1.0, 2.0, 3.0, 4.0, 5.0],\\n }\\n)\\n\\ncont_names, cat_names = cont_cat_split(df)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
379+
"application/javascript": [
380+
"\n",
381+
" setTimeout(function() {\n",
382+
" var nbb_cell_id = 6;\n",
383+
" var nbb_unformatted_code = \"import pandas as pd\\nfrom fastai.tabular.core import cont_cat_split\\n\\ndf = pd.DataFrame(\\n {\\n \\\"col1\\\": [1, 2, 3, 4, 5],\\n \\\"col2\\\": [\\\"a\\\", \\\"b\\\", \\\"c\\\", \\\"d\\\", \\\"e\\\"],\\n \\\"col3\\\": [1.0, 2.0, 3.0, 4.0, 5.0],\\n }\\n)\\n\\ncont_names, cat_names = cont_cat_split(df)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n",
384+
" var nbb_formatted_code = \"import pandas as pd\\nfrom fastai.tabular.core import cont_cat_split\\n\\ndf = pd.DataFrame(\\n {\\n \\\"col1\\\": [1, 2, 3, 4, 5],\\n \\\"col2\\\": [\\\"a\\\", \\\"b\\\", \\\"c\\\", \\\"d\\\", \\\"e\\\"],\\n \\\"col3\\\": [1.0, 2.0, 3.0, 4.0, 5.0],\\n }\\n)\\n\\ncont_names, cat_names = cont_cat_split(df)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n",
385+
" var nbb_cells = Jupyter.notebook.get_cells();\n",
386+
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
387+
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
388+
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
389+
" nbb_cells[i].set_text(nbb_formatted_code);\n",
390+
" }\n",
391+
" break;\n",
392+
" }\n",
393+
" }\n",
394+
" }, 500);\n",
395+
" "
396+
],
363397
"text/plain": [
364398
"<IPython.core.display.Javascript object>"
365399
]
@@ -406,7 +440,24 @@
406440
},
407441
{
408442
"data": {
409-
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 7;\n var nbb_unformatted_code = \"cont_names, cat_names = cont_cat_split(df, max_card=3)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n var nbb_formatted_code = \"cont_names, cat_names = cont_cat_split(df, max_card=3)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
443+
"application/javascript": [
444+
"\n",
445+
" setTimeout(function() {\n",
446+
" var nbb_cell_id = 7;\n",
447+
" var nbb_unformatted_code = \"cont_names, cat_names = cont_cat_split(df, max_card=3)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n",
448+
" var nbb_formatted_code = \"cont_names, cat_names = cont_cat_split(df, max_card=3)\\nprint(\\\"Continuous columns:\\\", cont_names)\\nprint(\\\"Categorical columns:\\\", cat_names)\";\n",
449+
" var nbb_cells = Jupyter.notebook.get_cells();\n",
450+
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
451+
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
452+
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
453+
" nbb_cells[i].set_text(nbb_formatted_code);\n",
454+
" }\n",
455+
" break;\n",
456+
" }\n",
457+
" }\n",
458+
" }, 500);\n",
459+
" "
460+
],
410461
"text/plain": [
411462
"<IPython.core.display.Javascript object>"
412463
]
@@ -1327,7 +1378,24 @@
13271378
},
13281379
{
13291380
"data": {
1330-
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 21;\n var nbb_unformatted_code = \"import probablepeople as pp\\n\\npp.parse(\\\"Mr. Owen Harris II\\\")\";\n var nbb_formatted_code = \"import probablepeople as pp\\n\\npp.parse(\\\"Mr. Owen Harris II\\\")\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
1381+
"application/javascript": [
1382+
"\n",
1383+
" setTimeout(function() {\n",
1384+
" var nbb_cell_id = 21;\n",
1385+
" var nbb_unformatted_code = \"import probablepeople as pp\\n\\npp.parse(\\\"Mr. Owen Harris II\\\")\";\n",
1386+
" var nbb_formatted_code = \"import probablepeople as pp\\n\\npp.parse(\\\"Mr. Owen Harris II\\\")\";\n",
1387+
" var nbb_cells = Jupyter.notebook.get_cells();\n",
1388+
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
1389+
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
1390+
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
1391+
" nbb_cells[i].set_text(nbb_formatted_code);\n",
1392+
" }\n",
1393+
" break;\n",
1394+
" }\n",
1395+
" }\n",
1396+
" }, 500);\n",
1397+
" "
1398+
],
13311399
"text/plain": [
13321400
"<IPython.core.display.Javascript object>"
13331401
]
@@ -1368,7 +1436,24 @@
13681436
},
13691437
{
13701438
"data": {
1371-
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 22;\n var nbb_unformatted_code = \"pp.parse(\\\"Kate & John Cumings\\\")\";\n var nbb_formatted_code = \"pp.parse(\\\"Kate & John Cumings\\\")\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
1439+
"application/javascript": [
1440+
"\n",
1441+
" setTimeout(function() {\n",
1442+
" var nbb_cell_id = 22;\n",
1443+
" var nbb_unformatted_code = \"pp.parse(\\\"Kate & John Cumings\\\")\";\n",
1444+
" var nbb_formatted_code = \"pp.parse(\\\"Kate & John Cumings\\\")\";\n",
1445+
" var nbb_cells = Jupyter.notebook.get_cells();\n",
1446+
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
1447+
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
1448+
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
1449+
" nbb_cells[i].set_text(nbb_formatted_code);\n",
1450+
" }\n",
1451+
" break;\n",
1452+
" }\n",
1453+
" }\n",
1454+
" }, 500);\n",
1455+
" "
1456+
],
13721457
"text/plain": [
13731458
"<IPython.core.display.Javascript object>"
13741459
]
@@ -1406,7 +1491,24 @@
14061491
},
14071492
{
14081493
"data": {
1409-
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 23;\n var nbb_unformatted_code = \"pp.parse('Prefect Technologies, Inc')\";\n var nbb_formatted_code = \"pp.parse(\\\"Prefect Technologies, Inc\\\")\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
1494+
"application/javascript": [
1495+
"\n",
1496+
" setTimeout(function() {\n",
1497+
" var nbb_cell_id = 23;\n",
1498+
" var nbb_unformatted_code = \"pp.parse('Prefect Technologies, Inc')\";\n",
1499+
" var nbb_formatted_code = \"pp.parse(\\\"Prefect Technologies, Inc\\\")\";\n",
1500+
" var nbb_cells = Jupyter.notebook.get_cells();\n",
1501+
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
1502+
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
1503+
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
1504+
" nbb_cells[i].set_text(nbb_formatted_code);\n",
1505+
" }\n",
1506+
" break;\n",
1507+
" }\n",
1508+
" }\n",
1509+
" }, 500);\n",
1510+
" "
1511+
],
14101512
"text/plain": [
14111513
"<IPython.core.display.Javascript object>"
14121514
]
@@ -1493,9 +1595,9 @@
14931595
"hash": "484329849bb907480cd798e750759bc6f1d66c93f9e78e7055aa0a2c2de6b47b"
14941596
},
14951597
"kernelspec": {
1496-
"display_name": "Data-science",
1598+
"display_name": "Python 3 (ipykernel)",
14971599
"language": "python",
1498-
"name": "data-science"
1600+
"name": "python3"
14991601
},
15001602
"language_info": {
15011603
"codemirror_mode": {
@@ -1507,7 +1609,7 @@
15071609
"name": "python",
15081610
"nbconvert_exporter": "python",
15091611
"pygments_lexer": "ipython3",
1510-
"version": "3.9.6"
1612+
"version": "3.11.6"
15111613
},
15121614
"toc": {
15131615
"base_numbering": 1,
