Commit da7e4d5

add duck typing
1 parent 35d7e8f commit da7e4d5

File tree

4 files changed: +629 −1 lines changed


Chapter1/class.ipynb

Lines changed: 227 additions & 0 deletions
@@ -1807,6 +1807,233 @@
"kmeans_file_path = \"kmeans_model.pkl\"\n",
"kmeans.to_pickle(kmeans_file_path)"
]
},
{
"cell_type": "markdown",
"id": "647169d6",
"metadata": {},
"source": [
"### Embracing Duck Typing for Cleaner, More Adaptable Data Science Code"
]
},
{
"cell_type": "markdown",
"id": "6fce343a",
"metadata": {},
"source": [
"Duck typing comes from the phrase \"If it walks like a duck and quacks like a duck, then it must be a duck.\"\n",
"\n",
"It lets you write flexible code that works with objects of different types, as long as they have the methods or attributes your code uses.\n",
"\n",
"For data scientists, duck typing makes it possible to write versatile functions that handle various data structures without explicitly checking their types.\n",
"\n",
"Here's a simple example:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "26fe0dfe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean: 3.0\n",
"Standard Deviation: 1.4142135623730951\n",
"Mean: 3.0\n",
"Standard Deviation: 1.5811388300841898\n",
"Mean: 3.0\n",
"Standard Deviation: 1.4142135623730951\n"
]
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"class CustomDataFrame:\n",
"    def __init__(self, data):\n",
"        self.data = data\n",
"\n",
"    def mean(self):\n",
"        return np.mean(self.data)\n",
"\n",
"    def std(self):\n",
"        return np.std(self.data)\n",
"\n",
"\n",
"def analyze_data(data):\n",
"    print(f\"Mean: {data.mean()}\")\n",
"    print(f\"Standard Deviation: {data.std()}\")\n",
"\n",
"\n",
"# These all work, thanks to duck typing\n",
"numpy_array = np.array([1, 2, 3, 4, 5])\n",
"pandas_series = pd.Series([1, 2, 3, 4, 5])\n",
"custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n",
"\n",
"analyze_data(numpy_array)\n",
"analyze_data(pandas_series)\n",
"analyze_data(custom_df)"
]
},
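A natural companion to duck typing is the EAFP style ("easier to ask forgiveness than permission"): attempt the method calls and handle the failure, rather than checking types up front. Below is a minimal sketch, not part of the notebook; the `analyze_data` name mirrors the function above and the fallback message is illustrative:

```python
# EAFP-style duck typing: try the calls, and handle objects that
# don't "quack" instead of checking their type in advance.
def analyze_data(data):
    try:
        print(f"Mean: {data.mean()}")
        print(f"Standard Deviation: {data.std()}")
    except AttributeError:
        print("Object does not support mean()/std()")


analyze_data([1, 2, 3, 4, 5])  # plain lists have no mean(); prints the fallback
```

This degrades gracefully for objects that lack the expected methods, while still accepting anything that provides them.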
{
"cell_type": "markdown",
"id": "96171599",
"metadata": {},
"source": [
"In this example, the `analyze_data` function works with NumPy arrays, Pandas Series, and our custom `CustomDataFrame` class because they all provide `mean` and `std` methods. (The standard deviations differ because Pandas computes the sample standard deviation, `ddof=1`, while NumPy defaults to the population standard deviation, `ddof=0`.)\n",
"\n",
"This flexibility is valuable in data science because:\n",
"\n",
"1. It saves time: You don't need separate functions for different data types.\n",
"\n",
"Bad example:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b84455c8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean: 3.0\n",
"Standard Deviation: 1.4142135623730951\n",
"Mean: 3.0\n",
"Standard Deviation: 1.5811388300841898\n",
"Mean: 3.0\n",
"Standard Deviation: 1.4142135623730951\n"
]
}
],
"source": [
"def analyze_numpy_array(data):\n",
"    print(f\"Mean: {np.mean(data)}\")\n",
"    print(f\"Standard Deviation: {np.std(data)}\")\n",
"\n",
"def analyze_pandas_series(data):\n",
"    print(f\"Mean: {data.mean()}\")\n",
"    print(f\"Standard Deviation: {data.std()}\")\n",
"\n",
"def analyze_custom_df(data):\n",
"    print(f\"Mean: {data.mean()}\")\n",
"    print(f\"Standard Deviation: {data.std()}\")\n",
"\n",
"numpy_array = np.array([1, 2, 3, 4, 5])\n",
"pandas_series = pd.Series([1, 2, 3, 4, 5])\n",
"custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n",
"\n",
"analyze_numpy_array(numpy_array)\n",
"analyze_pandas_series(pandas_series)\n",
"analyze_custom_df(custom_df)"
]
},
{
"cell_type": "markdown",
"id": "c569cd67",
"metadata": {},
"source": [
"2. It's cleaner: You avoid long chains of `if` statements checking for types.\n",
"\n",
"Bad example:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e463b7aa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean: 3.0\n",
"Standard Deviation: 1.4142135623730951\n",
"Mean: 3.0\n",
"Standard Deviation: 1.5811388300841898\n",
"Mean: 3.0\n",
"Standard Deviation: 1.4142135623730951\n"
]
}
],
"source": [
"def analyze_data(data):\n",
"    if isinstance(data, np.ndarray):\n",
"        mean = np.mean(data)\n",
"        std = np.std(data)\n",
"    elif isinstance(data, pd.Series):\n",
"        mean = data.mean()\n",
"        std = data.std()\n",
"    elif isinstance(data, CustomDataFrame):\n",
"        mean = data.mean()\n",
"        std = data.std()\n",
"    else:\n",
"        raise TypeError(\"Unsupported data type\")\n",
"\n",
"    print(f\"Mean: {mean}\")\n",
"    print(f\"Standard Deviation: {std}\")\n",
"\n",
"numpy_array = np.array([1, 2, 3, 4, 5])\n",
"pandas_series = pd.Series([1, 2, 3, 4, 5])\n",
"custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n",
"\n",
"analyze_data(numpy_array)\n",
"analyze_data(pandas_series)\n",
"analyze_data(custom_df)"
]
},
{
"cell_type": "markdown",
"id": "4591a347",
"metadata": {},
"source": [
"3. It's more adaptable: Your code can handle new data types easily.\n",
"\n",
"Bad example:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9bc290c8",
"metadata": {},
"outputs": [],
"source": [
"def analyze_data(data):\n",
"    if isinstance(data, np.ndarray):\n",
"        print(f\"Mean: {np.mean(data)}\")\n",
"        print(f\"Standard Deviation: {np.std(data)}\")\n",
"    elif isinstance(data, pd.Series):\n",
"        print(f\"Mean: {data.mean()}\")\n",
"        print(f\"Standard Deviation: {data.std()}\")\n",
"    elif isinstance(data, CustomDataFrame):\n",
"        print(f\"Mean: {data.mean()}\")\n",
"        print(f\"Standard Deviation: {data.std()}\")\n",
"    else:\n",
"        raise TypeError(\"Unsupported data type\")\n",
"\n",
"\n",
"# If you introduce a new data type, you have to modify the function\n",
"class NewDataType:\n",
"    def __init__(self, data):\n",
"        self.data = data\n",
"\n",
"    def mean(self):\n",
"        return sum(self.data) / len(self.data)\n",
"\n",
"    def std(self):\n",
"        mean = self.mean()\n",
"        return (sum((x - mean) ** 2 for x in self.data) / len(self.data)) ** 0.5\n",
"\n",
"\n",
"new_data = NewDataType([1, 2, 3, 4, 5])\n",
"# This will raise a TypeError\n",
"# analyze_data(new_data)"
]
}
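For contrast with the bad example above, a duck-typed version of the same scenario needs no changes to support `NewDataType`. This is a self-contained sketch that restates the notebook's `analyze_data` and `NewDataType` from the cells above:

```python
# Duck-typed counterpart to the bad example: no isinstance chain, so a
# new type works without modifying the function at all.
class NewDataType:
    def __init__(self, data):
        self.data = data

    def mean(self):
        return sum(self.data) / len(self.data)

    def std(self):
        mean = self.mean()
        return (sum((x - mean) ** 2 for x in self.data) / len(self.data)) ** 0.5


def analyze_data(data):
    # Accepts any object that exposes mean() and std()
    print(f"Mean: {data.mean()}")
    print(f"Standard Deviation: {data.std()}")


analyze_data(NewDataType([1, 2, 3, 4, 5]))
# Mean: 3.0
# Standard Deviation: 1.4142135623730951
```

Here `NewDataType` "quacks" like the other containers, so `analyze_data` handles it with no edits, which is exactly the adaptability point 3 describes.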
],
"metadata": {
