1807 | 1807 | "kmeans_file_path = \"kmeans_model.pkl\"\n",
1808 | 1808 | "kmeans.to_pickle(kmeans_file_path)"
1809 | 1809 | ]
| 1810 | + }, |
| 1811 | + { |
| 1812 | + "cell_type": "markdown", |
| 1813 | + "id": "647169d6", |
| 1814 | + "metadata": {}, |
| 1815 | + "source": [ |
| 1816 | + "### Embracing Duck Typing for Cleaner, More Adaptable Data Science Code" |
| 1817 | + ] |
| 1818 | + }, |
| 1819 | + { |
| 1820 | + "cell_type": "markdown", |
| 1821 | + "id": "6fce343a", |
| 1822 | + "metadata": {}, |
| 1823 | + "source": [ |
| 1824 | + "Duck typing comes from the phrase \"If it walks like a duck and quacks like a duck, then it must be a duck.\" \n", |
| 1825 | + "\n", |
| 1826 | + "In practice, this means your code can work with different types of objects, as long as they provide the methods or attributes it uses.\n", |
| 1827 | + "\n", |
| 1828 | + "For data scientists, duck typing makes it possible to write versatile functions that work with various data structures without explicitly checking their types.\n", |
| 1829 | + "\n", |
| 1830 | + "Here's a simple example:" |
| 1831 | + ] |
| 1832 | + }, |
| 1833 | + { |
| 1834 | + "cell_type": "code", |
| 1835 | + "execution_count": 5, |
| 1836 | + "id": "26fe0dfe", |
| 1837 | + "metadata": {}, |
| 1838 | + "outputs": [ |
| 1839 | + { |
| 1840 | + "name": "stdout", |
| 1841 | + "output_type": "stream", |
| 1842 | + "text": [ |
| 1843 | + "Mean: 3.0\n", |
| 1844 | + "Standard Deviation: 1.4142135623730951\n", |
| 1845 | + "Mean: 3.0\n", |
| 1846 | + "Standard Deviation: 1.5811388300841898\n", |
| 1847 | + "Mean: 3.0\n", |
| 1848 | + "Standard Deviation: 1.4142135623730951\n" |
| 1849 | + ] |
| 1850 | + } |
| 1851 | + ], |
| 1852 | + "source": [ |
| 1853 | + "import numpy as np\n", |
| 1854 | + "import pandas as pd\n", |
| 1855 | + "\n", |
| 1856 | + "\n", |
| 1857 | + "class CustomDataFrame:\n", |
| 1858 | + " def __init__(self, data):\n", |
| 1859 | + " self.data = data\n", |
| 1860 | + "\n", |
| 1861 | + " def mean(self):\n", |
| 1862 | + " return np.mean(self.data)\n", |
| 1863 | + "\n", |
| 1864 | + " def std(self):\n", |
| 1865 | + " return np.std(self.data)\n", |
| 1866 | + "\n", |
| 1867 | + "\n", |
| 1868 | + "def analyze_data(data):\n", |
| 1869 | + " print(f\"Mean: {data.mean()}\")\n", |
| 1870 | + " print(f\"Standard Deviation: {data.std()}\")\n", |
| 1871 | + "\n", |
| 1872 | + "\n", |
| 1873 | + "# These all work, thanks to duck typing\n", |
| 1874 | + "numpy_array = np.array([1, 2, 3, 4, 5])\n", |
| 1875 | + "pandas_series = pd.Series([1, 2, 3, 4, 5])\n", |
| 1876 | + "custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n", |
| 1877 | + "\n", |
| 1878 | + "analyze_data(numpy_array)\n", |
| 1879 | + "analyze_data(pandas_series)\n", |
| 1880 | + "analyze_data(custom_df)" |
| 1881 | + ] |
| 1882 | + }, |
| 1883 | + { |
| 1884 | + "cell_type": "markdown", |
| 1885 | + "id": "96171599", |
| 1886 | + "metadata": {}, |
| 1887 | + "source": [ |
| 1888 | + "In this example, the `analyze_data` function works with NumPy arrays, pandas Series, and our `CustomDataFrame` class because they all have `mean` and `std` methods. Note that a shared method name does not guarantee identical behavior: `pd.Series.std` defaults to the sample standard deviation (`ddof=1`), while `np.std` defaults to the population standard deviation (`ddof=0`), which is why the Series prints 1.58 where the other two print 1.41.\n", |
| 1889 | + "\n", |
| 1890 | + "Duck typing is valuable in data science because:\n", |
| 1891 | + "\n", |
| 1892 | + "1. It saves time: You don't need separate functions for different data types.\n", |
| 1893 | + "\n", |
| 1894 | + "Bad example:\n" |
| 1895 | + ] |
| 1896 | + }, |
| 1897 | + { |
| 1898 | + "cell_type": "code", |
| 1899 | + "execution_count": 2, |
| 1900 | + "id": "b84455c8", |
| 1901 | + "metadata": {}, |
| 1902 | + "outputs": [ |
| 1903 | + { |
| 1904 | + "name": "stdout", |
| 1905 | + "output_type": "stream", |
| 1906 | + "text": [ |
| 1907 | + "Mean: 3.0\n", |
| 1908 | + "Standard Deviation: 1.4142135623730951\n", |
| 1909 | + "Mean: 3.0\n", |
| 1910 | + "Standard Deviation: 1.5811388300841898\n", |
| 1911 | + "Mean: 3.0\n", |
| 1912 | + "Standard Deviation: 1.4142135623730951\n" |
| 1913 | + ] |
| 1914 | + } |
| 1915 | + ], |
| 1916 | + "source": [ |
| 1917 | + "def analyze_numpy_array(data):\n", |
| 1918 | + " print(f\"Mean: {np.mean(data)}\")\n", |
| 1919 | + " print(f\"Standard Deviation: {np.std(data)}\")\n", |
| 1920 | + "\n", |
| 1921 | + "def analyze_pandas_series(data):\n", |
| 1922 | + " print(f\"Mean: {data.mean()}\")\n", |
| 1923 | + " print(f\"Standard Deviation: {data.std()}\")\n", |
| 1924 | + "\n", |
| 1925 | + "def analyze_custom_df(data):\n", |
| 1926 | + " print(f\"Mean: {data.mean()}\")\n", |
| 1927 | + " print(f\"Standard Deviation: {data.std()}\")\n", |
| 1928 | + "\n", |
| 1929 | + "numpy_array = np.array([1, 2, 3, 4, 5])\n", |
| 1930 | + "pandas_series = pd.Series([1, 2, 3, 4, 5])\n", |
| 1931 | + "custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n", |
| 1932 | + "\n", |
| 1933 | + "analyze_numpy_array(numpy_array)\n", |
| 1934 | + "analyze_pandas_series(pandas_series)\n", |
| 1935 | + "analyze_custom_df(custom_df)" |
| 1936 | + ] |
| 1937 | + }, |
| 1938 | + { |
| 1939 | + "cell_type": "markdown", |
| 1940 | + "id": "c569cd67", |
| 1941 | + "metadata": {}, |
| 1942 | + "source": [ |
| 1943 | + "2. It's cleaner: You avoid long chains of `if` statements that check for types.\n\nBad example:" |
| 1944 | + ] |
| 1945 | + }, |
| 1946 | + { |
| 1947 | + "cell_type": "code", |
| 1948 | + "execution_count": 3, |
| 1949 | + "id": "e463b7aa", |
| 1950 | + "metadata": {}, |
| 1951 | + "outputs": [ |
| 1952 | + { |
| 1953 | + "name": "stdout", |
| 1954 | + "output_type": "stream", |
| 1955 | + "text": [ |
| 1956 | + "Mean: 3.0\n", |
| 1957 | + "Standard Deviation: 1.4142135623730951\n", |
| 1958 | + "Mean: 3.0\n", |
| 1959 | + "Standard Deviation: 1.5811388300841898\n", |
| 1960 | + "Mean: 3.0\n", |
| 1961 | + "Standard Deviation: 1.4142135623730951\n" |
| 1962 | + ] |
| 1963 | + } |
| 1964 | + ], |
| 1965 | + "source": [ |
| 1966 | + "def analyze_data(data):\n", |
| 1967 | + " if isinstance(data, np.ndarray):\n", |
| 1968 | + " mean = np.mean(data)\n", |
| 1969 | + " std = np.std(data)\n", |
| 1970 | + " elif isinstance(data, pd.Series):\n", |
| 1971 | + " mean = data.mean()\n", |
| 1972 | + " std = data.std()\n", |
| 1973 | + " elif isinstance(data, CustomDataFrame):\n", |
| 1974 | + " mean = data.mean()\n", |
| 1975 | + " std = data.std()\n", |
| 1976 | + " else:\n", |
| 1977 | + " raise TypeError(\"Unsupported data type\")\n", |
| 1978 | + " \n", |
| 1979 | + " print(f\"Mean: {mean}\")\n", |
| 1980 | + " print(f\"Standard Deviation: {std}\")\n", |
| 1981 | + "\n", |
| 1982 | + "numpy_array = np.array([1, 2, 3, 4, 5])\n", |
| 1983 | + "pandas_series = pd.Series([1, 2, 3, 4, 5])\n", |
| 1984 | + "custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n", |
| 1985 | + "python_list = [1, 2, 3, 4, 5]  # no branch for list: analyze_data(python_list) would raise TypeError\n", |
| 1986 | + "\n", |
| 1987 | + "analyze_data(numpy_array)\n", |
| 1988 | + "analyze_data(pandas_series)\n", |
| 1989 | + "analyze_data(custom_df)" |
| 1990 | + ] |
| 1991 | + }, |
| 1992 | + { |
| 1993 | + "cell_type": "markdown", |
| 1994 | + "id": "4591a347", |
| 1995 | + "metadata": {}, |
| 1996 | + "source": [ |
| 1997 | + "3. It's more adaptable: Your code can handle new data types easily.\n", |
| 1998 | + "\n", |
| 1999 | + "Bad example:" |
| 2000 | + ] |
| 2001 | + }, |
| 2002 | + { |
| 2003 | + "cell_type": "code", |
| 2004 | + "execution_count": 4, |
| 2005 | + "id": "9bc290c8", |
| 2006 | + "metadata": {}, |
| 2007 | + "outputs": [], |
| 2008 | + "source": [ |
| 2009 | + "def analyze_data(data):\n", |
| 2010 | + " if isinstance(data, np.ndarray):\n", |
| 2011 | + " print(f\"Mean: {np.mean(data)}\")\n", |
| 2012 | + " print(f\"Standard Deviation: {np.std(data)}\")\n", |
| 2013 | + " elif isinstance(data, pd.Series):\n", |
| 2014 | + " print(f\"Mean: {data.mean()}\")\n", |
| 2015 | + " print(f\"Standard Deviation: {data.std()}\")\n", |
| 2016 | + " elif isinstance(data, CustomDataFrame):\n", |
| 2017 | + " print(f\"Mean: {data.mean()}\")\n", |
| 2018 | + " print(f\"Standard Deviation: {data.std()}\")\n", |
| 2019 | + " else:\n", |
| 2020 | + " raise TypeError(\"Unsupported data type\")\n", |
| 2021 | + "\n", |
| 2022 | + "\n", |
| 2023 | + "# If you introduce a new data type, you have to modify the function\n", |
| 2024 | + "class NewDataType:\n", |
| 2025 | + " def __init__(self, data):\n", |
| 2026 | + " self.data = data\n", |
| 2027 | + " def mean(self):\n", |
| 2028 | + " return sum(self.data) / len(self.data)\n", |
| 2029 | + " def std(self):\n", |
| 2030 | + " mean = self.mean()\n", |
| 2031 | + " return (sum((x - mean) ** 2 for x in self.data) / len(self.data)) ** 0.5\n", |
| 2032 | + "\n", |
| 2033 | + "new_data = NewDataType([1, 2, 3, 4, 5])\n", |
| 2034 | + "# Raises TypeError even though NewDataType defines mean() and std()\n", |
| 2035 | + "# analyze_data(new_data)" |
| 2036 | + ] |
1810 | 2037 | }
1811 | 2038 | ],
1812 | 2039 | "metadata": {
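The final cell in the diff stops at the failing case and never shows the duck-typed counterpart. A minimal sketch of what that could look like follows. `NewDataType` is copied from that cell so the block runs on its own, and returning the values from `analyze_data` is an addition made here so the result can be checked, not something the notebook does.

```python
class NewDataType:
    """Copy of the class from the last cell, repeated so this sketch runs alone."""

    def __init__(self, data):
        self.data = data

    def mean(self):
        return sum(self.data) / len(self.data)

    def std(self):
        # Population standard deviation (divides by n), like np.std's default.
        mean = self.mean()
        return (sum((x - mean) ** 2 for x in self.data) / len(self.data)) ** 0.5


def analyze_data(data):
    """Duck-typed version: only requires .mean() and .std() on the argument."""
    mean, std = data.mean(), data.std()
    print(f"Mean: {mean}")
    print(f"Standard Deviation: {std}")
    return mean, std


# No isinstance branch was added for NewDataType, yet this call succeeds.
new_stats = analyze_data(NewDataType([1, 2, 3, 4, 5]))

# Objects lacking the methods fail with AttributeError rather than TypeError.
try:
    analyze_data([1, 2, 3, 4, 5])
    list_error = None
except AttributeError as exc:
    list_error = exc
    print(f"AttributeError: {exc}")
```

The trade-off is that unsupported inputs, such as a plain list, now fail when the missing method is called instead of at an explicit type check, so the error message names the missing attribute, not the unsupported type.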