CodeCutTech
diff --git a/Chapter1/class.ipynb b/Chapter1/class.ipynb
Lines changed: 227 additions & 0 deletions
@@ -1807,6 +1807,233 @@
     "kmeans_file_path = \"kmeans_model.pkl\"\n",
     "kmeans.to_pickle(kmeans_file_path)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "647169d6",
+   "metadata": {},
+   "source": [
+    "### Embracing Duck Typing for Cleaner, More Adaptable Data Science Code"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6fce343a",
+   "metadata": {},
+   "source": [
+    "Duck typing takes its name from the saying \"If it walks like a duck and quacks like a duck, then it must be a duck.\"\n",
+    "\n",
+    "In Python, that means code cares about what an object can do rather than what type it is: a function works with any object that has the methods or attributes it uses.\n",
+    "\n",
+    "For data scientists, duck typing makes it possible to write versatile functions that work with various data structures without explicitly checking their types.\n",
+    "\n",
+    "Here's a simple example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "26fe0dfe",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.4142135623730951\n",
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.5811388300841898\n",
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.4142135623730951\n"
+     ]
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "class CustomDataFrame:\n",
+    "    def __init__(self, data):\n",
+    "        self.data = data\n",
+    "\n",
+    "    def mean(self):\n",
+    "        return np.mean(self.data)\n",
+    "\n",
+    "    def std(self):\n",
+    "        return np.std(self.data)\n",
+    "\n",
+    "\n",
+    "def analyze_data(data):\n",
+    "    print(f\"Mean: {data.mean()}\")\n",
+    "    print(f\"Standard Deviation: {data.std()}\")\n",
+    "\n",
+    "\n",
+    "# These all work, thanks to duck typing\n",
+    "numpy_array = np.array([1, 2, 3, 4, 5])\n",
+    "pandas_series = pd.Series([1, 2, 3, 4, 5])\n",
+    "custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n",
+    "\n",
+    "analyze_data(numpy_array)\n",
+    "analyze_data(pandas_series)\n",
+    "analyze_data(custom_df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "96171599",
+   "metadata": {},
+   "source": [
+    "In this example, the `analyze_data` function works with NumPy arrays, pandas Series, and our custom `CustomDataFrame` class because all three provide `mean` and `std` methods. The standard deviations differ only because pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), while `np.std()` defaults to the population standard deviation (`ddof=0`).\n",
+    "\n",
+    "This flexibility is valuable in data science, where you often switch between data structures, for three reasons:\n",
+    "\n",
+    "1. It saves time: You don't need a separate function for each data type.\n",
+    "\n",
+    "Bad example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "b84455c8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.4142135623730951\n",
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.5811388300841898\n",
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.4142135623730951\n"
+     ]
+    }
+   ],
+   "source": [
+    "def analyze_numpy_array(data):\n",
+    "    print(f\"Mean: {np.mean(data)}\")\n",
+    "    print(f\"Standard Deviation: {np.std(data)}\")\n",
+    "\n",
+    "def analyze_pandas_series(data):\n",
+    "    print(f\"Mean: {data.mean()}\")\n",
+    "    print(f\"Standard Deviation: {data.std()}\")\n",
+    "\n",
+    "def analyze_custom_df(data):\n",
+    "    print(f\"Mean: {data.mean()}\")\n",
+    "    print(f\"Standard Deviation: {data.std()}\")\n",
+    "\n",
+    "numpy_array = np.array([1, 2, 3, 4, 5])\n",
+    "pandas_series = pd.Series([1, 2, 3, 4, 5])\n",
+    "custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n",
+    "\n",
+    "analyze_numpy_array(numpy_array)\n",
+    "analyze_pandas_series(pandas_series)\n",
+    "analyze_custom_df(custom_df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c569cd67",
+   "metadata": {},
+   "source": [
+    "2. It's cleaner: You avoid long chains of `if`/`isinstance` checks on the input type.\n",
+    "\n",
+    "Bad example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "e463b7aa",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.4142135623730951\n",
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.5811388300841898\n",
+      "Mean: 3.0\n",
+      "Standard Deviation: 1.4142135623730951\n"
+     ]
+    }
+   ],
+   "source": [
+    "def analyze_data(data):\n",
+    "    if isinstance(data, np.ndarray):\n",
+    "        mean = np.mean(data)\n",
+    "        std = np.std(data)\n",
+    "    elif isinstance(data, pd.Series):\n",
+    "        mean = data.mean()\n",
+    "        std = data.std()\n",
+    "    elif isinstance(data, CustomDataFrame):\n",
+    "        mean = data.mean()\n",
+    "        std = data.std()\n",
+    "    else:\n",
+    "        raise TypeError(\"Unsupported data type\")\n",
+    "    \n",
+    "    print(f\"Mean: {mean}\")\n",
+    "    print(f\"Standard Deviation: {std}\")\n",
+    "\n",
+    "numpy_array = np.array([1, 2, 3, 4, 5])\n",
+    "pandas_series = pd.Series([1, 2, 3, 4, 5])\n",
+    "custom_df = CustomDataFrame([1, 2, 3, 4, 5])\n",
+    "python_list = [1, 2, 3, 4, 5]\n",
+    "\n",
+    "analyze_data(numpy_array)\n",
+    "analyze_data(pandas_series)\n",
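+    "analyze_data(custom_df)\n",
+    "# A plain list is not in the isinstance chain, so this would raise TypeError:\n",
+    "# analyze_data(python_list)"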
+   ]
+  },
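+  {
+   "cell_type": "markdown",
+   "id": "3f8e2b7a",
+   "metadata": {},
+   "source": [
+    "A duck-typed alternative drops the `isinstance` chain: it simply calls `mean()` and `std()` and catches `AttributeError` when an object lacks them, the \"easier to ask forgiveness than permission\" (EAFP) style common in Python. The cell below is a minimal sketch, not the only way to do this; the `np.asarray` fallback for inputs such as a plain Python list is an assumption added for illustration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4d9a615",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "def analyze_data(data):\n",
+    "    # Duck typing: try the methods instead of checking the type first\n",
+    "    try:\n",
+    "        mean = data.mean()\n",
+    "        std = data.std()\n",
+    "    except AttributeError:\n",
+    "        # Assumed fallback for inputs without .mean()/.std(), such as a plain list:\n",
+    "        # let NumPy interpret them as an array. Note that an AttributeError raised\n",
+    "        # inside a custom mean()/std() would also end up here.\n",
+    "        values = np.asarray(data)\n",
+    "        mean = np.mean(values)\n",
+    "        std = np.std(values)\n",
+    "\n",
+    "    print(f\"Mean: {mean}\")\n",
+    "    print(f\"Standard Deviation: {std}\")\n",
+    "\n",
+    "\n",
+    "analyze_data(np.array([1, 2, 3, 4, 5]))\n",
+    "analyze_data(pd.Series([1, 2, 3, 4, 5]))\n",
+    "analyze_data([1, 2, 3, 4, 5])  # a list has no .mean(), so the fallback runs"
+   ]
+  },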
+  {
+   "cell_type": "markdown",
+   "id": "4591a347",
+   "metadata": {},
+   "source": [
+    "3. It's more adaptable: Your code can handle new data types without being modified.\n",
+    "\n",
+    "Bad example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "9bc290c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def analyze_data(data):\n",
+    "    if isinstance(data, np.ndarray):\n",
+    "        print(f\"Mean: {np.mean(data)}\")\n",
+    "        print(f\"Standard Deviation: {np.std(data)}\")\n",
+    "    elif isinstance(data, pd.Series):\n",
+    "        print(f\"Mean: {data.mean()}\")\n",
+    "        print(f\"Standard Deviation: {data.std()}\")\n",
+    "    elif isinstance(data, CustomDataFrame):\n",
+    "        print(f\"Mean: {data.mean()}\")\n",
+    "        print(f\"Standard Deviation: {data.std()}\")\n",
+    "    else:\n",
+    "        raise TypeError(\"Unsupported data type\")\n",
+    "\n",
+    "\n",
+    "# If you introduce a new data type, you have to modify the function\n",
+    "class NewDataType:\n",
+    "    def __init__(self, data):\n",
+    "        self.data = data\n",
+    "\n",
+    "    def mean(self):\n",
+    "        return sum(self.data) / len(self.data)\n",
+    "\n",
+    "    def std(self):\n",
+    "        # Population standard deviation (same as np.std's default ddof=0)\n",
+    "        mean = self.mean()\n",
+    "        return (sum((x - mean) ** 2 for x in self.data) / len(self.data)) ** 0.5\n",
+    "\n",
+    "\n",
+    "new_data = NewDataType([1, 2, 3, 4, 5])\n",
+    "# NewDataType has mean() and std(), yet this raises TypeError\n",
+    "# until another elif branch is added to analyze_data:\n",
+    "# analyze_data(new_data)"
+   ]
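+  },
+  {
+   "cell_type": "markdown",
+   "id": "7e21f0b3",
+   "metadata": {},
+   "source": [
+    "Good example: with duck typing, `NewDataType` works without touching `analyze_data`, because the function relies only on `mean()` and `std()`. This is a minimal sketch that redefines the duck-typed `analyze_data` (the previous cells replaced it with `isinstance` versions) and reuses `new_data` from the cell above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d58c4a92",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def analyze_data(data):\n",
+    "    # Works with any object that provides .mean() and .std()\n",
+    "    print(f\"Mean: {data.mean()}\")\n",
+    "    print(f\"Standard Deviation: {data.std()}\")\n",
+    "\n",
+    "\n",
+    "# No changes to analyze_data are needed for the new type\n",
+    "analyze_data(new_data)"
+   ]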
   }
  ],
  "metadata": {