CodeCutTech
diff --git a/Chapter5/machine_learning.ipynb b/Chapter5/machine_learning.ipynb
Lines changed: 80 additions & 184 deletions
diff --git a/Chapter6/logging_debugging.ipynb b/Chapter6/logging_debugging.ipynb
Lines changed: 29 additions & 4 deletions
diff --git a/docs/Chapter5/machine_learning.html b/docs/Chapter5/machine_learning.html
Lines changed: 18 additions & 14 deletions
diff --git a/docs/Chapter6/logging_debugging.html b/docs/Chapter6/logging_debugging.html
Lines changed: 12 additions & 11 deletions
diff --git a/docs/_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png b/docs/_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png
169 KB
@@ -357,14 +357,22 @@
     "### Simplify Python Logging with Loguru"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0a74f279-828b-4d57-90ae-e78a8e4a2340",
+   "metadata": {},
+   "source": [
+    "Have you ever found yourself using print() instead of a proper logger due to the hassle of setup?\n",
+    "\n",
+    "With Loguru, you can get started with logging right away. A single import is all you need to begin logging with pre-configured color and format settings.\n"
+   ]
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
    "id": "6f30417a",
    "metadata": {},
    "source": [
-    "Are you struggling with the complexity of configuring a logger object before logging in Python? With Loguru, you can skip this step and use the logger object directly with pre-built color and format settings.\n",
-    "\n",
     "Here is the comparison between the standard Python logging library and Loguru:"
    ]
   },
@@ -1452,7 +1460,24 @@
     },
     {
      "data": {
-      "application/javascript": "\n            setTimeout(function() {\n                var nbb_cell_id = 17;\n                var nbb_unformatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n    sleep(1)\\n    print(f\\\"Processing {word}\\\")\\n    return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n                var nbb_formatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n    sleep(1)\\n    print(f\\\"Processing {word}\\\")\\n    return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n                var nbb_cells = Jupyter.notebook.get_cells();\n                for (var i = 0; i < nbb_cells.length; ++i) {\n                    if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n                        if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n                             nbb_cells[i].set_text(nbb_formatted_code);\n                        }\n                        break;\n                    }\n                }\n            }, 500);\n            ",
+      "application/javascript": [
+       "\n",
+       "            setTimeout(function() {\n",
+       "                var nbb_cell_id = 17;\n",
+       "                var nbb_unformatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n    sleep(1)\\n    print(f\\\"Processing {word}\\\")\\n    return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n",
+       "                var nbb_formatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n    sleep(1)\\n    print(f\\\"Processing {word}\\\")\\n    return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n",
+       "                var nbb_cells = Jupyter.notebook.get_cells();\n",
+       "                for (var i = 0; i < nbb_cells.length; ++i) {\n",
+       "                    if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
+       "                        if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
+       "                             nbb_cells[i].set_text(nbb_formatted_code);\n",
+       "                        }\n",
+       "                        break;\n",
+       "                    }\n",
+       "                }\n",
+       "            }, 500);\n",
+       "            "
+      ],
       "text/plain": [
        "<IPython.core.display.Javascript object>"
       ]
@@ -1701,7 +1726,7 @@
   "celltoolbar": "Tags",
   "hide_input": false,
   "kernelspec": {
-   "display_name": "venv",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
 
@@ -1227,28 +1227,25 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
 <span class="expanded">Hide code cell content</span>
 </summary>
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>imbalanced-learn<span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span><span class="nv">mlxtend</span><span class="o">==</span><span class="m">0</span>.21.0
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>imbalanced-learn<span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span><span class="nv">mlxtend</span><span class="o">==</span><span class="m">0</span>.21.0<span class="w"> </span>scikit-learn<span class="o">==</span><span class="m">1</span>.2.2
 </pre></div>
 </div>
 </div>
 </details>
 </div>
-<p>To address issues with imbalanced datasets, where one class significantly outweighs others, you can use the <code class="docutils literal notranslate"><span class="pre">imbalanced-learn</span></code> library to generate additional samples for under-represented classes.</p>
-<p>Here’s how you can use the <code class="docutils literal notranslate"><span class="pre">RandomOverSampler</span></code> from <code class="docutils literal notranslate"><span class="pre">imbalanced-learn</span></code> to create a balanced dataset by oversampling the minority class:</p>
+<p>In machine learning, imbalanced datasets can lead to biased models that perform poorly on minority classes. This is particularly problematic in critical applications like fraud detection or disease diagnosis.</p>
+<p>With imbalanced-learn, you can rebalance your dataset using various sampling techniques that work seamlessly with scikit-learn.</p>
+<p>To demonstrate this, let’s generate a sample dataset with 5000 samples, 2 features, and 4 classes:</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Libraries for plotting</span>
 <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
-<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="nn">sns</span>
 <span class="kn">from</span> <span class="nn">mlxtend.plotting</span> <span class="kn">import</span> <span class="n">plot_decision_regions</span>
-<span class="kn">import</span> <span class="nn">matplotlib.gridspec</span> <span class="k">as</span> <span class="nn">gridspec</span>
 
 <span class="c1"># Libraries for machine learning</span>
 <span class="kn">from</span> <span class="nn">sklearn.datasets</span> <span class="kn">import</span> <span class="n">make_classification</span>
 <span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
-<span class="kn">import</span> <span class="nn">warnings</span>
-
-<span class="n">warnings</span><span class="o">.</span><span class="n">simplefilter</span><span class="p">(</span><span class="s2">&quot;ignore&quot;</span><span class="p">,</span> <span class="ne">UserWarning</span><span class="p">)</span>
+<span class="kn">from</span> <span class="nn">imblearn.over_sampling</span> <span class="kn">import</span> <span class="n">RandomOverSampler</span>
 </pre></div>
 </div>
 </div>
@@ -1271,17 +1268,16 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
 </div>
 </div>
 </div>
+<p>Resample the dataset using the <code class="docutils literal notranslate"><span class="pre">RandomOverSampler</span></code> class from imbalanced-learn to balance the class distribution. This technique works by duplicating minority samples until they match the majority class.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">imblearn.over_sampling</span> <span class="kn">import</span> <span class="n">RandomOverSampler</span>
-
-
-<span class="n">ros</span> <span class="o">=</span> <span class="n">RandomOverSampler</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ros</span> <span class="o">=</span> <span class="n">RandomOverSampler</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
 <span class="n">X_resampled</span><span class="p">,</span> <span class="n">y_resampled</span> <span class="o">=</span> <span class="n">ros</span><span class="o">.</span><span class="n">fit_resample</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
 </pre></div>
 </div>
 </div>
 </div>
+<p>Plot the decision regions of the dataset before and after resampling using a LinearSVC classifier:</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Plotting Decision Regions</span>
@@ -1295,15 +1291,23 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
 <span class="p">):</span>
     <span class="n">clf</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
     <span class="n">clf</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">Xi</span><span class="p">,</span> <span class="n">yi</span><span class="p">)</span>
-    <span class="n">fig</span> <span class="o">=</span> <span class="n">plot_decision_regions</span><span class="p">(</span><span class="n">X</span><span class="o">=</span><span class="n">Xi</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">yi</span><span class="p">,</span> <span class="n">clf</span><span class="o">=</span><span class="n">clf</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;#A3D9B1,#06B1CF,#F8D347,#E48789&#39;</span><span class="p">)</span>
+    <span class="n">fig</span> <span class="o">=</span> <span class="n">plot_decision_regions</span><span class="p">(</span><span class="n">X</span><span class="o">=</span><span class="n">Xi</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">yi</span><span class="p">,</span> <span class="n">clf</span><span class="o">=</span><span class="n">clf</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;#E583B6,#72FCDB,#72BEFA,#FFFF99&#39;</span><span class="p">)</span>
     <span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
+    <span class="n">ax</span><span class="o">.</span><span class="n">set_title</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">&#39;#000000&#39;</span><span class="p">)</span>
 </pre></div>
 </div>
 </div>
 <div class="cell_output docutils container">
-<img alt="../_images/63d1f487cb4b70c31a65541021dd7a53a6786273a8d698f5e3c373f190e97dc6.png" src="../_images/63d1f487cb4b70c31a65541021dd7a53a6786273a8d698f5e3c373f190e97dc6.png" />
+<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/Users/khuyentran/book/venv/lib/python3.11/site-packages/mlxtend/plotting/decision_regions.py:300: UserWarning: You passed a edgecolor/edgecolors (&#39;black&#39;) for an unfilled marker (&#39;x&#39;).  Matplotlib is ignoring the edgecolor in favor of the facecolor.  This behavior may change in the future.
+  ax.scatter(
+/Users/khuyentran/book/venv/lib/python3.11/site-packages/mlxtend/plotting/decision_regions.py:300: UserWarning: You passed a edgecolor/edgecolors (&#39;black&#39;) for an unfilled marker (&#39;x&#39;).  Matplotlib is ignoring the edgecolor in favor of the facecolor.  This behavior may change in the future.
+  ax.scatter(
+</pre></div>
+</div>
+<img alt="../_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png" src="../_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png" />
 </div>
 </div>
+<p>The plot reveals that the resampling process has added more data points to the minority class (green), effectively balancing the class distribution.</p>
 <p><a class="reference external" href="https://github.com/scikit-learn-contrib/imbalanced-learn">Link to imbalanced-learn</a>.</p>
 </section>
 <section id="estimate-prediction-intervals-in-scikit-learn-models-with-mapie">
 
@@ -211,17 +211,17 @@
 <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/get_elements.html">2.3.1. Get Elements</a></li>
 <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/unpack_iterables.html">2.3.2. Unpack Iterables</a></li>
 <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/join_iterable.html">2.3.3. Join Iterables</a></li>
-<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/interaction_between_2_lists.html">2.3.4. Interaction Between 2 Lists</a></li>
-<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.5. Apply Functions to Elements in a List</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.4. Apply Functions to Elements in a List</a></li>
 </ul>
 </li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.4. Dictionary</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.5. Function</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.6. Classes</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.7. Datetime</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.8. Code Speed</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.9. Good Python Practices</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.10. New Features in Python</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/set.html">2.4. Set</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.5. Dictionary</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.6. Function</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.7. Classes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.8. Datetime</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.9. Code Speed</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.10. Good Python Practices</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.11. New Features in Python</a></li>
 </ul>
 </li>
 <li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter2/Chapter2.html">3. Python Utility Libraries</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-3"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -271,7 +271,7 @@
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/better_pandas.html">6.12. Better Pandas</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/testing.html">6.13. Testing</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/SQL.html">6.14. SQL Libraries</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. PySpark</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. 3 Powerful Ways to Create PySpark DataFrames</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/llm.html">6.16. Large Language Model (LLM)</a></li>
 </ul>
 </li>
@@ -712,7 +712,8 @@ <h2><span class="section-number">7.4.2. </span>Rich’s Console: Debug your Pyth
 </section>
 <section id="simplify-python-logging-with-loguru">
 <h2><span class="section-number">7.4.3. </span>Simplify Python Logging with Loguru<a class="headerlink" href="#simplify-python-logging-with-loguru" title="Permalink to this heading">#</a></h2>
-<p>Are you struggling with the complexity of configuring a logger object before logging in Python? With Loguru, you can skip this step and use the logger object directly with pre-built color and format settings.</p>
+<p>Have you ever found yourself using print() instead of a proper logger due to the hassle of setup?</p>
+<p>With Loguru, you can get started with logging right away. A single import is all you need to begin logging with pre-configured color and format settings.</p>
 <p>Here is the comparison between the standard Python logging library and Loguru:</p>
 <p>Standard Python logging library:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># loguru_vs_logging/logging_example.py</span>