Commit 812db9b

edit imbalanced-learn
1 parent 2112273 commit 812db9b

8 files changed: 249 additions & 402 deletions

Chapter5/machine_learning.ipynb

Lines changed: 80 additions & 184 deletions
Large diffs are not rendered by default.

Chapter6/logging_debugging.ipynb

Lines changed: 29 additions & 4 deletions
@@ -357,14 +357,22 @@
 "### Simplify Python Logging with Loguru"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "0a74f279-828b-4d57-90ae-e78a8e4a2340",
+"metadata": {},
+"source": [
+"Have you ever found yourself using print() instead of a proper logger due to the hassle of setup?\n",
+"\n",
+"With Loguru, you can get started with logging right away. A single import is all you need to begin logging with pre-configured color and format settings.\n"
+]
+},
 {
 "attachments": {},
 "cell_type": "markdown",
 "id": "6f30417a",
 "metadata": {},
 "source": [
-"Are you struggling with the complexity of configuring a logger object before logging in Python? With Loguru, you can skip this step and use the logger object directly with pre-built color and format settings.\n",
-"\n",
 "Here is the comparison between the standard Python logging library and Loguru:"
 ]
 },
@@ -1452,7 +1460,24 @@
 },
 {
 "data": {
-"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 17;\n var nbb_unformatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n var nbb_formatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
+"application/javascript": [
+"\n",
+" setTimeout(function() {\n",
+" var nbb_cell_id = 17;\n",
+" var nbb_unformatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n",
+" var nbb_formatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n",
+" var nbb_cells = Jupyter.notebook.get_cells();\n",
+" for (var i = 0; i < nbb_cells.length; ++i) {\n",
+" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
+" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
+" nbb_cells[i].set_text(nbb_formatted_code);\n",
+" }\n",
+" break;\n",
+" }\n",
+" }\n",
+" }, 500);\n",
+" "
+],
 "text/plain": [
 "<IPython.core.display.Javascript object>"
 ]
@@ -1701,7 +1726,7 @@
 "celltoolbar": "Tags",
 "hide_input": false,
 "kernelspec": {
-"display_name": "venv",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },

docs/Chapter5/machine_learning.html

Lines changed: 18 additions & 14 deletions
@@ -1227,28 +1227,25 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
 <span class="expanded">Hide code cell content</span>
 </summary>
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>imbalanced-learn<span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span><span class="nv">mlxtend</span><span class="o">==</span><span class="m">0</span>.21.0
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>imbalanced-learn<span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span><span class="nv">mlxtend</span><span class="o">==</span><span class="m">0</span>.21.0<span class="w"> </span>scikit-learn<span class="o">==</span><span class="m">1</span>.2.2
 </pre></div>
 </div>
 </div>
 </details>
 </div>
-<p>To address issues with imbalanced datasets, where one class significantly outweighs others, you can use the <code class="docutils literal notranslate"><span class="pre">imbalanced-learn</span></code> library to generate additional samples for under-represented classes.</p>
-<p>Here’s how you can use the <code class="docutils literal notranslate"><span class="pre">RandomOverSampler</span></code> from <code class="docutils literal notranslate"><span class="pre">imbalanced-learn</span></code> to create a balanced dataset by oversampling the minority class:</p>
+<p>In machine learning, imbalanced datasets can lead to biased models that perform poorly on minority classes. This is particularly problematic in critical applications like fraud detection or disease diagnosis.</p>
+<p>With imbalanced-learn, you can rebalance your dataset using various sampling techniques that work seamlessly with scikit-learn.</p>
+<p>To demonstrate this, let’s generate a sample dataset with 5000 samples, 2 features, and 4 classes:</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Libraries for plotting</span>
 <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
-<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="nn">sns</span>
 <span class="kn">from</span> <span class="nn">mlxtend.plotting</span> <span class="kn">import</span> <span class="n">plot_decision_regions</span>
-<span class="kn">import</span> <span class="nn">matplotlib.gridspec</span> <span class="k">as</span> <span class="nn">gridspec</span>
 
 <span class="c1"># Libraries for machine learning</span>
 <span class="kn">from</span> <span class="nn">sklearn.datasets</span> <span class="kn">import</span> <span class="n">make_classification</span>
 <span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
-<span class="kn">import</span> <span class="nn">warnings</span>
-
-<span class="n">warnings</span><span class="o">.</span><span class="n">simplefilter</span><span class="p">(</span><span class="s2">&quot;ignore&quot;</span><span class="p">,</span> <span class="ne">UserWarning</span><span class="p">)</span>
+<span class="kn">from</span> <span class="nn">imblearn.over_sampling</span> <span class="kn">import</span> <span class="n">RandomOverSampler</span>
 </pre></div>
 </div>
 </div>
@@ -1271,17 +1268,16 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
 </div>
 </div>
 </div>
+<p>Resample the dataset using the <code class="docutils literal notranslate"><span class="pre">RandomOverSampler</span></code> class from imbalanced-learn to balance the class distribution. This technique works by duplicating minority samples until they match the majority class.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">imblearn.over_sampling</span> <span class="kn">import</span> <span class="n">RandomOverSampler</span>
-
-
-<span class="n">ros</span> <span class="o">=</span> <span class="n">RandomOverSampler</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ros</span> <span class="o">=</span> <span class="n">RandomOverSampler</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
 <span class="n">X_resampled</span><span class="p">,</span> <span class="n">y_resampled</span> <span class="o">=</span> <span class="n">ros</span><span class="o">.</span><span class="n">fit_resample</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
 </pre></div>
 </div>
 </div>
 </div>
+<p>Plot the decision regions of the dataset before and after resampling using a LinearSVC classifier:</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Plotting Decision Regions</span>
@@ -1295,15 +1291,23 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
 <span class="p">):</span>
 <span class="n">clf</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
 <span class="n">clf</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">Xi</span><span class="p">,</span> <span class="n">yi</span><span class="p">)</span>
-<span class="n">fig</span> <span class="o">=</span> <span class="n">plot_decision_regions</span><span class="p">(</span><span class="n">X</span><span class="o">=</span><span class="n">Xi</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">yi</span><span class="p">,</span> <span class="n">clf</span><span class="o">=</span><span class="n">clf</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;#A3D9B1,#06B1CF,#F8D347,#E48789&#39;</span><span class="p">)</span>
+<span class="n">fig</span> <span class="o">=</span> <span class="n">plot_decision_regions</span><span class="p">(</span><span class="n">X</span><span class="o">=</span><span class="n">Xi</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">yi</span><span class="p">,</span> <span class="n">clf</span><span class="o">=</span><span class="n">clf</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;#E583B6,#72FCDB,#72BEFA,#FFFF99&#39;</span><span class="p">)</span>
 <span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
+<span class="n">ax</span><span class="o">.</span><span class="n">set_title</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">&#39;#000000&#39;</span><span class="p">)</span>
 </pre></div>
 </div>
 </div>
 <div class="cell_output docutils container">
-<img alt="../_images/63d1f487cb4b70c31a65541021dd7a53a6786273a8d698f5e3c373f190e97dc6.png" src="../_images/63d1f487cb4b70c31a65541021dd7a53a6786273a8d698f5e3c373f190e97dc6.png" />
+<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/Users/khuyentran/book/venv/lib/python3.11/site-packages/mlxtend/plotting/decision_regions.py:300: UserWarning: You passed a edgecolor/edgecolors (&#39;black&#39;) for an unfilled marker (&#39;x&#39;). Matplotlib is ignoring the edgecolor in favor of the facecolor. This behavior may change in the future.
+ax.scatter(
+/Users/khuyentran/book/venv/lib/python3.11/site-packages/mlxtend/plotting/decision_regions.py:300: UserWarning: You passed a edgecolor/edgecolors (&#39;black&#39;) for an unfilled marker (&#39;x&#39;). Matplotlib is ignoring the edgecolor in favor of the facecolor. This behavior may change in the future.
+ax.scatter(
+</pre></div>
+</div>
+<img alt="../_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png" src="../_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png" />
 </div>
 </div>
+<p>The plot reveals that the resampling process has added more data points to the minority class (green), effectively balancing the class distribution.</p>
 <p><a class="reference external" href="https://github.com/scikit-learn-contrib/imbalanced-learn">Link to imbalanced-learn</a>.</p>
 </section>
 <section id="estimate-prediction-intervals-in-scikit-learn-models-with-mapie">
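
Note on the change above: the added paragraph says random oversampling "works by duplicating minority samples until they match the majority class". A minimal standard-library sketch of that idea follows. The helper `random_oversample` and the toy data are hypothetical and written only for illustration; this is not how imbalanced-learn's `RandomOverSampler` is implemented, and it only approximates what `fit_resample` is described as doing.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one.

    Hypothetical helper for illustration only; not imbalanced-learn code.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class

    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of the original data that belong to this class
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement to cover the shortfall (none for the majority)
        extra = rng.choices(rows, k=target - count)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy data: class 0 has three rows, classes 1 and 2 have one row each
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.5, 0.5]]
y = [0, 0, 0, 1, 2]

X_res, y_res = random_oversample(X, y)
print("before:", dict(Counter(y)))
print("after: ", dict(Counter(y_res)))
```

After resampling, every class in the toy data has three rows, and every added row is an exact copy of an existing one. That is why the decision regions can shift even though no new information enters the dataset.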

docs/Chapter6/logging_debugging.html

Lines changed: 12 additions & 11 deletions
@@ -211,17 +211,17 @@
 <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/get_elements.html">2.3.1. Get Elements</a></li>
 <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/unpack_iterables.html">2.3.2. Unpack Iterables</a></li>
 <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/join_iterable.html">2.3.3. Join Iterables</a></li>
-<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/interaction_between_2_lists.html">2.3.4. Interaction Between 2 Lists</a></li>
-<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.5. Apply Functions to Elements in a List</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.4. Apply Functions to Elements in a List</a></li>
 </ul>
 </li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.4. Dictionary</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.5. Function</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.6. Classes</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.7. Datetime</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.8. Code Speed</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.9. Good Python Practices</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.10. New Features in Python</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/set.html">2.4. Set</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.5. Dictionary</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.6. Function</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.7. Classes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.8. Datetime</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.9. Code Speed</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.10. Good Python Practices</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.11. New Features in Python</a></li>
 </ul>
 </li>
 <li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter2/Chapter2.html">3. Python Utility Libraries</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-3"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -271,7 +271,7 @@
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/better_pandas.html">6.12. Better Pandas</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/testing.html">6.13. Testing</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/SQL.html">6.14. SQL Libraries</a></li>
-<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. PySpark</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. 3 Powerful Ways to Create PySpark DataFrames</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter5/llm.html">6.16. Large Language Model (LLM)</a></li>
 </ul>
 </li>
@@ -712,7 +712,8 @@ <h2><span class="section-number">7.4.2. </span>Rich’s Console: Debug your Pyth
 </section>
 <section id="simplify-python-logging-with-loguru">
 <h2><span class="section-number">7.4.3. </span>Simplify Python Logging with Loguru<a class="headerlink" href="#simplify-python-logging-with-loguru" title="Permalink to this heading">#</a></h2>
-<p>Are you struggling with the complexity of configuring a logger object before logging in Python? With Loguru, you can skip this step and use the logger object directly with pre-built color and format settings.</p>
+<p>Have you ever found yourself using print() instead of a proper logger due to the hassle of setup?</p>
+<p>With Loguru, you can get started with logging right away. A single import is all you need to begin logging with pre-configured color and format settings.</p>
 <p>Here is the comparison between the standard Python logging library and Loguru:</p>
 <p>Standard Python logging library:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># loguru_vs_logging/logging_example.py</span>