Commit 41fb788

add autogluon
1 parent 57ff1e4 commit 41fb788

4 files changed

+224
-3
lines changed

Chapter5/machine_learning.ipynb

Lines changed: 85 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -2480,6 +2480,90 @@
24802480
"source": [
24812481
"[Link to Lazy Predict](https://github.com/shankarpandala/lazypredict)."
24822482
]
2483+
},
2484+
{
2485+
"cell_type": "markdown",
2486+
"id": "ee14002f",
2487+
"metadata": {},
2488+
"source": [
2489+
"### AutoGluon: Fast and Accurate ML in 3 Lines of Code"
2490+
]
2491+
},
2492+
{
2493+
"cell_type": "markdown",
2494+
"id": "35aec697",
2495+
"metadata": {},
2496+
"source": [
2497+
"The traditional scikit-learn approach requires extensive manual work, including data preprocessing, model selection, and hyperparameter tuning.\n",
2498+
"\n",
2499+
"In contrast, AutoGluon automates these tasks, allowing you to train and deploy accurate models with minimal code."
2500+
]
2501+
},
2502+
{
2503+
"cell_type": "markdown",
2504+
"id": "28784b94",
2505+
"metadata": {},
2506+
"source": [
2507+
"```python\n",
2508+
"from sklearn.impute import SimpleImputer\n",
2509+
"from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
2510+
"from sklearn.compose import ColumnTransformer\n",
2511+
"from sklearn.ensemble import RandomForestClassifier\n",
2512+
"from sklearn.pipeline import Pipeline\n",
2513+
"from sklearn.model_selection import GridSearchCV\n",
2514+
"\n",
2515+
"# Preprocessing Pipeline\n",
2516+
"numeric_transformer = SimpleImputer(strategy='mean')\n",
2517+
"categorical_transformer = OneHotEncoder(handle_unknown='ignore')\n",
2518+
"\n",
2519+
"preprocessor = ColumnTransformer(\n",
2520+
"    transformers=[\n",
2521+
"        ('num', numeric_transformer, numerical_columns),\n",
2522+
"        ('cat', categorical_transformer, categorical_columns)\n",
2523+
"    ])\n",
2524+
"\n",
2525+
"# Machine Learning Pipeline\n",
2526+
"model = RandomForestClassifier()\n",
2527+
"\n",
2528+
"pipeline = Pipeline(steps=[\n",
2529+
"    ('preprocessor', preprocessor),\n",
2530+
"    ('scaler', StandardScaler(with_mean=False)),\n",
2531+
"    ('model', model)\n",
2532+
"])\n",
2533+
"\n",
2534+
"# Hyperparameter Tuning\n",
2535+
"param_grid = {\n",
2536+
"    'model__n_estimators': [100, 200, 300],\n",
2537+
"    'model__max_depth': [5, 10, None],\n",
2538+
"    'model__min_samples_split': [2, 5, 10]\n",
2539+
"}\n",
2540+
"\n",
2541+
"grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')\n",
2542+
"grid_search.fit(X_train, y_train)\n",
2543+
"grid_search.predict(X_test)\n",
2544+
"```"
2545+
]
2546+
},
2547+
{
2548+
"cell_type": "markdown",
2549+
"id": "25742686",
2550+
"metadata": {},
2551+
"source": [
2552+
"```python\n",
2553+
"from autogluon.tabular import TabularPredictor\n",
2554+
"\n",
2555+
"predictor = TabularPredictor(label=\"class\").fit(train_data)\n",
2556+
"predictions = predictor.predict(test_data)\n",
2557+
"```"
2558+
]
2559+
},
2560+
{
2561+
"cell_type": "markdown",
2562+
"id": "9e51ccf5",
2563+
"metadata": {},
2564+
"source": [
2565+
"[Link to AutoGluon](https://bit.ly/45ljoOd)."
2566+
]
24832567
}
24842568
],
24852569
"metadata": {
@@ -2500,7 +2584,7 @@
25002584
"name": "python",
25012585
"nbconvert_exporter": "python",
25022586
"pygments_lexer": "ipython3",
2503-
"version": "3.11.6"
2587+
"version": "3.11.4"
25042588
},
25052589
"toc": {
25062590
"base_numbering": 1,

docs/Chapter5/machine_learning.html

Lines changed: 53 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -234,6 +234,7 @@
234234
<li class="toctree-l2"><a class="reference internal" href="../Chapter2/dataclasses.html">3.7. Data Classes</a></li>
235235
<li class="toctree-l2"><a class="reference internal" href="../Chapter2/typing.html">3.8. Typing</a></li>
236236
<li class="toctree-l2"><a class="reference internal" href="../Chapter2/pathlib.html">3.9. pathlib</a></li>
237+
<li class="toctree-l2"><a class="reference internal" href="../Chapter2/pydantic.html">3.10. Pydantic</a></li>
237238
</ul>
238239
</li>
239240
<li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter3/Chapter3.html">4. Pandas</a><input class="toctree-checkbox" id="toctree-checkbox-4" name="toctree-checkbox-4" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-4"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -529,6 +530,7 @@ <h2> Contents </h2>
529530
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#mlem-capture-your-machine-learning-model-s-metadata">6.5.15. MLEM: Capture Your Machine Learning Model’s Metadata</a></li>
530531
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#distributed-machine-learning-with-mllib">6.5.16. Distributed Machine Learning with MLlib</a></li>
531532
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#rapid-prototyping-and-comparison-of-basic-models-with-lazy-predict">6.5.17. Rapid Prototyping and Comparison of Basic Models with Lazy Predict</a></li>
533+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#autogluon-fast-and-accurate-ml-in-3-lines-of-code">6.5.18. AutoGluon: Fast and Accurate ML in 3 Lines of Code</a></li>
532534
</ul>
533535
</nav>
534536
</div>
@@ -1938,6 +1940,56 @@ <h2><span class="section-number">6.5.17. </span>Rapid Prototyping and Comparison
19381940
</div>
19391941
<p><a class="reference external" href="https://github.com/shankarpandala/lazypredict">Link to Lazy Predict</a>.</p>
19401942
</section>
1943+
<section id="autogluon-fast-and-accurate-ml-in-3-lines-of-code">
1944+
<h2><span class="section-number">6.5.18. </span>AutoGluon: Fast and Accurate ML in 3 Lines of Code<a class="headerlink" href="#autogluon-fast-and-accurate-ml-in-3-lines-of-code" title="Permalink to this heading">#</a></h2>
1945+
<p>The traditional scikit-learn approach requires extensive manual work, including data preprocessing, model selection, and hyperparameter tuning.</p>
1946+
<p>In contrast, AutoGluon automates these tasks, allowing you to train and deploy accurate models with minimal code.</p>
1947+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn.impute</span> <span class="kn">import</span> <span class="n">SimpleImputer</span>
1948+
<span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">OneHotEncoder</span><span class="p">,</span> <span class="n">StandardScaler</span>
1949+
<span class="kn">from</span> <span class="nn">sklearn.compose</span> <span class="kn">import</span> <span class="n">ColumnTransformer</span>
1950+
<span class="kn">from</span> <span class="nn">sklearn.ensemble</span> <span class="kn">import</span> <span class="n">RandomForestClassifier</span>
1951+
<span class="kn">from</span> <span class="nn">sklearn.pipeline</span> <span class="kn">import</span> <span class="n">Pipeline</span>
1952+
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">GridSearchCV</span>
1953+
1954+
<span class="c1"># Preprocessing Pipeline</span>
1955+
<span class="n">numeric_transformer</span> <span class="o">=</span> <span class="n">SimpleImputer</span><span class="p">(</span><span class="n">strategy</span><span class="o">=</span><span class="s1">&#39;mean&#39;</span><span class="p">)</span>
1956+
<span class="n">categorical_transformer</span> <span class="o">=</span> <span class="n">OneHotEncoder</span><span class="p">(</span><span class="n">handle_unknown</span><span class="o">=</span><span class="s1">&#39;ignore&#39;</span><span class="p">)</span>
1957+
1958+
<span class="n">preprocessor</span> <span class="o">=</span> <span class="n">ColumnTransformer</span><span class="p">(</span>
1959+
    <span class="n">transformers</span><span class="o">=</span><span class="p">[</span>
1960+
        <span class="p">(</span><span class="s1">&#39;num&#39;</span><span class="p">,</span> <span class="n">numeric_transformer</span><span class="p">,</span> <span class="n">numerical_columns</span><span class="p">),</span>
1961+
        <span class="p">(</span><span class="s1">&#39;cat&#39;</span><span class="p">,</span> <span class="n">categorical_transformer</span><span class="p">,</span> <span class="n">categorical_columns</span><span class="p">)</span>
1962+
    <span class="p">])</span>
1963+
1964+
<span class="c1"># Machine Learning Pipeline</span>
1965+
<span class="n">model</span> <span class="o">=</span> <span class="n">RandomForestClassifier</span><span class="p">()</span>
1966+
1967+
<span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">(</span><span class="n">steps</span><span class="o">=</span><span class="p">[</span>
1968+
    <span class="p">(</span><span class="s1">&#39;preprocessor&#39;</span><span class="p">,</span> <span class="n">preprocessor</span><span class="p">),</span>
1969+
    <span class="p">(</span><span class="s1">&#39;scaler&#39;</span><span class="p">,</span> <span class="n">StandardScaler</span><span class="p">(</span><span class="n">with_mean</span><span class="o">=</span><span class="kc">False</span><span class="p">)),</span>
1970+
    <span class="p">(</span><span class="s1">&#39;model&#39;</span><span class="p">,</span> <span class="n">model</span><span class="p">)</span>
1971+
<span class="p">])</span>
1972+
1973+
<span class="c1"># Hyperparameter Tuning</span>
1974+
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span>
1975+
    <span class="s1">&#39;model__n_estimators&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">],</span>
1976+
    <span class="s1">&#39;model__max_depth&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="kc">None</span><span class="p">],</span>
1977+
    <span class="s1">&#39;model__min_samples_split&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">]</span>
1978+
<span class="p">}</span>
1979+
1980+
<span class="n">grid_search</span> <span class="o">=</span> <span class="n">GridSearchCV</span><span class="p">(</span><span class="n">pipeline</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">scoring</span><span class="o">=</span><span class="s1">&#39;accuracy&#39;</span><span class="p">)</span>
1981+
<span class="n">grid_search</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
1982+
<span class="n">grid_search</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
1983+
</pre></div>
1984+
</div>
1985+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">autogluon.tabular</span> <span class="kn">import</span> <span class="n">TabularPredictor</span>
1986+
1987+
<span class="n">predictor</span> <span class="o">=</span> <span class="n">TabularPredictor</span><span class="p">(</span><span class="n">label</span><span class="o">=</span><span class="s2">&quot;class&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_data</span><span class="p">)</span>
1988+
<span class="n">predictions</span> <span class="o">=</span> <span class="n">predictor</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span>
1989+
</pre></div>
1990+
</div>
1991+
<p><a class="reference external" href="https://bit.ly/45ljoOd">Link to AutoGluon</a>.</p>
1992+
</section>
19411993
</section>
19421994

19431995
<script type="text/x-thebe-config">
@@ -2020,6 +2072,7 @@ <h2><span class="section-number">6.5.17. </span>Rapid Prototyping and Comparison
20202072
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#mlem-capture-your-machine-learning-model-s-metadata">6.5.15. MLEM: Capture Your Machine Learning Model’s Metadata</a></li>
20212073
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#distributed-machine-learning-with-mllib">6.5.16. Distributed Machine Learning with MLlib</a></li>
20222074
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#rapid-prototyping-and-comparison-of-basic-models-with-lazy-predict">6.5.17. Rapid Prototyping and Comparison of Basic Models with Lazy Predict</a></li>
2075+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#autogluon-fast-and-accurate-ml-in-3-lines-of-code">6.5.18. AutoGluon: Fast and Accurate ML in 3 Lines of Code</a></li>
20232076
</ul>
20242077
</nav></div>
20252078
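
The section added by this commit lists data preprocessing as one of the manual steps in the scikit-learn workflow. As a rough illustration of what the two preprocessing choices in that cell mean (mean imputation for numeric columns, one-hot encoding that ignores unseen categories), here is a minimal pure-Python sketch. The helper names `impute_mean`, `fit_one_hot`, and `one_hot` and the toy data are hypothetical and are not part of the commit; this is not how scikit-learn implements `SimpleImputer` or `OneHotEncoder`, only a small picture of the behaviour those settings ask for.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def fit_one_hot(values):
    """Learn a fixed category order from the training values."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode each value as a 0/1 row; unseen categories become all zeros."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Toy data (hypothetical): one numeric column with a gap, one categorical column.
ages = [20.0, None, 40.0]
colors_train = ["red", "blue", "red"]
colors_test = ["blue", "green"]  # "green" never appears in training

imputed_ages = impute_mean(ages)
categories = fit_one_hot(colors_train)
encoded_test = one_hot(colors_test, categories)

print(imputed_ages)  # [20.0, 30.0, 40.0]
print(encoded_test)  # [[1, 0], [0, 0]]
```

The all-zero row for `"green"` mirrors what `handle_unknown='ignore'` requests: an unseen category is encoded without raising an error.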

docs/_sources/Chapter5/machine_learning.ipynb

Lines changed: 85 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2480,6 +2480,90 @@
24802480
"source": [
24812481
"[Link to Lazy Predict](https://github.com/shankarpandala/lazypredict)."
24822482
]
2483+
},
2484+
{
2485+
"cell_type": "markdown",
2486+
"id": "ee14002f",
2487+
"metadata": {},
2488+
"source": [
2489+
"### AutoGluon: Fast and Accurate ML in 3 Lines of Code"
2490+
]
2491+
},
2492+
{
2493+
"cell_type": "markdown",
2494+
"id": "35aec697",
2495+
"metadata": {},
2496+
"source": [
2497+
"The traditional scikit-learn approach requires extensive manual work, including data preprocessing, model selection, and hyperparameter tuning.\n",
2498+
"\n",
2499+
"In contrast, AutoGluon automates these tasks, allowing you to train and deploy accurate models with minimal code."
2500+
]
2501+
},
2502+
{
2503+
"cell_type": "markdown",
2504+
"id": "28784b94",
2505+
"metadata": {},
2506+
"source": [
2507+
"```python\n",
2508+
"from sklearn.impute import SimpleImputer\n",
2509+
"from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
2510+
"from sklearn.compose import ColumnTransformer\n",
2511+
"from sklearn.ensemble import RandomForestClassifier\n",
2512+
"from sklearn.pipeline import Pipeline\n",
2513+
"from sklearn.model_selection import GridSearchCV\n",
2514+
"\n",
2515+
"# Preprocessing Pipeline\n",
2516+
"numeric_transformer = SimpleImputer(strategy='mean')\n",
2517+
"categorical_transformer = OneHotEncoder(handle_unknown='ignore')\n",
2518+
"\n",
2519+
"preprocessor = ColumnTransformer(\n",
2520+
"    transformers=[\n",
2521+
"        ('num', numeric_transformer, numerical_columns),\n",
2522+
"        ('cat', categorical_transformer, categorical_columns)\n",
2523+
"    ])\n",
2524+
"\n",
2525+
"# Machine Learning Pipeline\n",
2526+
"model = RandomForestClassifier()\n",
2527+
"\n",
2528+
"pipeline = Pipeline(steps=[\n",
2529+
"    ('preprocessor', preprocessor),\n",
2530+
"    ('scaler', StandardScaler(with_mean=False)),\n",
2531+
"    ('model', model)\n",
2532+
"])\n",
2533+
"\n",
2534+
"# Hyperparameter Tuning\n",
2535+
"param_grid = {\n",
2536+
"    'model__n_estimators': [100, 200, 300],\n",
2537+
"    'model__max_depth': [5, 10, None],\n",
2538+
"    'model__min_samples_split': [2, 5, 10]\n",
2539+
"}\n",
2540+
"\n",
2541+
"grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')\n",
2542+
"grid_search.fit(X_train, y_train)\n",
2543+
"grid_search.predict(X_test)\n",
2544+
"```"
2545+
]
2546+
},
2547+
{
2548+
"cell_type": "markdown",
2549+
"id": "25742686",
2550+
"metadata": {},
2551+
"source": [
2552+
"```python\n",
2553+
"from autogluon.tabular import TabularPredictor\n",
2554+
"\n",
2555+
"predictor = TabularPredictor(label=\"class\").fit(train_data)\n",
2556+
"predictions = predictor.predict(test_data)\n",
2557+
"```"
2558+
]
2559+
},
2560+
{
2561+
"cell_type": "markdown",
2562+
"id": "9e51ccf5",
2563+
"metadata": {},
2564+
"source": [
2565+
"[Link to AutoGluon](https://bit.ly/45ljoOd)."
2566+
]
24832567
}
24842568
],
24852569
"metadata": {
@@ -2500,7 +2584,7 @@
25002584
"name": "python",
25012585
"nbconvert_exporter": "python",
25022586
"pygments_lexer": "ipython3",
2503-
"version": "3.11.6"
2587+
"version": "3.11.4"
25042588
},
25052589
"toc": {
25062590
"base_numbering": 1,

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
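
To put a number on the hyperparameter tuning step in the scikit-learn cell this commit adds, the sketch below counts how many model fits an exhaustive search over that `param_grid` with `cv=5` performs. The grid values and fold count are copied from the diff; the helper `count_fits` is a hypothetical name introduced here, and the extra fit assumes `GridSearchCV` is left at its default `refit=True`.

```python
from itertools import product

# Grid copied from the scikit-learn cell added in this commit.
param_grid = {
    "model__n_estimators": [100, 200, 300],
    "model__max_depth": [5, 10, None],
    "model__min_samples_split": [2, 5, 10],
}
cv_folds = 5  # matches cv=5 in the GridSearchCV call


def count_fits(grid, folds, refit=True):
    """Number of model fits an exhaustive grid search performs."""
    n_candidates = len(list(product(*grid.values())))
    # Each candidate is fitted once per fold; with refit=True the best
    # candidate is fitted one more time on the full training set.
    return n_candidates * folds + (1 if refit else 0)


total_fits = count_fits(param_grid, cv_folds)
print(total_fits)  # 27 candidates * 5 folds + 1 refit = 136
```

Adding one more three-valued hyperparameter would triple the candidate count, which is part of the manual tuning burden the AutoGluon section contrasts against.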
