Commit 08e9db8

Author: Weixuan Fu (committed)
Merge branch 'development'
2 parents af16ad0 + c766c1b · commit 08e9db8

34 files changed · +1339 -188 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ Click on the corresponding links to find more information on TPOT usage in the d
 
 ### Classification
 
-Below is a minimal working example with the the optical recognition of handwritten digits dataset.
+Below is a minimal working example with the optical recognition of handwritten digits dataset.
 
 ```python
 from tpot import TPOTClassifier

docs/api/index.html

Lines changed: 15 additions & 25 deletions
@@ -147,7 +147,6 @@ <h2 id="classification">Classification</h2>
 <strong>disable_update_check</strong>=False,
 <strong>log_file</strong>=None
 </em>)</pre>
-
 <div align="right"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py">source</a></div>
 
 <p>Automated machine learning for supervised classification tasks.</p>
@@ -352,10 +351,13 @@ <h2 id="classification">Classification</h2>
 The update checker will tell you when a new version of TPOT has been released.
 </blockquote>
 
-<strong>log_file</strong>: io.TextIOWrapper or io.StringIO, optional (defaul: sys.stdout)
+<strong>log_file</strong>: file-like class (io.TextIOWrapper or io.StringIO) or string, optional (default: None)
 <br /><br />
 <blockquote>
 Save progress content to a file.
+If it is a string for the path and file name of the desired output file,
+TPOT will create the file and write log into it.
+If it is None, TPOT will output log into sys.stdout
 </blockquote>
 
 </td>
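The hunk above documents the new log_file behavior: a string is treated as a path that TPOT creates and writes its progress log to, while None keeps the log on sys.stdout. A minimal sketch of how a caller might use it (dataset, parameter values, and the log file name are illustrative, not taken from this commit):

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# Passing a path string: per the documentation change above, TPOT creates the file
# and writes its progress log into it; log_file=None would keep the log on sys.stdout.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      log_file='tpot_progress.log')  # hypothetical file name
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
```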
@@ -389,7 +391,7 @@ <h2 id="classification">Classification</h2>
 </table>
 
 <p><strong>Example</strong></p>
-<pre><code class="Python">from tpot import TPOTClassifier
+<pre><code class="language-Python">from tpot import TPOTClassifier
 from sklearn.datasets import load_digits
 from sklearn.model_selection import train_test_split
 
@@ -402,7 +404,6 @@ <h2 id="classification">Classification</h2>
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_digits_pipeline.py')
 </code></pre>
-
 <p><strong>Functions</strong></p>
 <table width="100%">
 <tr>
@@ -432,9 +433,8 @@ <h2 id="classification">Classification</h2>
 </table>
 
 <p><a name="tpotclassifier-fit"></a></p>
-<pre><code class="Python">fit(features, classes, sample_weight=None, groups=None)
+<pre><code class="language-Python">fit(features, classes, sample_weight=None, groups=None)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Run the TPOT optimization process on the given training data.
 <br /><br />
@@ -486,9 +486,8 @@ <h2 id="classification">Classification</h2>
 </div>
 
 <p><a name="tpotclassifier-predict"></a></p>
-<pre><code class="Python">predict(features)
+<pre><code class="language-Python">predict(features)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Use the optimized pipeline to predict the classes for a feature set.
 <br /><br />
@@ -515,9 +514,8 @@ <h2 id="classification">Classification</h2>
 </div>
 
 <p><a name="tpotclassifier-predict-proba"></a></p>
-<pre><code class="Python">predict_proba(features)
+<pre><code class="language-Python">predict_proba(features)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Use the optimized pipeline to estimate the class probabilities for a feature set.
 <br /><br />
@@ -546,9 +544,8 @@ <h2 id="classification">Classification</h2>
 </div>
 
 <p><a name="tpotclassifier-score"></a></p>
-<pre><code class="Python">score(testing_features, testing_classes)
+<pre><code class="language-Python">score(testing_features, testing_classes)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
 <br /><br />
@@ -582,9 +579,8 @@ <h2 id="classification">Classification</h2>
 </div>
 
 <p><a name="tpotclassifier-export"></a></p>
-<pre><code class="Python">export(output_file_name, data_file_path)
+<pre><code class="language-Python">export(output_file_name, data_file_path)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Export the optimized pipeline as Python code.
 <br /><br />
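The method signatures touched in the hunks above (fit, predict, predict_proba, score, export) chain together into one workflow; a brief sketch using the same digits dataset as the documented example (parameter values are illustrative):

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)              # run the TPOT optimization process
labels = tpot.predict(X_test)           # predicted classes for the feature set
probas = tpot.predict_proba(X_test)     # class probabilities, when the final estimator supports them
print(tpot.score(X_test, y_test))       # user-specified scoring function on the testing data
tpot.export('tpot_digits_pipeline.py')  # export the optimized pipeline as Python code
```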
@@ -631,7 +627,6 @@ <h2 id="regression">Regression</h2>
 <strong>early_stop</strong>=None,
 <strong>verbosity</strong>=0,
 <strong>disable_update_check</strong>=False</em>)</pre>
-
 <div align="right"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py">source</a></div>
 
 <p>Automated machine learning for supervised regression tasks.</p>
@@ -868,7 +863,7 @@ <h2 id="regression">Regression</h2>
 </table>
 
 <p><strong>Example</strong></p>
-<pre><code class="Python">from tpot import TPOTRegressor
+<pre><code class="language-Python">from tpot import TPOTRegressor
 from sklearn.datasets import load_boston
 from sklearn.model_selection import train_test_split
 
@@ -881,7 +876,6 @@ <h2 id="regression">Regression</h2>
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_boston_pipeline.py')
 </code></pre>
-
 <p><strong>Functions</strong></p>
 <table width="100%">
 <tr>
@@ -906,9 +900,8 @@ <h2 id="regression">Regression</h2>
 </table>
 
 <p><a name="tpotregressor-fit"></a></p>
-<pre><code class="Python">fit(features, target, sample_weight=None, groups=None)
+<pre><code class="language-Python">fit(features, target, sample_weight=None, groups=None)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Run the TPOT optimization process on the given training data.
 <br /><br />
@@ -960,9 +953,8 @@ <h2 id="regression">Regression</h2>
 </div>
 
 <p><a name="tpotregressor-predict"></a></p>
-<pre><code class="Python">predict(features)
+<pre><code class="language-Python">predict(features)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Use the optimized pipeline to predict the target values for a feature set.
 <br /><br />
@@ -989,9 +981,8 @@ <h2 id="regression">Regression</h2>
 </div>
 
 <p><a name="tpotregressor-score"></a></p>
-<pre><code class="Python">score(testing_features, testing_target)
+<pre><code class="language-Python">score(testing_features, testing_target)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
 <br /><br />
@@ -1025,9 +1016,8 @@ <h2 id="regression">Regression</h2>
 </div>
 
 <p><a name="tpotregressor-export"></a></p>
-<pre><code class="Python">export(output_file_name)
+<pre><code class="language-Python">export(output_file_name)
 </code></pre>
-
 <div style="padding-left:5%" width="100%">
 Export the optimized pipeline as Python code.
 <br /><br />
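TPOTRegressor follows the same workflow; one detail visible in the hunks above is that its export() takes only output_file_name. A short sketch, assuming the scoring parameter accepts a scikit-learn scorer string such as 'neg_mean_absolute_error' (that choice is an assumption for illustration, not part of this commit):

```python
from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# scoring is assumed to take a scikit-learn scorer name; score() then reports that metric.
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2,
                     scoring='neg_mean_absolute_error')
tpot.fit(X_train, y_train)              # optimize a regression pipeline
print(tpot.score(X_test, y_test))       # score on the testing data with the chosen scorer
tpot.export('tpot_boston_pipeline.py')  # regressor export takes only the output file name
```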

docs/citing/index.html

Lines changed: 3 additions & 6 deletions
@@ -128,7 +128,7 @@ <h1 id="citing-tpot">Citing TPOT</h1>
 <p>If you use TPOT in a scientific publication, please consider citing at least one of the following papers:</p>
 <p>Trang T. Le, Weixuan Fu and Jason H. Moore (2020). <a href="https://academic.oup.com/bioinformatics/article/36/1/250/5511404">Scaling tree-based automated machine learning to biomedical big data with a feature set selector</a>. <em>Bioinformatics</em>.36(1): 250-256.</p>
 <p>BibTeX entry:</p>
-<pre><code class="bibtex">@article{le2020scaling,
+<pre><code class="language-bibtex">@article{le2020scaling,
 title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
 author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
 journal={Bioinformatics},
@@ -139,10 +139,9 @@ <h1 id="citing-tpot">Citing TPOT</h1>
 publisher={Oxford University Press}
 }
 </code></pre>
-
 <p>Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). <a href="http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9">Automating biomedical data science through tree-based pipeline optimization</a>. <em>Applications of Evolutionary Computation</em>, pages 123-137.</p>
 <p>BibTeX entry:</p>
-<pre><code class="bibtex">@inbook{Olson2016EvoBio,
+<pre><code class="language-bibtex">@inbook{Olson2016EvoBio,
 author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},
 editor={Squillero, Giovanni and Burelli, Paolo},
 chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},
@@ -155,11 +154,10 @@ <h1 id="citing-tpot">Citing TPOT</h1>
 url={http://dx.doi.org/10.1007/978-3-319-31204-0_9}
 }
 </code></pre>
-
 <p>Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science</p>
 <p>Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). <a href="http://dl.acm.org/citation.cfm?id=2908918">Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science</a>. <em>Proceedings of GECCO 2016</em>, pages 485-492.</p>
 <p>BibTeX entry:</p>
-<pre><code class="bibtex">@inproceedings{OlsonGECCO2016,
+<pre><code class="language-bibtex">@inproceedings{OlsonGECCO2016,
 author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},
 title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},
 booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},
@@ -176,7 +174,6 @@ <h1 id="citing-tpot">Citing TPOT</h1>
 address = {New York, NY, USA},
 }
 </code></pre>
-
 <p>Alternatively, you can cite the repository directly with the following DOI:</p>
 <p><a href="https://zenodo.org/badge/latestdoi/20747/rhiever/tpot">DOI</a></p>

docs/examples/index.html

Lines changed: 21 additions & 14 deletions
@@ -194,14 +194,28 @@ <h1 id="overview">Overview</h1>
 <td align="center"><a href="https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope">link</a></td>
 <td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">link</a></td>
 </tr>
+<tr>
+<td>cuML Classification Example</td>
+<td>random classification problem</td>
+<td>classification</td>
+<td align="center"><a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html">link</a></td>
+<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/cuML_Classification_Example.ipynb">link</a></td>
+</tr>
+<tr>
+<td>cuML Regression Example</td>
+<td>random regression problem</td>
+<td>regression</td>
+<td align="center"><a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html">link</a></td>
+<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/cuML_Regression_Example.ipynb">link</a></td>
+</tr>
 </tbody>
 </table>
 <p><strong>Notes:</strong>
 - For details on how the <code>fit()</code>, <code>score()</code> and <code>export()</code> methods work, refer to the <a href="/using/">usage documentation</a>.
 - Upon re-running the experiments, your resulting pipelines <em>may</em> differ (to some extent) from the ones demonstrated here.</p>
 <h2 id="iris-flower-classification">Iris flower classification</h2>
 <p>The following code illustrates how TPOT can be employed for performing a simple <em>classification task</em> over the Iris dataset.</p>
-<pre><code class="Python">from tpot import TPOTClassifier
+<pre><code class="language-Python">from tpot import TPOTClassifier
 from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 import numpy as np
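The two cuML rows added above link to notebooks that run TPOT on GPU through RAPIDS cuML against synthetic scikit-learn datasets. A rough sketch of that kind of setup, assuming the built-in 'TPOT cuML' configuration string and a RAPIDS-enabled environment (both assumptions here; see the linked notebooks for the actual code):

```python
from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Random classification problem, as described in the cuML Classification Example row.
X, y = make_classification(n_samples=10000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# config_dict='TPOT cuML' (assumed name) selects GPU-accelerated estimators and
# requires a CUDA-capable GPU with RAPIDS cuML installed.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict='TPOT cuML')
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
```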
@@ -215,9 +229,8 @@ <h2 id="iris-flower-classification">Iris flower classification</h2>
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_iris_pipeline.py')
 </code></pre>
-
 <p>Running this code should discover a pipeline (exported as <code>tpot_iris_pipeline.py</code>) that achieves about 97% test accuracy:</p>
-<pre><code class="Python">import numpy as np
+<pre><code class="language-Python">import numpy as np
 import pandas as pd
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
@@ -242,10 +255,9 @@ <h2 id="iris-flower-classification">Iris flower classification</h2>
 exported_pipeline.fit(training_features, training_target)
 results = exported_pipeline.predict(testing_features)
 </code></pre>
-
 <h2 id="digits-dataset">Digits dataset</h2>
 <p>Below is a minimal working example with the optical recognition of handwritten digits dataset, which is an <em>image classification problem</em>.</p>
-<pre><code class="Python">from tpot import TPOTClassifier
+<pre><code class="language-Python">from tpot import TPOTClassifier
 from sklearn.datasets import load_digits
 from sklearn.model_selection import train_test_split
 
@@ -258,9 +270,8 @@ <h2 id="digits-dataset">Digits dataset</h2>
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_digits_pipeline.py')
 </code></pre>
-
 <p>Running this code should discover a pipeline (exported as <code>tpot_digits_pipeline.py</code>) that achieves about 98% test accuracy:</p>
-<pre><code class="Python">import numpy as np
+<pre><code class="language-Python">import numpy as np
 import pandas as pd
 from sklearn.ensemble import RandomForestClassifier
 from sklearn.linear_model import LogisticRegression
@@ -288,10 +299,9 @@ <h2 id="digits-dataset">Digits dataset</h2>
 exported_pipeline.fit(training_features, training_target)
 results = exported_pipeline.predict(testing_features)
 </code></pre>
-
 <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
 <p>The following code illustrates how TPOT can be employed for performing a <em>regression task</em> over the Boston housing prices dataset.</p>
-<pre><code class="Python">from tpot import TPOTRegressor
+<pre><code class="language-Python">from tpot import TPOTRegressor
 from sklearn.datasets import load_boston
 from sklearn.model_selection import train_test_split
 
@@ -304,9 +314,8 @@ <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_boston_pipeline.py')
 </code></pre>
-
 <p>Running this code should discover a pipeline (exported as <code>tpot_boston_pipeline.py</code>) that achieves at least 10 mean squared error (MSE) on the test set:</p>
-<pre><code class="Python">import numpy as np
+<pre><code class="language-Python">import numpy as np
 import pandas as pd
 from sklearn.ensemble import ExtraTreesRegressor
 from sklearn.model_selection import train_test_split
@@ -331,7 +340,6 @@ <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
 exported_pipeline.fit(training_features, training_target)
 results = exported_pipeline.predict(testing_features)
 </code></pre>
-
 <h2 id="titanic-survival-analysis">Titanic survival analysis</h2>
 <p>To see the TPOT applied the Titanic Kaggle dataset, see the Jupyter notebook <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Titanic_Kaggle.ipynb">here</a>. This example shows how to take a messy dataset and preprocess it such that it can be used in scikit-learn and TPOT.</p>
 <h2 id="portuguese-bank-marketing">Portuguese Bank Marketing</h2>
@@ -340,7 +348,7 @@ <h2 id="magic-gamma-telescope">MAGIC Gamma Telescope</h2>
 <p>The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">here</a>.</p>
 <h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using TPOT-NN</h2>
 <p>By loading the <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">TPOT-NN configuration dictionary</a>, PyTorch estimators will be included for classification. Users can also create their own NN configuration dictionary that includes <code>tpot.builtins.PytorchLRClassifier</code> and/or <code>tpot.builtins.PytorchMLPClassifier</code>, or they can specify them using a template string, as shown in the following example:</p>
-<pre><code class="Python">from tpot import TPOTClassifier
+<pre><code class="language-Python">from tpot import TPOTClassifier
 from sklearn.datasets import make_blobs
 from sklearn.model_selection import train_test_split
 
@@ -353,7 +361,6 @@ <h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using
 print(clf.score(X_test, y_test))
 clf.export('tpot_nn_demo_pipeline.py')
 </code></pre>
-
 <p>This example is somewhat trivial, but it should result in nearly 100% classification accuracy.</p>
 
 </div>
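The TPOT-NN paragraph above mentions selecting the PyTorch estimators either through the NN configuration dictionary or through a template string; a minimal sketch of what that might look like (the config name 'TPOT NN' and the template value are assumptions based on the TPOT-NN documentation, not taken from the hunks in this diff):

```python
from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=2000, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# config_dict='TPOT NN' (assumed) loads the neural-network configuration dictionary
# referenced above; the template string pins the final step to tpot.builtins.PytorchLRClassifier.
clf = TPOTClassifier(config_dict='TPOT NN',
                     template='Selector-Transformer-PytorchLRClassifier',
                     generations=5, population_size=20, verbosity=2)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
clf.export('tpot_nn_demo_pipeline.py')
```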

docs/index.html

Lines changed: 1 addition & 1 deletion
@@ -204,5 +204,5 @@
 
 <!--
 MkDocs version : 1.1.2
-Build Date UTC : 2020-07-21 20:34:39.398221+00:00
+Build Date UTC : 2020-10-26 14:32:58.841000+00:00
 -->
