
Commit 1b6d39d

README: Proofreading the texts (#218)

Parent: a049091

File tree: 3 files changed (+54, -50 lines)


CHANGELOG.md

Lines changed: 4 additions & 0 deletions
`````diff
@@ -1,5 +1,9 @@
 # Changelog

+## 16.10.1
+- README.md:
+  - Proofreading the texts
+
 ## 16.10.0
 - KDTree:
   - Added `queryIterable` method
`````

README.md

Lines changed: 49 additions & 49 deletions
`````diff
@@ -5,7 +5,7 @@

 # Machine learning algorithms for Dart developers - ml_algo library

-The library is a part of ecosystem:
+The library is a part of the ecosystem:

 - [ml_algo library](https://github.com/gyrdym/ml_algo) - implementation of popular machine learning algorithms
 - [ml_preprocessing library](https://github.com/gyrdym/ml_preprocessing) - a library for data preprocessing
`````
`````diff
@@ -19,23 +19,23 @@ The library is a part of ecosystem:
 - [Examples](#examples)
   - [Logistic regression](#logistic-regression)
   - [Linear regression](#linear-regression)
-  - [Decision tree based classification](#decision-tree-based-classification)
+  - [Decision tree-based classification](#decision-tree-based-classification)
 - [Models retraining](#models-retraining)
-- [Notes on gradient based optimisation algorithms](#a-couple-of-words-about-linear-models-which-use-gradient-optimisation-methods)
+- [Notes on gradient-based optimisation algorithms](#a-couple-of-words-about-linear-models-which-use-gradient-optimisation-methods)



 ## What is ml_algo for?

 The main purpose of the library is to give native Dart implementation of machine learning algorithms to those who are
 interested both in Dart language and data science. This library aims at Dart VM and Flutter, it's impossible to use
-it in the web applications.
+it in web applications.

 ## The library's content

 - #### Model selection
   - [CrossValidator](https://github.com/gyrdym/ml_algo/blob/master/lib/src/model_selection/cross_validator/cross_validator.dart).
-    Factory that creates instances of cross validators. Cross validation allows researchers to fit different
+    A factory that creates instances of cross validators. Cross-validation allows researchers to fit different
     [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) of machine learning algorithms
     assessing prediction quality on different parts of a dataset.

`````
`````diff
@@ -52,12 +52,12 @@ it in the web applications.
     A class that performs classification using decision trees. May work with data with non-linear patterns.

   - [KnnClassifier](https://github.com/gyrdym/ml_algo/blob/master/lib/src/classifier/knn_classifier/knn_classifier.dart)
-    A class that performs classification using `k nearest neighbours algorithm` - it makes prediction basing on
+    A class that performs classification using `k nearest neighbours algorithm` - it makes predictions based on
     the first `k` closest observations to the given one.

 - #### Regression algorithms
   - [LinearRegressor](https://github.com/gyrdym/ml_algo/blob/master/lib/src/regressor/linear_regressor/linear_regressor.dart).
-    A general class for finding a linear pattern in training data and predicting outcome as real numbers.
+    A general class for finding a linear pattern in training data and predicting outcomes as real numbers.

   - [LinearRegressor.lasso](https://github.com/gyrdym/ml_algo/blob/85f1e2f19b946beb2b594a62e0e3c999d1c31608/lib/src/regressor/linear_regressor/linear_regressor.dart#L219)
     Implementation of the linear regression algorithm based on coordinate descent with lasso regularisation
`````
`````diff
@@ -66,28 +66,28 @@ it in the web applications.
     Implementation of the linear regression algorithm based on stochastic gradient descent with L2 regularisation

   - [KnnRegressor](https://github.com/gyrdym/ml_algo/blob/master/lib/src/regressor/knn_regressor/knn_regressor.dart)
-    A class that makes prediction for each new observation basing on first `k` closest observations from
-    training data. It may catch non-linear pattern of the data.
+    A class that makes predictions for each new observation based on the first `k` closest observations from
+    training data. It may catch non-linear patterns of the data.

 - #### Clustering and retrieval algorithms
   - [KDTree](https://github.com/gyrdym/ml_algo/blob/master/lib/src/retrieval/kd_tree/kd_tree.dart)

-For more information on the library's API, please visit [API reference](https://pub.dev/documentation/ml_algo/latest/ml_algo/ml_algo-library.html)
+For more information on the library's API, please visit the [API reference](https://pub.dev/documentation/ml_algo/latest/ml_algo/ml_algo-library.html)

 ## Examples

 ### Logistic regression

-Let's classify records from well-known dataset - [Pima Indians Diabets Database](https://www.kaggle.com/uciml/pima-indians-diabetes-database)
+Let's classify records from a well-known dataset - [Pima Indians Diabetes Database](https://www.kaggle.com/uciml/pima-indians-diabetes-database)
 via [Logistic regressor](https://github.com/gyrdym/ml_algo/blob/master/lib/src/classifier/logistic_regressor/logistic_regressor.dart)

 **Important note:**

-Please pay attention to problems which classifiers and regressors exposed by the library solve. E.g.
+Please pay attention to problems that classifiers and regressors exposed by the library solve. For e.g.,
 [Logistic regressor](https://github.com/gyrdym/ml_algo/blob/master/lib/src/classifier/logistic_regressor/logistic_regressor.dart)
-solves only **binary classification** problem, and that means that you can't use this classifier with a dataset
-with more than two classes, keep that in mind - in order to find out more about regresseors and classifiers, please refer to
-the [api documentation](https://pub.dev/documentation/ml_algo/latest/ml_algo/ml_algo-library.html) of the package
+solves only **binary classification** problems, and that means that you can't use this classifier with a dataset
+with more than two classes, keep that in mind - in order to find out more about regressors and classifiers, please refer to
+the [API documentation](https://pub.dev/documentation/ml_algo/latest/ml_algo/ml_algo-library.html) of the package

 Import all necessary packages. First, it's needed to ensure if you have `ml_preprocessing` and `ml_dataframe` packages
 in your dependencies:
`````
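The dependency declaration itself is elided by the hunk boundary, and only the `ml_preprocessing` import survives as a context line in the next hunk. For orientation, the full import block presumably looks like the sketch below; the `ml_algo` and `ml_dataframe` import paths are standard package-layout assumptions, not lines from this diff:

```dart
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';
```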
`````diff
@@ -115,7 +115,7 @@ import 'package:ml_preprocessing/ml_preprocessing.dart';

 ### Read a dataset's file

-Download the dataset from [Pima Indians Diabets Database](https://www.kaggle.com/uciml/pima-indians-diabetes-database).
+Download the dataset from [Pima Indians Diabetes Database](https://www.kaggle.com/uciml/pima-indians-diabetes-database).

 #### For a desktop application:

`````
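The desktop reading instructions are elided here; judging by the `DataFrame.fromRawCsv(rawCsvContent)` context line that opens the next hunk, they presumably reduce to loading the CSV into a single string via `dart:io`. A minimal sketch with a placeholder path:

```dart
import 'dart:io';

// Read the downloaded CSV into one raw string; the file name is a placeholder.
// The next hunk's context line then parses it: DataFrame.fromRawCsv(rawCsvContent)
final rawCsvContent = await File('pima_indians_diabetes_database.csv').readAsString();
```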
`````diff
@@ -160,7 +160,7 @@ final samples = DataFrame.fromRawCsv(rawCsvContent);

 ### Prepare datasets for training and testing

-Data in this file is represented by 768 records and 8 features. 9th column is a label column, it contains either 0 or 1
+Data in this file is represented by 768 records and 8 features. The 9th column is a label column, it contains either 0 or 1
 on each row. This column is our target - we should predict a class label for each observation. The column's name is
 `class variable (0 or 1)`. Let's store it:

`````
`````diff
@@ -169,8 +169,8 @@ final targetColumnName = 'class variable (0 or 1)';
 ````

 Now it's the time to prepare data splits. Since we have a smallish dataset (only 768 records), we can't afford to
-split the data into just train and test sets and evaluate the model on them, the best approach in our case is Cross
-Validation. According to this, let's split the data in the following way using the library's [splitData](https://github.com/gyrdym/ml_algo/blob/master/lib/src/model_selection/split_data.dart)
+split the data into just train and test sets and evaluate the model on them, the best approach in our case is Cross-Validation.
+According to this, let's split the data in the following way using the library's [splitData](https://github.com/gyrdym/ml_algo/blob/master/lib/src/model_selection/split_data.dart)
 function:

 ```dart
`````
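The body of that code block falls outside the hunk, but the `final validationData = splits[0];` context line of the next hunk and the 70/30 ratio described there pin it down. Presumably:

```dart
// Split the samples 70/30; splitData returns a list of DataFrames.
final splits = splitData(samples, [0.7]);
final validationData = splits[0];
final testData = splits[1];
```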
`````diff
@@ -179,21 +179,21 @@ final validationData = splits[0];
 final testData = splits[1];
 ```

-`splitData` accepts `DataFrame` instance as the first argument and ratio list as the second one. Now we have 70% of our
-data as a validation set and 30% as a test set for evaluating generalization error.
+`splitData` accepts a `DataFrame` instance as the first argument and ratio list as the second one. Now we have 70% of our
+data as a validation set and 30% as a test set for evaluating generalization errors.

 ### Set up a model selection algorithm

-Then we may create an instance of `CrossValidator` class to fit [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning))
+Then we may create an instance of `CrossValidator` class to fit the [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning))
 of our model. We should pass validation data (our `validationData` variable), and a number of folds into CrossValidator
 constructor.

 ````dart
 final validator = CrossValidator.kFold(validationData, numberOfFolds: 5);
 ````

-Let's create a factory for the classifier with desired hyperparameters. We have to decide after the cross validation,
-if the selected hyperparametrs are good enough or not:
+Let's create a factory for the classifier with desired hyperparameters. We have to decide after the cross-validation
+if the selected hyperparameters are good enough or not:

 ```dart
 final createClassifier = (DataFrame samples) =>
`````
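The factory body is cut off by the hunk boundary. Going by the five hyperparameters the next hunk documents, it plausibly reads like the sketch below; the enum constants and the numeric values here are assumptions for illustration, not lines from this diff:

```dart
final createClassifier = (DataFrame samples) =>
    LogisticRegressor(
      samples,
      targetColumnName,
      optimizerType: LinearOptimizerType.gradient,   // assumed constant: vanilla gradient ascent
      iterationsLimit: 90,                           // assumed iteration count
      learningRateType: LearningRateType.timeBased,  // assumed constant: decreasing learning rate
      batchSize: samples.rows.length,                // full-batch, as the text below explains
      probabilityThreshold: 0.7,                     // assumed positive-label threshold
    );
```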
`````diff
@@ -209,13 +209,13 @@ final createClassifier = (DataFrame samples) =>
 ```

 Let's describe our hyperparameters:
-- `optimizerType` - type of optimization algorithm that will be used to learn coefficients of our model, this time we
-  decided to use vanilla gradient ascent algorithm
-- `iterationsLimit` - number of learning iterations. Selected optimization algorithm (gradient ascent in our case) will
-  be run this amount of times
-- `learningRateType` - a strategy for learning rate update. In our case the learning rate will decrease after every
+- `optimizerType` - a type of optimization algorithm that will be used to learn coefficients of our model, this time we
+  decided to use a vanilla gradient ascent algorithm
+- `iterationsLimit` - number of learning iterations. The selected optimization algorithm (gradient ascent in our case) will
+  be cyclically run this amount of times
+- `learningRateType` - a strategy for learning rate update. In our case, the learning rate will decrease after every
   iteration
-- `batchSize` - size of data (in rows) that will be used per each iteration. As we have a really small dataset we may use
+- `batchSize` - the size of data (in rows) that will be used per each iteration. As we have a really small dataset we may use
   full-batch gradient ascent, that's why we used `samples.rows.length` here - the total amount of data.
 - `probabilityThreshold` - lower bound for positive label probability

`````
`````diff
@@ -233,17 +233,17 @@ final createClassifier = (DataFrame samples) =>
 This argument activates collecting costs per each optimization iteration, and you can see the cost values right after
 the model creation.

-### Evaluate performance of the model
+### Evaluate the performance of the model

-Assume, we chose really good hyperprameters. In order to validate this hypothesis let's use CrossValidator instance
+Assume, we chose really good hyperparameters. In order to validate this hypothesis let's use CrossValidator instance
 created before:

 ````dart
 final scores = await validator.evaluate(createClassifier, MetricType.accuracy);
 ````

 Since the CrossValidator instance returns a [Vector](https://github.com/gyrdym/ml_linalg/blob/master/lib/vector.dart) of scores as a result of our predictor evaluation, we may choose
-any way to reduce all the collected scores to a single number, for instance we may use Vector's `mean` method:
+any way to reduce all the collected scores to a single number, for instance, we may use Vector's `mean` method:

 ```dart
 final accuracy = scores.mean();
`````
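Putting this hunk's two fragments together (the `print` format is taken from the sample output in the next hunk):

```dart
// Evaluate the factory on every fold, then reduce the score vector to one number.
final scores = await validator.evaluate(createClassifier, MetricType.accuracy);
final accuracy = scores.mean();

print('accuracy on k fold validation: ${accuracy.toStringAsFixed(2)}');
```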
`````diff
@@ -260,7 +260,7 @@ We can see something like this:
 accuracy on k fold validation: 0.65
 ````

-Let's assess our hyperparameters on test set in order to evaluate the model's generalization error:
+Let's assess our hyperparameters on the test set in order to evaluate the model's generalization error:

 ```dart
 final testSplits = splitData(testData, [0.8]);
`````
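The rest of that block is elided. A speculative continuation, assuming the library's models expose an `assess`-style scoring method for held-out data (the method name and the metric reuse are assumptions):

```dart
// Train on 80% of the held-out data, score on the remaining 20% (assumed API).
final classifier = createClassifier(testSplits[0]);
final finalScore = classifier.assess(testSplits[1], MetricType.accuracy);
```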
`````diff
@@ -284,7 +284,7 @@ print(classifier.costPerIteration);
 ### Write the model to a json file

 Seems, our model has a good generalization ability, and that means we may use it in the future.
-To do so we may store the model to a file as JSON:
+To do so we may store the model in a file as JSON:

 ```dart
 await classifier.saveAsJson('diabetes_classifier.json');
`````
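The next hunk already shows a restored model printing predictions; the intermediate restore step sits between the hunks. A sketch of how it presumably looks, assuming a `fromJson`-style constructor and a hypothetical `testSamples` DataFrame:

```dart
import 'dart:io';

// Read the persisted JSON back and reconstruct the classifier (assumed constructor name).
final encodedModel = await File('diabetes_classifier.json').readAsString();
final classifier = LogisticRegressor.fromJson(encodedModel);

// The restored model predicts exactly like the freshly trained one.
final prediction = classifier.predict(testSamples);
print(prediction.rows);
```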
`````diff
@@ -313,8 +313,8 @@ print(prediction.rows); // [
 // ]
 ```

-Please note that all the hyperparameters that we used to generate the model are persisted as the model's readonly
-fields, and we can access it anytime:
+Please note that all the hyperparameters that we used to generate the model are persisted as the model's read-only
+fields, and we can access them anytime:

 ```dart
 print(classifier.iterationsLimit);
`````
`````diff
@@ -447,7 +447,7 @@ final samples = DataFrame.fromRawCsv(rawCsvContent, fieldDelimiter: ' ');

 ### Prepare the dataset for training and testing

-Data in this file is represented by 505 records and 13 features. 14th column is a target. Since we use autoheader, the
+Data in this file is represented by 505 records and 13 features. The 14th column is a target. Since we use autoheader, the
 target's name is autogenerated and it is `col_13`. Let's store it in a variable:

 ````dart
`````
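The block's body is elided by the hunk boundary, but the surrounding prose and the `LinearRegressor(trainData, targetName)` context further down pin it down; presumably simply:

```dart
final targetName = 'col_13';
```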
`````diff
@@ -469,7 +469,7 @@ final trainData = splits[0];
 final testData = splits[1];
 ```

-`splitData` accepts `DataFrame` instance as the first argument and ratio list as the second one. Now we have 80% of our
+`splitData` accepts a `DataFrame` instance as the first argument and ratio list as the second one. Now we have 80% of our
 data as a train set and 20% as a test set.

 Let's train the model:
`````
`````diff
@@ -478,7 +478,7 @@ Let's train the model:
 final model = LinearRegressor(trainData, targetName);
 ```

-By default, `LinearRegressor` uses closed-form solution to train the model. One can also use a different solution type,
+By default, `LinearRegressor` uses a closed-form solution to train the model. One can also use a different solution type,
 e.g. stochastic gradient descent algorithm:

 ```dart
`````
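The SGD example itself is cut off here. A speculative sketch; the `LinearRegressor.SGD` factory-constructor name is an assumption, not a line from this diff:

```dart
// Assumed factory constructor for the stochastic-gradient-descent solution.
final model = LinearRegressor.SGD(trainData, targetName);
```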
`````diff
@@ -580,15 +580,15 @@ void main() async {
 ````
 </details>

-## Decision tree based classification
+## Decision tree-based classification

 Let's try to classify data from a well-known [Iris](https://www.kaggle.com/datasets/uciml/iris) dataset using a non-linear algorithm - [decision trees](https://en.wikipedia.org/wiki/Decision_tree)

 First, you need to download the data and place it in a proper place in your file system. To do so you should follow the
-instructions which are given in [Logistic regression](#logistic-regression) section.
+instructions which are given in the [Logistic regression](#logistic-regression) section.

-After loading the data, it's needed to preprocess it. We should drop `Id` column since the column doesn't make sense.
-Also, we need to encode 'Species' column - originally, it contains 3 repeated string labels, to feed it to the classifier
+After loading the data, it's needed to preprocess it. We should drop the `Id` column since the column doesn't make sense.
+Also, we need to encode the 'Species' column - originally, it contains 3 repeated string labels, to feed it to the classifier
 it's needed to convert the labels into numbers:

 ```dart
`````
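The preprocessing code is elided. A sketch of the two steps the paragraph describes, assuming the `dropSeries` and `Encoder.label` helpers from the companion packages (both names, their parameters, and the `rawSamples` variable are assumptions):

```dart
// Drop the meaningless `Id` column (assumed ml_dataframe API).
final samples = rawSamples.dropSeries(names: ['Id']);

// Encode the three string labels of 'Species' into numbers
// (assumed ml_preprocessing API).
final encoder = Encoder.label(samples, columnNames: ['Species']);
final processed = encoder.process(samples);
```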
`````diff
@@ -630,28 +630,28 @@ parameters in more detail:

 All the parameters serve as stopping criteria for the tree building algorithm.

-Now we have a ready to use model. As usual, we can save the model to a JSON-file:
+Now we have a ready to use model. As usual, we can save the model to a JSON file:

 ```dart
 await model.saveAsJson('path/to/json/file.json');
 ```

-Unlike other models, in case of decision tree we can visualise the algorithm result - we can save the model as SVG-file:
+Unlike other models, in the case of a decision tree, we can visualise the algorithm result - we can save the model as an SVG file:

 ```dart
 await model.saveAsSvg('path/to/svg/file.svg');
 ```

-Once we saved it, we can open the file through any image viewer, e.g. through a web-browser. An example of the
-resulting svg-image:
+Once we saved it, we can open the file through any image viewer, e.g. through a web browser. An example of the
+resulting SVG image:

 <p align="center">
 <img height="600" src="https://raw.github.com/gyrdym/ml_algo/master/e2e/decision_tree_classifier/iris_tree.svg?sanitize=true">
 </p>

 ## Models retraining

-Someday our previously shining model can degrade in terms of prediction accuracy - in this case we can retrain it.
+Someday our previously shining model can degrade in terms of prediction accuracy - in this case, we can retrain it.
 Retraining means simply re-running the same learning algorithm that was used to generate our current model
 keeping the same hyperparameters but using a new data set with the same features:

`````
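The retraining call itself follows outside the hunk; assuming the `retrain` method this section's title refers to and a hypothetical `newData` DataFrame:

```dart
// Same algorithm, same hyperparameters, new samples with the same features.
final retrainedClassifier = classifier.retrain(newData);
```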

pubspec.yaml

Lines changed: 1 addition & 1 deletion
`````diff
@@ -1,6 +1,6 @@
 name: ml_algo
 description: Machine learning algorithms, Machine learning models performance evaluation functionality
-version: 16.10.0
+version: 16.10.1
 homepage: https://github.com/gyrdym/ml_algo

 environment:
`````
