
Commit c4837f7

Merge pull request #2030 from svilupp/fix-typo-in-docs
Fix typo in docs
2 parents 0b62a91 + a9bc48a commit c4837f7

File tree (6 files changed: +23 −24 lines)

  docs/src/gpu.md
  docs/src/index.md
  docs/src/training/optimisers.md
  docs/src/training/training.md
  docs/src/utilities.md
  src/optimise/train.jl

docs/src/gpu.md

Lines changed: 6 additions & 6 deletions
@@ -97,9 +97,9 @@ Some of the common workflows involving the use of GPUs are presented below.
 
 ### Transferring Training Data
 
-In order to train the model using the GPU both model and the training data have to be transferred to GPU memory. This process can be done with the `gpu` function in two different ways:
+In order to train the model using the GPU both model and the training data have to be transferred to GPU memory. This process can be done with the `gpu` function in two different ways:
 
-1. Iterating over the batches in a [DataLoader](@ref) object transfering each one of the training batches at a time to the GPU.
+1. Iterating over the batches in a [DataLoader](@ref) object transferring each one of the training batches at a time to the GPU.
 ```julia
 train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
 # ... model, optimizer and loss definitions
@@ -112,14 +112,14 @@ In order to train the model using the GPU both model and the training data have
 end
 ```
 
-2. Transferring all training data to the GPU at once before creating the [DataLoader](@ref) object. This is usually performed for smaller datasets which are sure to fit in the available GPU memory. Some possitilities are:
+2. Transferring all training data to the GPU at once before creating the [DataLoader](@ref) object. This is usually performed for smaller datasets which are sure to fit in the available GPU memory. Some possibilities are:
 ```julia
 gpu_train_loader = Flux.DataLoader((xtrain |> gpu, ytrain |> gpu), batchsize = 32)
 ```
 ```julia
 gpu_train_loader = Flux.DataLoader((xtrain, ytrain) |> gpu, batchsize = 32)
 ```
-Note that both `gpu` and `cpu` are smart enough to recurse through tuples and namedtuples. Other possibility is to use [`MLUtils.mapsobs`](https://juliaml.github.io/MLUtils.jl/dev/api/#MLUtils.mapobs) to push the data movement invocation into the background thread:
+Note that both `gpu` and `cpu` are smart enough to recurse through tuples and namedtuples. Another possibility is to use [`MLUtils.mapsobs`](https://juliaml.github.io/MLUtils.jl/dev/api/#MLUtils.mapobs) to push the data movement invocation into the background thread:
 ```julia
 using MLUtils: mapobs
 # ...
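
For context (not part of the commit), a minimal runnable sketch of the two options described above, assuming a Flux 0.13-era setup with CUDA.jl and MLUtils.jl; the data shapes, model, optimiser and loss are illustrative placeholders:

```julia
using Flux, CUDA
using MLUtils: mapobs

# Illustrative data, model, optimiser and loss; any real setup would do.
xtrain = rand(Float32, 784, 1024)
ytrain = Flux.onehotbatch(rand(0:9, 1024), 0:9)
model  = Chain(Dense(784 => 64, relu), Dense(64 => 10)) |> gpu
opt    = Descent(0.1)
ps     = Flux.params(model)
loss(x, y) = Flux.Losses.logitcrossentropy(model(x), y)

# Option 1: data stays on the CPU; each batch is moved as it is consumed.
train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
for (x, y) in train_loader
    x, y = gpu(x), gpu(y)
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end

# Variant of option 2 with mapobs: observations are moved to the GPU lazily, on request.
gpu_train_loader = Flux.DataLoader(mapobs(gpu, (xtrain, ytrain)), batchsize = 64)
```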
@@ -159,7 +159,7 @@ let model = cpu(model)
 BSON.@save "./path/to/trained_model.bson" model
 end
 
-# is equivalente to the above, but uses `key=value` storing directve from BSON.jl
+# is equivalent to the above, but uses `key=value` storing directive from BSON.jl
 BSON.@save "./path/to/trained_model.bson" model = cpu(model)
 ```
 The reason behind this is that models trained in the GPU but not transferred to the CPU memory scope will expect `CuArray`s as input. In other words, Flux models expect input data coming from the same kind device in which they were trained on.
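
As a complementary sketch (again not from the diff), loading such a file back and returning the model to the GPU; the path and the `model` key simply mirror the snippet above:

```julia
using Flux, BSON, CUDA

BSON.@load "./path/to/trained_model.bson" model   # binds a variable named `model`
model = model |> gpu                              # move it back to GPU memory before use
```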
@@ -181,4 +181,4 @@ $ export CUDA_VISIBLE_DEVICES='0,1'
 ```
 
 
-More information for conditional use of GPUs in CUDA.jl can be found in its [documentation](https://cuda.juliagpu.org/stable/installation/conditional/#Conditional-use), and information about the specific use of the variable is described in the [Nvidia CUDA blogpost](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).
+More information for conditional use of GPUs in CUDA.jl can be found in its [documentation](https://cuda.juliagpu.org/stable/installation/conditional/#Conditional-use), and information about the specific use of the variable is described in the [Nvidia CUDA blog post](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).
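
A small hedged illustration of the conditional use the linked CUDA.jl documentation covers; `CUDA.functional()` is CUDA.jl's own availability check, and the model is a placeholder:

```julia
using Flux, CUDA

device = CUDA.functional() ? gpu : cpu    # fall back to the CPU when no usable GPU is present
model  = Chain(Dense(784 => 64, relu), Dense(64 => 10)) |> device
```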

docs/src/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
Flux is a library for machine learning geared towards high-performance production pipelines. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:
44

55
* **Doing the obvious thing**. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
6-
* **Extensible by default**. Flux is written to be highly extensible and flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all [high level Julia code](https://github.com/FluxML/Flux.jl/blob/ec16a2c77dbf6ab8b92b0eecd11661be7a62feef/src/layers/recurrent.jl#L131). When in doubt, it’s well worth looking at [the source](https://github.com/FluxML/Flux.jl/). If you need something different, you can easily roll your own.
6+
* **Extensible by default**. Flux is written to be highly extensible and flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all [high-level Julia code](https://github.com/FluxML/Flux.jl/blob/ec16a2c77dbf6ab8b92b0eecd11661be7a62feef/src/layers/recurrent.jl#L131). When in doubt, it’s well worth looking at [the source](https://github.com/FluxML/Flux.jl/). If you need something different, you can easily roll your own.
77
* **Performance is key**. Flux integrates with high-performance AD tools such as [Zygote.jl](https://github.com/FluxML/Zygote.jl) for generating fast code. Flux optimizes both CPU and GPU performance. Scaling workloads easily to multiple GPUs can be done with the help of Julia's [GPU tooling](https://github.com/JuliaGPU/CUDA.jl) and projects like [DaggerFlux.jl](https://github.com/DhairyaLGandhi/DaggerFlux.jl).
88
* **Play nicely with others**. Flux works well with Julia libraries from [data frames](https://github.com/JuliaComputing/JuliaDB.jl) and [images](https://github.com/JuliaImages/Images.jl) to [differential equation solvers](https://github.com/JuliaDiffEq/DifferentialEquations.jl), so you can easily build complex data processing pipelines that integrate Flux models.
99

docs/src/training/optimisers.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,7 @@ AdaBelief
7171

7272
Flux's optimisers are built around a `struct` that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the `apply!` function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.
7373

74-
In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.
74+
In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work on this with a simple example.
7575

7676
```julia
7777
mutable struct Momentum
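
The hunk stops right after `mutable struct Momentum`. As a hedged sketch of the pattern the paragraph describes, an optimiser struct plus an `apply!` method, here is a stand-alone variant renamed `MyMomentum` so it does not shadow the built-in optimiser; the field names and update rule are illustrative:

```julia
using Flux

# A stand-alone optimiser struct; renamed `MyMomentum` to avoid clashing
# with the `Momentum` that Flux already exports.
mutable struct MyMomentum
  eta::Float64
  rho::Float64
  velocity::IdDict
end

MyMomentum(eta = 0.01, rho = 0.9) = MyMomentum(eta, rho, IdDict())

# Flux calls `apply!(opt, parameter, gradient)` and uses whatever it returns
# as the step actually taken by `update!`.
function Flux.Optimise.apply!(o::MyMomentum, x, Δ)
  η, ρ = o.eta, o.rho
  v = get!(o.velocity, x, zero(x))   # per-parameter velocity buffer
  @. v = ρ * v - η * Δ
  @. Δ = -v
  return Δ
end
```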
@@ -135,7 +135,7 @@ end
 loss(rand(10)) # around 0.9
 ```
 
-In this manner it is possible to compose optimisers for some added flexibility.
+It is possible to compose optimisers for some added flexibility.
 
 ```@docs
 Flux.Optimise.Optimiser
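
A brief illustration of such a composition; the particular combination below is just an example built from standard Flux optimisers:

```julia
using Flux

# Clip gradient values, then take a plain gradient-descent step.
opt = Flux.Optimise.Optimiser(Flux.Optimise.ClipValue(1e-3), Descent(0.1))
```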
@@ -145,7 +145,7 @@ Flux.Optimise.Optimiser
 
 In practice, it is fairly common to schedule the learning rate of an optimiser to obtain faster convergence. There are a variety of popular scheduling policies, and you can find implementations of them in [ParameterSchedulers.jl](https://darsnack.github.io/ParameterSchedulers.jl/dev/README.html). The documentation for ParameterSchedulers.jl provides a more detailed overview of the different scheduling policies, and how to use them with Flux optimizers. Below, we provide a brief snippet illustrating a [cosine annealing](https://arxiv.org/pdf/1608.03983.pdf) schedule with a momentum optimiser.
 
-First, we import ParameterSchedulers.jl and initalize a cosine annealing schedule to varying the learning rate between `1e-4` and `1e-2` every 10 steps. We also create a new [`Momentum`](@ref) optimiser.
+First, we import ParameterSchedulers.jl and initialize a cosine annealing schedule to vary the learning rate between `1e-4` and `1e-2` every 10 steps. We also create a new [`Momentum`](@ref) optimiser.
 ```julia
 using ParameterSchedulers
 
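The snippet is cut off after `using ParameterSchedulers`. Rather than guess that package's exact constructor, here is a hand-rolled sketch of the same idea, cosine annealing applied by mutating `opt.eta` each epoch; the documentation page itself uses a ParameterSchedulers.jl schedule instead:

```julia
using Flux

# Cosine annealing between 1e-4 and 1e-2 with a period of 10 epochs,
# written out explicitly and applied by updating the optimiser's learning rate.
opt = Momentum()
η_min, η_max, period = 1e-4, 1e-2, 10

for epoch in 1:100
  t = mod(epoch - 1, period)                                        # position inside the current cycle
  opt.eta = η_min + (η_max - η_min) * (1 + cos(π * t / period)) / 2
  # ... run one epoch of training with `opt` here ...
end
```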
docs/src/training/training.md

Lines changed: 10 additions & 11 deletions
@@ -8,7 +8,7 @@ To actually train a model we need four things:
 * An [optimiser](optimisers.md) that will update the model parameters appropriately.
 
 Training a model is typically an iterative process, where we go over the data set,
-calculate the objective function over the datapoints, and optimise that.
+calculate the objective function over the data points, and optimise that.
 This can be visualised in the form of a simple loop.
 
 ```julia
@@ -41,7 +41,7 @@ more information can be found on [Custom Training Loops](../models/advanced.md).
 ## Loss Functions
 
 The objective function must return a number representing how far the model is from its target – the *loss* of the model. The `loss` function that we defined in [basics](../models/basics.md) will work as an objective.
-In addition to custom losses, model can be trained in conjuction with
+In addition to custom losses, a model can be trained in conjunction with
 the commonly used losses that are grouped under the `Flux.Losses` module.
 We can also define an objective in terms of some model:
 
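A minimal sketch of what an objective defined in terms of a model can look like; the model `m` and the use of `mse` here are illustrative:

```julia
using Flux

m = Chain(Dense(10 => 5, relu), Dense(5 => 2))
loss(x, y) = Flux.Losses.mse(m(x), y)   # the objective closes over the model `m`
```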
@@ -57,18 +57,18 @@ ps = Flux.params(m)
 Flux.train!(loss, ps, data, opt)
 ```
 
-The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
+The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built-in, like `mse` for mean squared error or `crossentropy` for cross-entropy loss, but you can calculate it however you want.
 For a list of all built-in loss functions, check out the [losses reference](../models/losses.md).
 
-At first glance it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility, and the possibility of optimizing the calculations.
+At first glance, it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility and the possibility of optimizing the calculations.
 
 ## Model parameters
 
 The model to be trained must have a set of tracked parameters that are used to calculate the gradients of the objective function. In the [basics](../models/basics.md) section it is explained how to create models with such parameters. The second argument of the function `Flux.train!` must be an object containing those parameters, which can be obtained from a model `m` as `Flux.params(m)`.
 
 Such an object contains a reference to the model's parameters, not a copy, such that after their training, the model behaves according to their updated values.
 
-Handling all the parameters on a layer by layer basis is explained in the [Layer Helpers](../models/basics.md) section. Also, for freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
+Handling all the parameters on a layer-by-layer basis is explained in the [Layer Helpers](../models/basics.md) section. For freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
 
 ```@docs
 Flux.params
@@ -93,7 +93,7 @@ using IterTools: ncycle
 data = ncycle([(x, y)], 3)
 ```
 
-It's common to load the `x`s and `y`s separately. In this case you can use `zip`:
+It's common to load the `x`s and `y`s separately. Here you can use `zip`:
 
 ```julia
 xs = [rand(784), rand(784), rand(784)]
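
The hunk ends after `xs`; a hedged sketch of how the pairing with `zip` typically continues (the `ys` values are illustrative):

```julia
xs = [rand(784), rand(784), rand(784)]
ys = [rand(10), rand(10), rand(10)]   # illustrative targets, one per input
data = zip(xs, ys)                    # yields (x, y) pairs that `train!` can iterate
```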
@@ -159,8 +159,7 @@ end
 ## Custom Training loops
 
 The `Flux.train!` function can be very convenient, especially for simple problems.
-Its also very flexible with the use of callbacks.
-But for some problems its much cleaner to write your own custom training loop.
+For some problems, however, it's much cleaner to write your own custom training loop.
 An example follows that works similar to the default `Flux.train` but with no callbacks.
 You don't need callbacks if you just code the calls to your functions directly into the loop.
 E.g. in the places marked with comments.
@@ -179,8 +178,8 @@ function my_custom_train!(loss, ps, data, opt)
 end
 # Insert whatever code you want here that needs training_loss, e.g. logging.
 # logging_callback(training_loss)
-# Insert what ever code you want here that needs gradient.
-# E.g. logging with TensorBoardLogger.jl as histogram so you can see if it is becoming huge.
+# Insert whatever code you want here that needs gradients.
+# e.g. logging histograms with TensorBoardLogger.jl to check for exploding gradients.
 update!(opt, ps, gs)
 # Here you might like to check validation set accuracy, and break out to do early stopping.
 end
@@ -202,7 +201,7 @@ function my_custom_train!(loss, ps, data, opt)
 # logging_callback(training_loss)
 # Apply back() to the correct type of 1.0 to get the gradient of loss.
 gs = back(one(train_loss))
-# Insert what ever code you want here that needs gradient.
+# Insert whatever code you want here that needs gradient.
 # E.g. logging with TensorBoardLogger.jl as histogram so you can see if it is becoming huge.
 update!(opt, ps, gs)
 # Here you might like to check validation set accuracy, and break out to do early stopping.
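
Pieced together from the fragments above, a runnable sketch of the first (do-block) variant of this custom loop; the comments stand in for the logging hooks the docs mention, and the reconstruction is approximate rather than quoted from the file:

```julia
using Flux
using Flux.Optimise: update!

function my_custom_train!(loss, ps, data, opt)
  for d in data
    gs = gradient(ps) do
      training_loss = loss(d...)
      # code that needs training_loss (e.g. logging) can go here
      return training_loss
    end
    # code that needs the gradients `gs` can go here
    update!(opt, ps, gs)
    # validation checks / early stopping can go here
  end
end
```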

docs/src/utilities.md

Lines changed: 2 additions & 2 deletions
@@ -122,7 +122,7 @@ Flux.skip
 
 Flux provides utilities for controlling your training procedure according to some monitored condition and a maximum `patience`. For example, you can use `early_stopping` to stop training when the model is converging or deteriorating, or you can use `plateau` to check if the model is stagnating.
 
-For example, below we create a pseudo-loss function that decreases, bottoms out, then increases. The early stopping trigger will break the loop before the loss increases too much.
+For example, below we create a pseudo-loss function that decreases, bottoms out, and then increases. The early stopping trigger will break the loop before the loss increases too much.
 ```julia
 # create a pseudo-loss that decreases for 4 calls, then starts increasing
 # we call this like loss()
@@ -143,7 +143,7 @@ es = early_stopping(loss, 2; init_score = 9)
 end
 ```
 
-The keyword argument `distance` of `early_stopping` is a function of the form `distance(best_score, score)`. By default `distance` is `-`, which implies that the monitored metric `f` is expected to be decreasing and mimimized. If you use some increasing metric (e.g. accuracy), you can customize the `distance` function: `(best_score, score) -> score - best_score`.
+The keyword argument `distance` of `early_stopping` is a function of the form `distance(best_score, score)`. By default `distance` is `-`, which implies that the monitored metric `f` is expected to be decreasing and minimized. If you use some increasing metric (e.g. accuracy), you can customize the `distance` function: `(best_score, score) -> score - best_score`.
 ```julia
 # create a pseudo-accuracy that increases by 0.01 each time from 0 to 1
 # we call this like acc()

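A consolidated, runnable sketch of the early-stopping pattern these hunks touch; the pseudo-loss and the epoch count are illustrative, and the `early_stopping(loss, 2; init_score = 9)` call mirrors the hunk context above:

```julia
using Flux: early_stopping

# pseudo-loss: decreases for 4 calls, bottoms out, then increases
loss = let t = 0
  () -> begin
    t += 1
    (t - 4)^2
  end
end

es = early_stopping(loss, 2; init_score = 9)

for epoch in 1:10
  es() && break   # break once the monitored loss has stopped improving
end
```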
src/optimise/train.jl

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ Here `pars` is produced by calling [`Flux.params`](@ref) on your model.
 (Or just on the layers you want to train, like `train!(loss, params(model[1:end-2]), data, opt)`.)
 This is the "implicit" style of parameter handling.
 
-Then, this gradient is used by optimizer `opt` to update the paramters:
+This gradient is then used by optimizer `opt` to update the parameters:
 ```
 update!(opt, pars, grads)
 ```
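
To see the docstring's workflow end to end, a small hedged sketch of one implicit-parameter step; the model, data and optimiser are placeholders:

```julia
using Flux
using Flux.Optimise: update!

# One implicit-style step: gradients w.r.t. `pars`, then `update!` applies them.
m    = Dense(2 => 1)
pars = Flux.params(m)
x, y = rand(Float32, 2, 8), rand(Float32, 1, 8)

grads = gradient(() -> Flux.Losses.mse(m(x), y), pars)
opt   = Descent(0.1)
update!(opt, pars, grads)
```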
