
Commit 74c3a63

Tweak quickstart.md (#2536)

* tweak quickstart
* avoid confusing line `model(noisy |> gpu) |> cpu`
* doc ref Flux.gradient
* moving data to GPU
* Update docs/src/guide/models/quickstart.md

1 parent 8c3fd33 commit 74c3a63

1 file changed: +19 −18 lines changed


docs/src/guide/models/quickstart.md

Lines changed: 19 additions & 18 deletions

````diff
@@ -5,26 +5,25 @@ If you have used neural networks before, then this simple example might be helpf
 If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.
 
 ```julia
-# This will prompt if neccessary to install everything, including CUDA.
-# For CUDA acceleration, also cuDNN.jl has to be installed in your environment.
-using Flux, CUDA, Statistics, ProgressMeter
+# Install everything, including CUDA, and load packages:
+using Pkg; Pkg.add(["Flux", "CUDA", "cuDNN", "ProgressMeter"])
+using Flux, Statistics, ProgressMeter
+using CUDA  # optional
+device = gpu_device()  # function to move data and model to the GPU
 
 # Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
 noisy = rand(Float32, 2, 1000)                                    # 2×1000 Matrix{Float32}
 truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)]   # 1000-element Vector{Bool}
 
-# Use this object to move data and model to the GPU, if available
-device = gpu_device()
-
 # Define our model, a multi-layer perceptron with one hidden layer of size 3:
 model = Chain(
     Dense(2 => 3, tanh),   # activation function inside layer
     BatchNorm(3),
-    Dense(3 => 2)) |> device        # move model to GPU, if available
+    Dense(3 => 2)) |> device        # move model to GPU, if one is available
 
 # The model encapsulates parameters, randomly initialised. Its initial output is:
-out1 = model(noisy |> device) |> cpu     # 2×1000 Matrix{Float32}
-probs1 = softmax(out1)                   # normalise to get probabilities
+out1 = model(noisy |> device)            # 2×1000 Matrix{Float32}, or CuArray{Float32}
+probs1 = softmax(out1) |> cpu            # normalise to get probabilities (and move off GPU)
 
 # To train the model, we use batches of 64 samples, and one-hot encoding:
 target = Flux.onehotbatch(truth, [true, false])                   # 2×1000 OneHotMatrix
@@ -35,8 +34,9 @@ opt_state = Flux.setup(Flux.Adam(0.01), model)  # will store optimiser momentum,
 # Training loop, using the whole data set 1000 times:
 losses = []
 @showprogress for epoch in 1:1_000
-    for (x, y) in loader
-        x, y = device((x, y))
+    for xy_cpu in loader
+        # Unpack batch of data, and move to GPU:
+        x, y = xy_cpu |> device
         loss, grads = Flux.withgradient(model) do m
             # Evaluate model and loss inside gradient context:
             y_hat = m(x)
@@ -49,9 +49,9 @@ end
 
 opt_state # parameters, momenta and output have all changed
 
-out2 = model(noisy |> device) |> cpu  # first row is prob. of true, second row p(false)
-probs2 = softmax(out2)                # normalise to get probabilities
-mean((probs2[1,:] .> 0.5) .== truth)  # accuracy 94% so far!
+out2 = model(noisy |> device)         # first row is prob. of true, second row p(false)
+probs2 = softmax(out2) |> cpu         # normalise to get probabilities
+mean((probs2[1,:] .> 0.5) .== truth)  # accuracy 94% so far!
 ```
 
 ![](../../assets/quickstart/oneminute.png)
@@ -96,17 +96,18 @@ Some things to notice in this example are:
 
 * The `do` block creates an anonymous function, as the first argument of `gradient`. Anything executed within this is differentiated.
 
-  Instead of calling [`gradient`](@ref Zygote.gradient) and [`update!`](@ref Flux.update!) separately, there is a convenience function [`train!`](@ref Flux.train!). If we didn't want anything extra (like logging the loss), we could replace the training loop with the following:
+  Instead of calling [`gradient`](@ref Flux.gradient) and [`update!`](@ref Flux.update!) separately, there is a convenience function [`train!`](@ref Flux.train!). If we didn't want anything extra (like logging the loss), we could replace the training loop with the following:
 
 ```julia
 for epoch in 1:1_000
-    Flux.train!(model, loader, opt_state) do m, x, y
-        x, y = device((x, y))
+    Flux.train!(model, loader |> device, opt_state) do m, x, y
         y_hat = m(x)
         Flux.logitcrossentropy(y_hat, y)
     end
 end
 ```
 
-* In our simple example, we conveniently created the model has a [`Chain`](@ref Flux.Chain) of layers.
+* Notice that the full dataset `noisy` lives on the CPU, and is moved to the GPU one batch at a time, by `xy_cpu |> device`. This is generally what you want for large datasets. Calling `loader |> device` similarly modifies the `DataLoader` to move one batch at a time.
+
+* In our simple example, we conveniently created the model has a [`Chain`](@ref Flux.Chain) of layers.
   For more complex models, you can define a custom struct `MyModel` containing layers and arrays and implement the call operator `(::MyModel)(x) = ...` to define the forward pass. This is all it is needed for Flux to work. Marking the struct with [`Flux.@layer`](@ref) will add some more functionality, like pretty printing and the ability to mark some internal fields as trainable or not (also see [`trainable`](@ref Optimisers.trainable)).
````
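
To make the data-movement point above concrete, here is a minimal sketch of the two equivalent patterns described in the new text. It assumes the `noisy` and `target` arrays and the 64-sample `DataLoader` from the quickstart; it is an illustration, not part of the diff.

```julia
using Flux

device = gpu_device()   # falls back to the CPU device when no GPU is available
loader = Flux.DataLoader((noisy, target), batchsize=64, shuffle=true)

# Pattern 1: keep the full dataset on the CPU and move each batch explicitly,
# as in the rewritten training loop:
for xy_cpu in loader
    x, y = xy_cpu |> device    # only this 64-sample batch is copied to the GPU
    # ... compute the loss and gradients with x, y ...
end

# Pattern 2: wrap the loader itself, as in the `train!` example;
# it still yields one GPU-resident batch per iteration:
for (x, y) in loader |> device
    # ... compute the loss and gradients with x, y ...
end
```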
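
For the final bullet about custom model structs, a minimal hypothetical sketch of what that paragraph describes: the struct name `MyModel` comes from the text, while the two-layer layout and layer sizes are illustrative assumptions.

```julia
using Flux

# A plain struct holding two Dense layers (BatchNorm omitted for brevity):
struct MyModel
    dense1::Dense
    dense2::Dense
end

MyModel() = MyModel(Dense(2 => 3, tanh), Dense(3 => 2))

# The call operator defines the forward pass; this is all Flux needs:
(m::MyModel)(x) = m.dense2(m.dense1(x))

# Opt in to pretty printing and trainable-parameter handling:
Flux.@layer MyModel

model = MyModel()
model(rand(Float32, 2, 5))   # 2×5 output, just like the Chain version
```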
