@@ -35,8 +34,9 @@ opt_state = Flux.setup(Flux.Adam(0.01), model) # will store optimiser momentum,
 # Training loop, using the whole data set 1000 times:
 losses = []
 @showprogress for epoch in 1:1_000
-    for (x, y) in loader
-        x, y = device((x, y))
+    for xy_cpu in loader
+        # Unpack batch of data, and move to GPU:
+        x, y = xy_cpu |> device
         loss, grads = Flux.withgradient(model) do m
             # Evaluate model and loss inside gradient context:
             y_hat = m(x)
@@ -49,9 +49,9 @@ end
 
 opt_state # parameters, momenta and output have all changed
 
-out2 = model(noisy |> device) |> cpu  # first row is prob. of true, second row p(false)
-probs2 = softmax(out2)                # normalise to get probabilities
-mean((probs2[1,:] .> 0.5) .== truth)  # accuracy 94% so far!
+out2 = model(noisy |> device)  # first row is prob. of true, second row p(false)
+probs2 = softmax(out2) |> cpu  # normalise to get probabilities
+mean((probs2[1,:] .> 0.5) .== truth)  # accuracy 94% so far!
 
 ```
 
@@ -96,17 +96,18 @@ Some things to notice in this example are:
 
 * The `do` block creates an anonymous function, as the first argument of `gradient`. Anything executed within this is differentiated.
 
-Instead of calling [`gradient`](@ref Zygote.gradient) and [`update!`](@ref Flux.update!) separately, there is a convenience function [`train!`](@ref Flux.train!). If we didn't want anything extra (like logging the loss), we could replace the training loop with the following:
+Instead of calling [`gradient`](@ref Flux.gradient) and [`update!`](@ref Flux.update!) separately, there is a convenience function [`train!`](@ref Flux.train!). If we didn't want anything extra (like logging the loss), we could replace the training loop with the following:
 
 ```julia
 for epoch in 1:1_000
-    Flux.train!(model, loader, opt_state) do m, x, y
-        x, y = device((x, y))
+    Flux.train!(model, loader |> device, opt_state) do m, x, y
         y_hat = m(x)
         Flux.logitcrossentropy(y_hat, y)
     end
 end
 ```
 
-* In our simple example, we conveniently created the model as a [`Chain`](@ref Flux.Chain) of layers.
+* Notice that the full dataset `noisy` lives on the CPU, and is moved to the GPU one batch at a time, by `xy_cpu |> device`. This is generally what you want for large datasets. Calling `loader |> device` similarly modifies the `DataLoader` to move one batch at a time.
+
+* In our simple example, we conveniently created the model as a [`Chain`](@ref Flux.Chain) of layers.
 For more complex models, you can define a custom struct `MyModel` containing layers and arrays and implement the call operator `(::MyModel)(x) = ...` to define the forward pass. This is all that is needed for Flux to work. Marking the struct with [`Flux.@layer`](@ref) will add some more functionality, like pretty printing and the ability to mark some internal fields as trainable or not (also see [`trainable`](@ref Optimisers.trainable)).
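
The last paragraph describes custom model structs without showing one. Below is a minimal sketch of that pattern; the name `MyModel`, its fields, and the layer sizes are illustrative, not taken from the quickstart itself.

```julia
using Flux

# Any struct whose fields hold layers (or arrays) can serve as a model.
struct MyModel{L1, L2}
    layer1::L1
    layer2::L2
end

# The call operator defines the forward pass.
(m::MyModel)(x) = m.layer2(m.layer1(x))

# Opt in to pretty printing, parameter collection, device movement, etc.
Flux.@layer MyModel

model2 = MyModel(Dense(2 => 3, tanh), Dense(3 => 2))
opt_state2 = Flux.setup(Flux.Adam(0.01), model2)  # used exactly like the Chain model above
```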
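
The new bullet about keeping `noisy` on the CPU corresponds to two equivalent batch-wise patterns, both visible in the hunks above. A minimal sketch, not part of the diff, assuming the `loader`, `model` and `device` objects defined earlier in the quickstart:

```julia
# Pattern A: keep the DataLoader on the CPU and move each batch explicitly.
for xy_cpu in loader
    x, y = xy_cpu |> device   # copies just this one batch to the GPU
    batch_loss = Flux.logitcrossentropy(model(x), y)
end

# Pattern B: pipe the DataLoader itself through `device`; iterating it then
# yields batches that have already been moved, still one at a time.
for (x, y) in loader |> device
    batch_loss = Flux.logitcrossentropy(model(x), y)
end
```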