`docs/src/tutorials/gradient_zoo.md`

also known as reverse-mode automatic differentiation.
Given a model, some data, and a loss function, this answers the question
"what direction, in the space of the model's parameters, reduces the loss fastest?"

### `gradient(f, x)` interface
Julia's ecosystem has many versions of `gradient(f, x)`, which evaluates `y = f(x)` then returns `∂y_∂x`. The details of how they do this vary, but the interface is similar. An incomplete list is (alphabetically):

These all show the same `∂y_∂x` with respect to `x::Vector`. Sometimes, the result is within a tuple or a NamedTuple, containing `y` as well as the gradient.
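
To make the shared interface concrete, here is a small sketch comparing two such packages on the same function. The input `x` is an invented example (chosen so that `sum(sqrt, x) == 7.0`, matching the transcript below); it is not taken from the tutorial:

```julia
using Zygote, ForwardDiff  # assumes both packages are installed

x = [4.0, 4.0, 9.0]        # hypothetical test input
f(x) = sum(sqrt, x)

Zygote.gradient(f, x)      # reverse mode; returns a 1-tuple: ([0.25, 0.25, 0.1667],) approximately
ForwardDiff.gradient(f, x) # forward mode; returns the gradient array directly
```

Note the small difference in convention: Zygote wraps the result in a tuple with one entry per argument, while ForwardDiff returns the gradient array itself.
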
Note that in all cases, only code executed within the call to `gradient` is differentiated. Calculating the objective function before calling `gradient` will not work, as all information about the steps from `x` to `y` has been lost. For example:
```julia
julia> y = sum(sqrt, x)  # calculate the forward pass alone
7.0

julia> y isa Float64  # has forgotten about sqrt and sum
true

julia> Zygote.gradient(x -> y, x)  # this cannot work, and gives zero
(nothing,)
```
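
By contrast, running the whole computation inside the call works. A quick sketch, re-using the invented `x` from the earlier example:

```julia
julia> Zygote.gradient(x -> sum(sqrt, x), x)  # forward pass runs inside gradient
([0.25, 0.25, 0.16666666666666666],)
```
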
### `gradient(f, model)` for Flux models
However, the parameters of a Flux model are encapsulated inside the various layers. The model is a set of nested structures, and the gradients `∂loss_∂model` which Flux uses are similarly nested objects.

For example, let's set up a simple model & loss:
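
The setup block is elided in this excerpt. A sketch consistent with the shapes and numbers printed below (the tutorial's exact definitions may differ):

```julia
using Flux

# 2×3 weight matrix [1 3 5; 2 4 6], matching the outputs shown below:
model = Chain(Embedding(reshape(1:6, 2, 3) .+ 0.0), softmax)

loss(m) = sum(abs2, m(1))  # apply the model to token 1, then sum the squares

loss(model)  # ≈ 0.6067761
```
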
While the type returned for `∂loss_∂model` varies, they all have the same nested structure, matching that of the model. This is all that Flux needs.
```julia
julia> grads_z[1].layers[1].weight  # get the gradient for the weight matrix
2×3 Matrix{Float64}:
 -0.181715  0.0  0.0
  0.181715  0.0  0.0

julia> grad_e.layers[1].weight  # get the corresponding gradient matrix
2×3 Matrix{Float64}:
 -0.181715  0.0  0.0
  0.181715  0.0  0.0
```
Here's Flux updating the model using each gradient:
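
The update code is elided here; a minimal sketch of the explicit-mode pattern, assuming an optimiser state from `Flux.setup` (the tutorial's optimiser choice may differ):

```julia
opt_state = Flux.setup(Descent(0.1), model)  # optimiser state mirroring the model's structure

Flux.update!(opt_state, model, grads_z[1])   # apply the Zygote gradient in-place
Flux.update!(opt_state, model, grad_e)       # the Enzyme gradient works identically
```
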
Both Zygote and Tracker were written for Flux, and at present, Flux loads Zygote and exports `Zygote.gradient`, and calls this within `Flux.train!`. But apart from that, there is very little coupling between Flux and the automatic differentiation package.

Source-to-source, within Julia.
* Returns nested NamedTuples and Tuples, and uses `nothing` to mean zero.
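
A tiny sketch of what that looks like, with an invented NamedTuple standing in for a layer:

```julia
using Zygote

# Differentiate w.r.t. a NamedTuple; the unused `bias` field comes back as `nothing`:
nt = (weight = [3.0, 4.0], bias = [5.0])
Zygote.gradient(m -> m.weight[1]^2, nt)
# -> ((weight = [6.0, 0.0], bias = nothing),)
```
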
!!! compat "Deprecated: Zygote's implicit mode"
    Flux's default used to work like this, instead of using deeply nested trees for gradients as above:
    ```julia
    julia> ps = Flux.params(model)  # dictionary-like object, with global `objectid` refs
    Params([Float32[1.0 3.0 5.0; 2.0 4.0 6.0]])

    julia> val, grad = Zygote.withgradient(() -> loss(model), ps)
    (val = 0.6067761f0, grad = Grads(...))

    julia> grad[model.layers[1].weight]  # another dictionary, indexed by parameter arrays
    2×3 Matrix{Float32}:
     0.0  0.0  -0.181715
     0.0  0.0   0.181715
    ```
    The code inside Zygote is much the same -- do not expect large changes in speed, nor any changes in what works and what does not.