
Commit 02b0c79

committed: local changes I forgot to commit
1 parent 4b60ce4 commit 02b0c79

File tree

1 file changed: +12 −13 lines


docs/src/tutorials/gradient_zoo.md

Lines changed: 12 additions & 13 deletions
@@ -100,12 +100,12 @@ Chain(
 While the type returned for `∂loss_∂model` varies, they all have the same nested structure, matching that of the model. This is all that Flux needs.
 
 ```julia
-julia> grads_z[1].layers[1].weight # get the weight matrix
+julia> grads_z[1].layers[1].weight # Zygote's gradient for model.layers[1].weight
 2×3 Matrix{Float64}:
  -0.181715  0.0  0.0
   0.181715  0.0  0.0
 
-julia> grad_e.layers[1].weight # get the corresponding gradient matrix
+julia> grad_e.layers[1].weight # Enzyme's gradient for the same weight matrix
 2×3 Matrix{Float64}:
  -0.181715  0.0  0.0
   0.181715  0.0  0.0
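The hunk above compares Zygote's and Enzyme's gradients for the same weight matrix. As a hedged, self-contained sketch of the underlying point (the model here is illustrative, not the tutorial's; assumes Flux.jl and Zygote.jl are installed):

```julia
using Flux, Zygote

# Illustrative model, not the one from the tutorial.
model = Chain(Dense(3 => 2, tanh))
x = rand(Float32, 3)

# Zygote returns a tuple; grads[1] is a NamedTuple whose nesting mirrors the model:
# (layers = ((weight = ..., bias = ..., σ = nothing),),)
grads = Zygote.gradient(m -> sum(m(x)), model)

dW = grads[1].layers[1].weight
size(dW) == size(model.layers[1].weight)  # each gradient matches its parameter's shape
```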
@@ -114,12 +114,12 @@ julia> grad_e.layers[1].weight # get the corresponding gradient matrix
 Here's Flux updating the model using each gradient:
 
 ```julia
-julia> opt = Flux.setup(Descent(1/3), model)
+julia> opt_state = Flux.setup(Descent(1/3), model)  # opt_state is trivial here
 (layers = ((weight = Leaf(Descent(0.333333), nothing),), ()),)
 
 julia> model_z = deepcopy(model);
 
-julia> Flux.update!(opt, model_z, grads_z[1]);
+julia> Flux.update!(opt_state, model_z, grads_z[1]);
 
 julia> model_z.layers[1].weight # updated weight matrix
 2×3 Matrix{Float64}:
@@ -128,7 +128,7 @@ julia> model_z.layers[1].weight # updated weight matrix
 
 julia> model_e = deepcopy(model);
 
-julia> Flux.update!(opt, model_e, grad_e)[2][1].weight # same update
+julia> Flux.update!(opt_state, model_e, grad_e)[2][1].weight # same update
 2×3 Matrix{Float64}:
  1.06057  3.0  5.0
  1.93943  4.0  6.0
@@ -142,11 +142,11 @@ In this case they are all identical, but there are some caveats, explored below.
 
 Both Zygote and Tracker were written for Flux, and at present, Flux loads Zygote and exports `Zygote.gradient`, and calls this within `Flux.train!`. But apart from that, there is very little coupling between Flux and the automatic differentiation package.
 
-This page has very brief notes on how all these packages compare, as a guide for anyone wanting to experiment with them. We stress "experiment" since Zygote is (at present) by far the best-tested.
+This page has very brief notes on how all these packages compare, as a guide for anyone wanting to experiment with them. We stress "experiment" since Zygote is (at present) by far the best-tested. All notes are from February 2024.
 
 ### [Zygote.jl](https://github.com/FluxML/Zygote.jl/issues)
 
-Source-to-source, within Julia.
+Reverse-mode source-to-source automatic differentiation, written by hooking into Julia's compiler.
 
 * By far the best-tested option for Flux models.
 
@@ -156,7 +156,7 @@ Source-to-source, within Julia.
 
 * Custom rules via `ZygoteRules.@adjoint` or better, `ChainRulesCore.rrule`.
 
-* Returns nested NamedTuples and Tuples, and uses `nothing` to mean zero.
+* Returns nested NamedTuples and Tuples, and uses `nothing` to mean zero. Does not track shared arrays, hence may return different contributions for each appearance of a shared array.
 
 !!! compat "Deprecated: Zygote's implicit mode"
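The `nothing`-as-zero convention noted above can be seen on a plain NamedTuple; a hedged sketch (assumes Zygote.jl is installed):

```julia
using Zygote

# Differentiate a function of a NamedTuple; `b` is never used by the loss.
grads = Zygote.gradient(nt -> sum(nt.a), (a = [1.0, 2.0], b = [3.0]))

grads[1].a  # [1.0, 1.0]
grads[1].b  # nothing — Zygote's structural zero for an untouched field
```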
@@ -230,7 +230,7 @@ New package which works on the LLVM code which Julia compiles down to.
 
 * Returns another struct of the same type as the model, such as `Chain` above. Non-differentiable objects are left alone, not replaced by a zero.
 
-### Tapir.jl
+### [Tapir.jl](https://github.com/withbayes/Tapir.jl)
 
 Another new AD to watch. Many similarities in its approach to Enzyme.jl, but it operates entirely in Julia.
@@ -280,7 +280,6 @@ Forward mode is a different algorithm...
 
 * No support for GPU
 
-
 <hr/>
 
 ## Second-order
@@ -293,7 +292,7 @@ In principle this works but in practice... best start small.
 
 ### ForwardDiff over Zygote
 
-Zygote.hessian is like this.
+`Zygote.hessian` is like this.
 
 ### Enzyme.jl
@@ -307,7 +306,7 @@ Besides AD packages, several packages have been written aiming to provide a unif
 
 ### [AbstractDifferentiation.jl](https://github.com/JuliaDiff/AbstractDifferentiation.jl)
 
-The original meta-package?
+The original meta-package for calling any of several engines.
 
 ### [DifferentiationInterface.jl](https://github.com/gdalle/DifferentiationInterface.jl)
 
@@ -317,7 +316,7 @@ This year's new attempt to build a simpler one?
 
 Really `rrule_via_ad` is another mechanism, but only for 3 systems.
 
-
+Sold as an attempt at unification, but its design of extensible `rrule`s turned out to be too closely tied to Zygote/Diffractor-style AD, and not a good fit for Enzyme/Tapir, which therefore use their own rule systems. Also not a natural fit for the Tracker/ReverseDiff/ForwardDiff style of operator-overloading AD.