(This is a follow-up to a Slack thread, cc @willtebbutt & @gdalle.)
I have been comparing the performance of different autodiff backends for training Flux models, and in most cases I see one order of magnitude worse performance from Mooncake than from Enzyme or even Zygote. This may be due to the Fluxperimental `Moonduo` implementation (https://github.com/FluxML/Fluxperimental.jl/blob/master/ext/FluxMooncakeExt.jl) rather than something in Mooncake itself. Here is an MWE in case it is useful.
The model is a simple case of two standard feed-forward layers. Benchmarks below are with Flux v0.15.2, Zygote v0.6.75 (constrained from updating further by Fluxperimental), Enzyme v0.13.30, and Mooncake v0.4.83, on Julia 1.10.5.
```julia
using Flux
using BenchmarkTools

# Create random inputs and targets
const MINIBATCHSIZE = 64
X = rand(Float32, 100, MINIBATCHSIZE)
Y = rand(Float32, 20, MINIBATCHSIZE)

# Create a trivial Flux NN and loss
model = Chain(Dense(100, 50, relu),
              Dense(50, 20))
myloss(m, x, y) = Flux.mse(m(x), y)

# Compare gradient times for each backend (restarting the session for each example):
using Zygote
@btime loss, grads = Flux.withgradient($myloss, $model, $X, $Y)
# 81.875 μs (87 allocations: 126.46 KiB)

using Enzyme
@btime loss, grads = Flux.withgradient($myloss, $Duplicated(model), $X, $Y)
# 82.875 μs (129 allocations: 84.27 KiB)

using Fluxperimental, Mooncake
@btime loss, grads = Flux.withgradient($myloss, $Moonduo(model), $Moonduo(X), $Moonduo(Y))
# 837.625 μs (16045 allocations: 1.89 MiB)

# The same through a closure, in case wrapping the data in Moonduo matters:
fclosure(m) = myloss(m, X, Y)
@btime loss, grads = Flux.withgradient($fclosure, $Moonduo(model))
# 919.000 μs (16048 allocations: 1.89 MiB)
```
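
To help separate the `Moonduo` wrapper from Mooncake itself, one could also benchmark Mooncake through its own interface, bypassing Fluxperimental entirely. Below is a minimal sketch assuming Mooncake's documented `build_rrule` / `value_and_gradient!!` API; the exact calls may differ across Mooncake versions.

```julia
# Sketch: call Mooncake directly, without Fluxperimental's Moonduo glue.
# Assumes the MWE above (fclosure, model) is already defined.
using Mooncake

# Build the reverse rule once, outside the benchmark loop.
rule = Mooncake.build_rrule(fclosure, model)

# value_and_gradient!! reuses the rule's internal storage on each call;
# grads is a tuple of (gradient w.r.t. fclosure, gradient w.r.t. model).
loss, grads = Mooncake.value_and_gradient!!(rule, fclosure, model)

@btime Mooncake.value_and_gradient!!($rule, $fclosure, $model)
```

If this direct call is fast while the `Moonduo` path is slow, the overhead is presumably in the FluxMooncakeExt glue; if it is equally slow, the issue is more likely in how Mooncake handles this model.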