(This is a follow-up to a Slack thread, cc @willtebbutt & @gdalle.)
I have been comparing the performance of different autodiff backends for training Flux models, and in most cases I see one order of magnitude worse performance from Mooncake than from Enzyme or even Zygote. This may be due to the Fluxperimental `Moonduo` implementation (https://github.com/FluxML/Fluxperimental.jl/blob/master/ext/FluxMooncakeExt.jl) rather than something in Mooncake itself. Here is an MWE in case it is useful.
The model is a simple case of two standard feed-forward layers. Benchmarks below are with Flux v0.15.2, Zygote v0.6.75 (constrained from updating further by Fluxperimental), Enzyme v0.13.30, and Mooncake v0.4.83, on Julia 1.10.5.
```julia
using Flux
using BenchmarkTools

# Create random inputs and targets
const MINIBATCHSIZE = 64
X = rand(Float32, 100, MINIBATCHSIZE)
Y = rand(Float32, 20, MINIBATCHSIZE)

# Create a trivial Flux NN and loss
model = Chain(Dense(100, 50, relu),
              Dense(50, 20))
myloss(m, x, y) = Flux.mse(m(x), y)

# Compare gradient times for each backend (restarting the session for each example):
using Zygote
@btime loss, grads = Flux.withgradient($myloss, $model, $X, $Y)
# 81.875 μs (87 allocations: 126.46 KiB)

using Enzyme
@btime loss, grads = Flux.withgradient($myloss, $Duplicated(model), $X, $Y)
# 82.875 μs (129 allocations: 84.27 KiB)

using Fluxperimental, Mooncake
@btime loss, grads = Flux.withgradient($myloss, $Moonduo(model), $Moonduo(X), $Moonduo(Y))
# 837.625 μs (16045 allocations: 1.89 MiB)

# The same through a closure, in case wrapping the data in Moonduo matters:
fclosure(m) = myloss(m, X, Y)
@btime loss, grads = Flux.withgradient($fclosure, $Moonduo(model))
# 919.000 μs (16048 allocations: 1.89 MiB)
```
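
To help separate the `Moonduo` wrapper from Mooncake itself, one could also benchmark Mooncake through its own interface, bypassing Fluxperimental entirely. Below is a minimal sketch assuming Mooncake's documented `build_rrule` / `value_and_gradient!!` API; the exact calls may differ across Mooncake versions.

```julia
# Sketch: call Mooncake directly, without Fluxperimental's Moonduo glue.
# Assumes the MWE above (fclosure, model) is already defined.
using Mooncake

# Build the reverse rule once, outside the benchmark loop.
rule = Mooncake.build_rrule(fclosure, model)

# value_and_gradient!! reuses the rule's internal storage on each call;
# grads is a tuple of (gradient w.r.t. fclosure, gradient w.r.t. model).
loss, grads = Mooncake.value_and_gradient!!(rule, fclosure, model)

@btime Mooncake.value_and_gradient!!($rule, $fclosure, $model)
```

If this direct call is fast while the `Moonduo` path is slow, the overhead is presumably in the FluxMooncakeExt glue; if it is equally slow, the issue is more likely in how Mooncake handles this model.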