feat: backend switching for Mooncake #768

Merged
merged 31 commits into from
Jun 18, 2025
Changes from 24 commits
Commits
1a389a6
Handles backend switching for Mooncake using ChainRules
AstitvaAggarwal Apr 1, 2025
08b176a
Mooncake Wrapper for substitute backends
AstitvaAggarwal Apr 2, 2025
ba0c9e6
Merge branch 'JuliaDiff:main' into develop
AstitvaAggarwal Apr 10, 2025
1340d92
added rules
AstitvaAggarwal Apr 10, 2025
2ce1ee2
Merge branch 'develop' of https://github.com/AstitvaAggarwal/Differen…
AstitvaAggarwal Apr 10, 2025
08de6df
config
AstitvaAggarwal Apr 10, 2025
84f27c9
splatting for dy
AstitvaAggarwal Apr 10, 2025
2e95299
brackets
AstitvaAggarwal Apr 10, 2025
13233e5
too easy
AstitvaAggarwal Apr 11, 2025
1e8df98
changes from reviews, Docs
AstitvaAggarwal Apr 12, 2025
afdddd4
changes from reviews - 2
AstitvaAggarwal Apr 18, 2025
233c312
Merge branch 'JuliaDiff:main' into develop
AstitvaAggarwal Apr 18, 2025
7a07127
changes from reviews-1
AstitvaAggarwal May 16, 2025
f3e436d
conflicts
AstitvaAggarwal May 16, 2025
6a0d937
conflicts-2
AstitvaAggarwal May 16, 2025
e543958
Update differentiate_with.jl
AstitvaAggarwal May 16, 2025
2472ecc
Merge branch 'JuliaDiff:main' into develop
AstitvaAggarwal May 16, 2025
c63c956
typecheck for array rule.
AstitvaAggarwal May 18, 2025
36da036
assertion for array inputs
AstitvaAggarwal May 18, 2025
d2b5a8c
Merge branch 'JuliaDiff:main' into develop
AstitvaAggarwal May 29, 2025
c389a80
extensive tests, diffwith for tuples
AstitvaAggarwal May 29, 2025
b4fe0f8
tests.
AstitvaAggarwal May 29, 2025
ec4b75d
tests, inc primal handling
AstitvaAggarwal May 31, 2025
0f0b9fc
changes from reviews
AstitvaAggarwal Jun 6, 2025
3c5f99e
Merge branch 'main' into develop
yebai Jun 13, 2025
d94f146
Apply suggestions from code review
gdalle Jun 13, 2025
c982f46
Simplify Mooncake rule tests, add ChainRules rule tests
gdalle Jun 13, 2025
749fea5
Format
gdalle Jun 13, 2025
9e5ecfd
Update differentiate_with.jl
gdalle Jun 14, 2025
1e85f17
Restrict to array of numbers
gdalle Jun 14, 2025
ff5c4e2
Update DifferentiationInterface/ext/DifferentiationInterfaceMooncakeE…
gdalle Jun 18, 2025
2 changes: 1 addition & 1 deletion DifferentiationInterface/Project.toml
@@ -72,7 +72,7 @@ JET = "0.9"
JLArrays = "0.2.0"
JuliaFormatter = "1,2"
LinearAlgebra = "1"
-Mooncake = "0.4.88"
+Mooncake = "0.4.121"
Pkg = "1"
PolyesterForwardDiff = "0.1.2"
Random = "1"
@@ -95,7 +95,7 @@ In general, using a forward outer backend over a reverse inner backend will yiel
The wrapper [`DifferentiateWith`](@ref) allows you to switch between backends.
It takes a function `f` and specifies that `f` should be differentiated with the substitute backend of your choice, instead of whatever true backend the surrounding code is trying to use.
In other words, when someone tries to differentiate `dw = DifferentiateWith(f, substitute_backend)` with `true_backend`, then `substitute_backend` steps in and `true_backend` does not dive into the function `f` itself.
-At the moment, `DifferentiateWith` only works when `true_backend` is either [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) or a [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl)-compatible backend.
+At the moment, `DifferentiateWith` only works when `true_backend` is either [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl), [Mooncake.jl](https://github.com/chalk-lab/Mooncake.jl), or a [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl)-compatible backend (e.g., [Zygote.jl](https://github.com/FluxML/Zygote.jl)).
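
A minimal sketch of the backend switch described in this paragraph, with Mooncake as the true backend. The function `f`, the outer function `alg`, and the choice of FiniteDiff as the substitute backend are illustrative assumptions, not part of this PR; the `AutoMooncake(; config=nothing)` form follows the test file in this diff.

```julia
using DifferentiationInterface
import FiniteDiff, Mooncake

# Illustrative function (an assumption); any out-of-place `y = f(x)` would do.
f(x) = sum(abs2, x)

# Request that `f` be differentiated with FiniteDiff, whatever the outer backend is.
dw = DifferentiateWith(f, AutoFiniteDiff())

# A larger computation that calls the wrapped function.
alg(x) = 7 * dw(x)

x = [3.0, 5.0]

# Mooncake is the true backend here; it should hand `dw` over to FiniteDiff
# instead of tracing into `f` itself.
grad = gradient(alg, AutoMooncake(; config=nothing), x)

# The analytic gradient is 14 .* x; finite differences match it only approximately.
@assert isapprox(grad, 14 .* x; rtol=1e-5)
```

Because the substitute backend here is a finite-difference method, the comparison uses a tolerance rather than exact equality.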

## Implementations

@@ -111,4 +111,5 @@ There are, however, translation utilities:
### Backend switch

Also note the existence of [`DifferentiationInterface.DifferentiateWith`](@ref), which allows the user to wrap a function that should be differentiated with a specific backend.
-Right now it only targets ForwardDiff.jl and ChainRulesCore.jl, but PRs are welcome to define Enzyme.jl and Mooncake.jl rules for this object.
+
+Right now, it only targets [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl), [Mooncake.jl](https://github.com/chalk-lab/Mooncake.jl), and [ChainRules.jl](https://juliadiff.org/ChainRulesCore.jl/stable/)-compatible backends (e.g., [Zygote.jl](https://github.com/FluxML/Zygote.jl)), but PRs are welcome to define Enzyme.jl rules for this object.
@@ -3,14 +3,25 @@ module DifferentiationInterfaceMooncakeExt
using ADTypes: ADTypes, AutoMooncake
import DifferentiationInterface as DI
using Mooncake:
+Mooncake,
CoDual,
Config,
prepare_gradient_cache,
prepare_pullback_cache,
tangent_type,
value_and_gradient!!,
value_and_pullback!!,
-zero_tangent
+zero_tangent,
+rdata_type,
+fdata,
+rdata,
+tangent_type,
+NoTangent,
+@is_primitive,
+zero_fcodual,
+MinimalCtx,
+NoRData,
+primal

DI.check_available(::AutoMooncake) = true

@@ -26,5 +37,6 @@ mycopy(x) = deepcopy(x)

include("onearg.jl")
include("twoarg.jl")
+include("differentiate_with.jl")

end
@@ -0,0 +1,268 @@
@is_primitive MinimalCtx Tuple{DI.DifferentiateWith,<:Union{Number,AbstractArray,Tuple}}
Review comment (Member):

I find it a bit weird to have this union of Number, AbstractArray (two types which are theoretically supported for x inputs in DI) and then just Tuple (which is not officially part of the supported inputs). Why not also NamedTuple for instance? Is it better if we just say Any? Or restrict to Number and AbstractArray for the time being?


# Nested vectors (e.g. [[1.0]]), tuples (e.g. ((1.0,),)) and similar primal types (e.g. [(1.0,)]) are not yet supported by DI.
# This is because basis construction (DI.basis) does not have overloads for these types.
# For details, see the commented-out test cases, which show where pullback creation fails.
Review comment (Member):

The issue is that we're testing DifferentiateWith(f, substitute_backend) with substitute_backend = AutoFiniteDiff(), aka a forward-mode backend. I think it should work with DifferentiateWith(f, AutoEnzyme())?

Review comment (Member):

Resolved by removing tuples for the time being

function Mooncake.rrule!!(
    dw::CoDual{<:DI.DifferentiateWith}, x::Union{CoDual{<:Number},CoDual{<:Tuple}}
)
    primal_func = primal(dw)
    primal_x = primal(x)
    (; f, backend) = primal_func
    y = zero_fcodual(f(primal_x))

    # output is a vector, so we need to use the vector pullback
    function pullback_array!!(dy::NoRData)
        tx = DI.pullback(f, backend, primal_x, (y.dx,))
        @assert rdata(only(tx)) isa rdata_type(tangent_type(typeof(primal_x)))
        return NoRData(), rdata(only(tx))
    end

    # output is a scalar, so we can use the scalar pullback
    function pullback_scalar!!(dy::Number)
        tx = DI.pullback(f, backend, primal_x, (dy,))
        @assert rdata(only(tx)) isa rdata_type(tangent_type(typeof(primal_x)))
        return NoRData(), rdata(only(tx))
    end

    # output is a Tuple, NTuple
    function pullback_tuple!!(dy::Tuple)
        tx = DI.pullback(f, backend, primal_x, (dy,))
        @assert rdata(only(tx)) isa rdata_type(tangent_type(typeof(primal_x)))
        return NoRData(), rdata(only(tx))
    end
Review comment (Member):

This only works for tuples of numbers, right? With a tuple of arrays for instance, it would fail? Perhaps it would be best for us to just remove support for tuples completely at first.


    # inputs are non-differentiable
    function pullback_nodiff!!(dy::NoRData)
        @assert tangent_type(typeof(primal(x))) <: NoTangent
        return NoRData(), dy
    end

    pullback = if tangent_type(typeof(primal(x))) <: NoTangent
        pullback_nodiff!!
    elseif typeof(primal(y)) <: Number
        pullback_scalar!!
    elseif typeof(primal(y)) <: Array
        pullback_array!!
    elseif typeof(primal(y)) <: Tuple
        pullback_tuple!!
    else
        error(
            "For the function type $(typeof(primal_func)) and input type $(typeof(primal_x)), the primal type $(typeof(primal(y))) is currently not supported.",
        )
    end

    return y, pullback
end

function Mooncake.rrule!!(dw::CoDual{<:DI.DifferentiateWith}, x::CoDual{<:AbstractArray})
    primal_func = primal(dw)
    primal_x = primal(x)
    fdata_arg = x.dx
    (; f, backend) = primal_func
    y = zero_fcodual(f(primal_x))

    # output is a vector, so we need to use the vector pullback
    function pullback_array!!(dy::NoRData)
        tx = DI.pullback(f, backend, primal_x, (y.dx,))
        @assert rdata(first(only(tx))) isa rdata_type(tangent_type(typeof(first(primal_x))))
        fdata_arg .+= only(tx)
        return NoRData(), dy
    end

    # output is a scalar, so we can use the scalar pullback
    function pullback_scalar!!(dy::Number)
        tx = DI.pullback(f, backend, primal_x, (dy,))
        @assert rdata(first(only(tx))) isa rdata_type(tangent_type(typeof(first(primal_x))))
        fdata_arg .+= only(tx)
        return NoRData(), NoRData()
    end

    # output is a Tuple, NTuple
    function pullback_tuple!!(dy::Tuple)
        tx = DI.pullback(f, backend, primal_x, (dy,))
        @assert rdata(first(only(tx))) isa rdata_type(tangent_type(typeof(first(primal_x))))
        fdata_arg .+= only(tx)
        return NoRData(), NoRData()
    end

    # inputs are non-differentiable
    function pullback_nodiff!!(dy::NoRData)
        @assert tangent_type(typeof(primal(x))) <: Vector{NoTangent}
        return NoRData(), dy
    end

    pullback = if tangent_type(typeof(primal(x))) <: Vector{NoTangent}
        pullback_nodiff!!
    elseif typeof(primal(y)) <: Number
        pullback_scalar!!
    elseif typeof(primal(y)) <: AbstractArray
        pullback_array!!
    elseif typeof(primal(y)) <: Tuple
        pullback_tuple!!
    else
        error(
            "For the function type $(typeof(primal_func)) and input type $(typeof(primal_x)), the primal type $(typeof(primal(y))) is currently not supported.",
        )
    end

    return y, pullback
end
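
The two `rrule!!` methods above differ in where the input cotangent ends up: the scalar method returns it as reverse data from the pullback, while the array method accumulates it in place into `x.dx` and returns `NoRData()`. A small sketch of the type-level split behind this, assuming Mooncake's internal helpers keep these names; `fdata_type` and `NoFData` do not appear in this PR's import list and are assumed from Mooncake's internals.

```julia
import Mooncake
using Mooncake: tangent_type, fdata_type, rdata_type, NoFData, NoRData, NoTangent

# Scalars: the tangent is carried entirely as reverse data (rdata),
# so a pullback returns the input cotangent by value.
@assert tangent_type(Float64) == Float64
@assert fdata_type(tangent_type(Float64)) == NoFData
@assert rdata_type(tangent_type(Float64)) == Float64

# Arrays of floats: the tangent is carried as forward data (fdata),
# a mutable buffer that a pullback increments in place.
@assert tangent_type(Vector{Float64}) == Vector{Float64}
@assert fdata_type(tangent_type(Vector{Float64})) == Vector{Float64}
@assert rdata_type(tangent_type(Vector{Float64})) == NoRData

# Integers are treated as non-differentiable.
@assert tangent_type(Int) == NoTangent
```

Note that `fdata_type` and `rdata_type` take a tangent type, not a primal type, which is why the rules above call `rdata_type(tangent_type(typeof(primal_x)))`.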

function Mooncake.generate_derived_rrule!!_test_cases(rng_ctor, ::Val{:diffwith})
    return Any[], Any[]
end

function Mooncake.generate_hand_written_rrule!!_test_cases(rng_ctor, ::Val{:diffwith})
    test_cases = reduce(
        vcat,
        map([(x) -> DI.DifferentiateWith(x, DI.AutoFiniteDiff())]) do F
            map([Float64, Float32]) do P
                return Any[
                    # (false, :none, nothing, F(identity), ((1.0,),)), # (DI.basis fails for this, correct it!)
                    # (false, :none, nothing, F(identity), [[1.0]]), # (DI.basis fails for this, correct it!)
                    (false, :stability_and_allocs, nothing, F(cosh), P(0.3)),
                    (false, :stability_and_allocs, nothing, F(sinh), P(0.3)),
                    (
                        false,
                        :stability_and_allocs,
                        nothing,
                        F(Base.FastMath.exp10_fast),
                        P(0.5),
                    ),
                    (
                        false,
                        :stability_and_allocs,
                        nothing,
                        F(Base.FastMath.exp2_fast),
                        P(0.5),
                    ),
                    (
                        false,
                        :stability_and_allocs,
                        nothing,
                        F(Base.FastMath.exp_fast),
                        P(5.0),
                    ),
                    (false, :stability, nothing, F(copy), rand(Int32, 5)),
                ]
            end
        end...,
    )

    map([(x) -> DI.DifferentiateWith(x, DI.AutoFiniteDiff())]) do F
        push!(
            test_cases,
            Any[
                (false, :stability, nothing, copy, randn(5, 4)),
                (
                    # Check that Core._apply_iterate gets lifted to _apply_iterate_equivalent.
                    false,
                    :stability,
                    nothing,
                    F(x -> +(x...)),
                    randn(33),
                ),
                (
                    false,
                    :stability,
                    nothing,
                    (F(
                        function (x)
                            rx = Ref(x)
                            return Base.pointerref(
                                Base.bitcast(Ptr{Float64}, pointer_from_objref(rx)), 1, 1
                            )
                        end,
                    )),
                    5.0,
                ),
                # (false, :none, nothing, F(Mooncake.__vec_to_tuple), Any[(1.0,)]), # (DI.basis fails for this, correct it!)
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.ctlz_int),
                    5,
                ),
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.ctpop_int),
                    5,
                ),
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.cttz_int),
                    5,
                ),
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.abs_float),
                    5.0f0,
                ),
                (false, :stability_and_allocs, nothing, F(deepcopy), 5.0),
                (false, :stability, nothing, F(deepcopy), randn(5)),
                (false, :stability_and_allocs, nothing, F(sin), 1.1),
                (false, :stability_and_allocs, nothing, F(sin), 1.0f1),
                (false, :stability_and_allocs, nothing, F(cos), 1.1),
                (false, :stability_and_allocs, nothing, F(cos), 1.0f1),
                (false, :stability_and_allocs, nothing, F(exp), 1.1),
                (false, :stability_and_allocs, nothing, F(exp), 1.0f1),
            ]...,
        )
    end

    map([(x) -> DI.DifferentiateWith(x, DI.AutoForwardDiff())]) do F
        map([Float64, Float32]) do P
            push!(
                test_cases,
                Any[
                    (
                        false,
                        :stability_and_allocs,
                        nothing,
                        F(Base.FastMath.sincos),
                        P(3.0),
                    ),
                    (false, :none, nothing, F(Mooncake.__vec_to_tuple), [P(1.0)]),
                ]...,
            )
        end

        push!(
            test_cases,
            Any[
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.ctlz_int),
                    5,
                ),
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.ctpop_int),
                    5,
                ),
                (
                    false,
                    :stability_and_allocs,
                    nothing,
                    F(Mooncake.IntrinsicsWrappers.cttz_int),
                    5,
                ),
            ]...,
        )
    end

    memory = Any[]
    return test_cases, memory
end
6 changes: 5 additions & 1 deletion DifferentiationInterface/src/misc/differentiate_with.jl
@@ -13,9 +13,13 @@ Moreover, any larger algorithm `alg` that calls `f2` instead of `f` will also be

!!! warning
`DifferentiateWith` only supports out-of-place functions `y = f(x)` without additional context arguments.
-It only makes these functions differentiable if the true backend is either [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) or automatically importing rules from [ChainRules](https://github.com/JuliaDiff/ChainRules.jl) (e.g. [Zygote](https://github.com/FluxML/Zygote.jl)). Some backends are also able to [manually import rules](https://juliadiff.org/ChainRulesCore.jl/stable/#Packages-supporting-importing-rules-from-ChainRules.) from ChainRules.
+It only makes these functions differentiable if the true backend is either [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl), [Mooncake](https://github.com/chalk-lab/Mooncake.jl) or automatically importing rules from [ChainRules](https://github.com/JuliaDiff/ChainRules.jl) (e.g. [Zygote](https://github.com/FluxML/Zygote.jl)). Some backends are also able to [manually import rules](https://juliadiff.org/ChainRulesCore.jl/stable/#Packages-supporting-importing-rules-from-ChainRules.) from ChainRules.
For any other true backend, the differentiation behavior is not altered by `DifferentiateWith` (it becomes a transparent wrapper).

+!!! warning
+When using Mooncake as a substitute backend via `DifferentiateWith(f, AutoMooncake())`, the function `f` must not close over any active data.
+As of now, we cannot differentiate with respect to parameters stored inside `f`.

# Fields

- `f`: the function in question, with signature `f(x)`
15 changes: 11 additions & 4 deletions DifferentiationInterface/test/Back/DifferentiateWith/test.jl
@@ -1,18 +1,21 @@
using Pkg
-Pkg.add(["FiniteDiff", "ForwardDiff", "Zygote"])
+Pkg.add(["FiniteDiff", "ForwardDiff", "Zygote", "Mooncake"])

using DifferentiationInterface, DifferentiationInterfaceTest
import DifferentiationInterfaceTest as DIT
using FiniteDiff: FiniteDiff
using ForwardDiff: ForwardDiff
using Zygote: Zygote
-using Test
+using Mooncake: Mooncake
+using StableRNGs, Test

LOGGING = get(ENV, "CI", "false") == "false"

function differentiatewith_scenarios()
bad_scens = # these closurified scenarios have mutation and type constraints
-filter(default_scenarios(; include_normal=false, include_closurified=true)) do scen
+filter(
+DIT.default_scenarios(; include_normal=false, include_closurified=true)
+) do scen
DIT.function_place(scen) == :out
end
good_scens = map(bad_scens) do scen
@@ -22,8 +25,12 @@ function differentiatewith_scenarios()
end

test_differentiation(
-[AutoForwardDiff(), AutoZygote()],
+[AutoForwardDiff(), AutoZygote(), AutoMooncake(; config=nothing)],
differentiatewith_scenarios();
excluded=SECOND_ORDER,
logging=LOGGING,
)

+@testset "Mooncake tests" begin
+Mooncake.TestUtils.run_rrule!!_test_cases(StableRNG, Val(:diffwith))
+end