
Inconsistency in handling empty arguments #802


Open
penelopeysm opened this issue May 22, 2025 · 2 comments
Labels
backend Related to one or more autodiff backends

Comments

@penelopeysm
Contributor

penelopeysm commented May 22, 2025

When differentiating with respect to an empty array, the results tend to vary:

using DifferentiationInterface, ForwardDiff, ReverseDiff, Mooncake, Enzyme

ADTYPES = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoMooncake(; config=nothing),
    AutoEnzyme(; mode=Forward),
    AutoEnzyme(; mode=Reverse),
    # and more...
]

for adtype in ADTYPES
    DifferentiationInterface.value_and_gradient(sum, adtype, Float64[])
end

ReverseDiff, Mooncake, and reverse Enzyme all happily return (0.0, []) 😄

Forward Enzyme tries to use a batch size of 0 and errors:

function DI.pick_batchsize(::AutoEnzyme, N::Integer)
    B = DI.reasonable_batchsize(N, 16)
    return DI.BatchSizeSettings{B}(N)
end
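(With an empty input, N == 0, so the batch size chosen here ends up as 0, which forward Enzyme can't use. A minimal sketch of one possible guard; the function name and the simplified clamping logic are mine, not DifferentiationInterface's actual API:

```julia
# Hypothetical guard, not DI's actual code: floor the batch size at 1 so
# that an empty input (N == 0) never produces a zero batch size.
# min(N, cap) is a simplification of DI.reasonable_batchsize.
function pick_batchsize_nonzero(N::Integer, cap::Integer=16)
    return max(min(N, cap), 1)
end

pick_batchsize_nonzero(0)    # 1 instead of 0 for an empty input
pick_batchsize_nonzero(100)  # capped at 16
```
)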

And ForwardDiff tries to construct a GradientResult, which errors:

fc = DI.fix_tail(f, map(DI.unwrap, contexts)...)
result = GradientResult(x)
result = gradient!(result, fc, x)
return DR.value(result), DR.gradient(result)

https://github.com/JuliaDiff/DiffResults.jl/blob/fcf7858d393f0597fc74e195ed46f7bcbe5ff66c/src/DiffResults.jl#L64-L65
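(The linked lines suggest why: GradientResult appears to seed the result's value from the first element of x, and first(x) throws a BoundsError on an empty array. A hedged sketch of one way to special-case this; the function name is hypothetical, not DI's actual code:

```julia
# Hypothetical special case, not DI's actual code: skip the
# GradientResult machinery entirely when x is empty, and return an
# empty gradient of the same element type.
function value_and_gradient_emptysafe(f, x::AbstractVector)
    isempty(x) && return (f(x), similar(x))
    error("non-empty path omitted in this sketch")
end

value_and_gradient_emptysafe(sum, Float64[])  # (0.0, Float64[])
```
)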

Funnily enough, gradient with ForwardDiff (rather than value_and_gradient) is fine, because it doesn't try to construct the GradientResult. I imagine the other operators would also have varying behaviour.

I suppose it is a bit of a trivial edge case, but would it be possible to unify the behaviour of the AD backends?

@gdalle
Member

gdalle commented May 22, 2025

Hi, thanks for raising this issue! Honestly at this point it doesn't rank very high on my list of priorities, so it may linger for a while.
Do you have an example of a realistic scenario where a user would want to compute an empty gradient and rely on it not erroring?

@gdalle added the backend (Related to one or more autodiff backends) label May 22, 2025
@penelopeysm
Contributor Author

> scenario where the user

Mostly, dealing with the case where somebody tries to sample from a model that only contains likelihood terms:

using Turing

@model function empty(x)
    x ~ Normal()
end

sample(empty(1.0), NUTS(), 100)

This is a little bit silly, but right now Turing's behaviour isn't consistent: Turing/DynamicPPL used to error; now it sometimes fails with AD errors (this issue) and sometimes samples fine, generating an empty chain. My ideal situation would be that it always returns an empty chain. The alternative is that we check for an empty model before it even gets sent to AD.
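That pre-check could look roughly like this sketch (the names are made up, and DynamicPPL's actual API for counting a model's parameters is not shown here):

```julia
# Hypothetical pre-check, not Turing/DynamicPPL code: if the model has no
# sampled parameters, return an "empty chain" without ever calling AD.
function sample_or_empty(nparams::Int, niters::Int)
    if nparams == 0
        return [Float64[] for _ in 1:niters]  # niters empty parameter states
    end
    error("gradient-based sampling path omitted in this sketch")
end
```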

It's also hardly a high priority item on my end, but it's low-hanging fruit, so I thought I may as well try to fix it.
