When differentiating with respect to an empty array, the results tend to vary:
ReverseDiff, Mooncake, and reverse Enzyme all happily return (0.0, []) 😄
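For reference, a minimal reproduction sketch of the reverse-mode behaviour, assuming DifferentiationInterface's `value_and_gradient(f, backend, x)` entry point and the re-exported ADTypes backend constructors; the choice of `sum` as a test function is mine:

```julia
using DifferentiationInterface
import ReverseDiff

# Reverse-mode backends reportedly accept the zero-length input and
# return the value together with an empty gradient vector.
value_and_gradient(sum, AutoReverseDiff(), Float64[])  # reportedly (0.0, [])

# Mooncake and reverse-mode Enzyme are reported to behave the same way.
```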
Forward Enzyme tries to use a batch size of 0 and errors:
DifferentiationInterface.jl/DifferentiationInterface/ext/DifferentiationInterfaceEnzymeExt/utils.jl
Lines 11 to 14 in 6a58124
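A sketch of that forward-mode Enzyme case, assuming the ADTypes `AutoEnzyme(mode=...)` constructor re-exported by DifferentiationInterface (again with `sum` as a stand-in test function):

```julia
using DifferentiationInterface
import Enzyme

# Reportedly errors: the extension derives a batch size from the input,
# and a zero-length input gives a batch size of 0.
value_and_gradient(sum, AutoEnzyme(mode=Enzyme.Forward), Float64[])
```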
And ForwardDiff tries to construct a GradientResult, which errors:
DifferentiationInterface.jl/DifferentiationInterface/ext/DifferentiationInterfaceForwardDiffExt/onearg.jl
Lines 315 to 318 in 6a58124
https://github.com/JuliaDiff/DiffResults.jl/blob/fcf7858d393f0597fc74e195ed46f7bcbe5ff66c/src/DiffResults.jl#L64-L65
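The failure is easy to see in isolation; a sketch using only DiffResults itself (per the lines linked above, the constructor seeds the result's value field from an element of `x`):

```julia
using DiffResults

# With a zero-length input there is no element to seed the value field with,
# so constructing the result errors before any differentiation happens.
DiffResults.GradientResult(Float64[])
```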
Funnily enough, gradient with ForwardDiff (rather than value_and_gradient) is fine, because it doesn't try to construct the GradientResult (illustrated below). I imagine the other operators would also have varying behaviour.
I suppose it is a bit of a trivial edge case, but would it be possible to unify the behaviour of the AD backends?
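To illustrate that contrast, a sketch assuming DifferentiationInterface's `gradient` and `value_and_gradient` operators with the re-exported `AutoForwardDiff()` backend:

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)

# Reportedly fine: this path does not build a GradientResult.
gradient(f, AutoForwardDiff(), Float64[])            # returns an empty gradient

# Reportedly errors: this path constructs a GradientResult first.
value_and_gradient(f, AutoForwardDiff(), Float64[])
```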
Hi, thanks for raising this issue! Honestly, at this point it doesn't rank very high on my list of priorities, so it may linger for a while.
Do you have an example of a realistic scenario where the user would want to compute an empty gradient and rely on it not erroring?

Mostly, dealing with the case where somebody tries to sample from a model that only contains likelihood terms:

    using Turing

    @model function empty(x)
        x ~ Normal()
    end

    sample(empty(1.0), NUTS(), 100)

This is a little bit silly, but right now Turing's behaviour isn't consistent: it used to be that Turing/DynamicPPL would error; now it sometimes fails with AD errors (this issue), and sometimes it samples fine and generates an empty chain. My ideal outcome would be for it to always return an empty chain. The alternative is that we can check for an empty model before it even gets sent to AD (roughly sketched below).
It's also hardly a high-priority item on my end, but it's low-hanging fruit, so I thought I may as well try to fix it.
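A rough sketch of that "check before AD" alternative, in terms of a generic log-density function rather than actual Turing/DynamicPPL internals; the wrapper name `value_and_gradient_or_empty` is hypothetical, and the use of DifferentiationInterface here is only an illustration:

```julia
using DifferentiationInterface
import ForwardDiff

# Hypothetical guard: if there are no parameters, skip the AD backend entirely
# and return an empty gradient, so every backend behaves the same way.
function value_and_gradient_or_empty(logp, backend, x::AbstractVector)
    isempty(x) && return (logp(x), similar(x))  # empty input => empty gradient
    return value_and_gradient(logp, backend, x)
end

value_and_gradient_or_empty(sum, AutoForwardDiff(), Float64[])  # (0.0, Float64[])
```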