Add `getadtype` function to `AbstractSampler` interface #158
Conversation
It feels slightly unusual to attach ad type to samplers—we do this in Turing, but I'm not sure it's the best design. |
I somewhat agree - that's why I left it as optional (defaulting to `nothing`). I think the sampler is probably the 'least bad place' to put it - the other options are the model (…). Edit: I no longer think it's weird to put it in the model, see comment below |
The combination of … |
I support that :) One thing: we don't actually use `getadtype(::AbstractModel, ::AbstractSampler)`. And I wonder if we could also maybe use the same strategy as `getparams` and have a default implementation `getadtype(::AbstractModel, spl::AbstractSampler) = getadtype(spl)` and let users overload the method they prefer to use? |
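A minimal sketch of that fallback idea (the `MySampler` type and the standalone `getadtype` function here are hypothetical illustrations, not part of AbstractMCMC): the two-argument method defers to the one-argument one by default, so downstream packages can overload whichever is more natural for them.

```julia
using AbstractMCMC: AbstractModel, AbstractSampler
using ADTypes: AutoForwardDiff

# One-argument accessor: samplers that carry an AD type opt in by overloading it.
getadtype(::AbstractSampler) = nothing

# Two-argument fallback, analogous to getparams: by default it just asks the
# sampler, but a package could instead overload this method to let the model
# (or the model/sampler pair) decide.
getadtype(::AbstractModel, spl::AbstractSampler) = getadtype(spl)

# Hypothetical sampler that stores its AD backend.
struct MySampler{AD} <: AbstractSampler
    adtype::AD
end
getadtype(spl::MySampler) = spl.adtype

getadtype(MySampler(AutoForwardDiff()))  # AutoForwardDiff()
```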
I'd be happy with the `getadtype(::AbstractModel, spl::AbstractSampler) = getadtype(spl)` approach. What's the convention for dependencies of interface packages? I feel a bit awkward about explicitly requiring that the return type is … |
The more I think about it, the more I'm convinced that the adtype should be in the model, not the sampler. But at the same time I don't know if it would be worth it to break everything :/

(1) From a gradient-based sampler's point of view, it should be agnostic to how the gradient is calculated; it just cares that it can have a gradient. For example, if the model obeys the LogDensityProblems interface, the sampler only really cares that … Indeed, one might find that for a particular model it is possible to implement …

(2) The best AD backend to use is almost entirely determined by the model (e.g. the number of parameters, or the functionality inside the model), and thus it makes sense that the choice of AD backend should be tied to the model.

(3) In DynamicPPL we might often want to test AD correctness and/or performance with a model, in a way like this:

```julia
@model f() = ...
ref_logp, ref_grad = logdensity_and_gradient_with(f(), AutoForwardDiff())
logp, grad = logdensity_and_gradient_with(f(), AutoMooncake())
@test isapprox(logp, ref_logp)
@test isapprox(grad, ref_grad)
```

We definitely don't want to have to construct a dummy sampler just to pass the adtype to the model.

(4) Currently the way that Turing uses AD is this: …
Specifying the AD type as part of the model will let us simplify this loop, and in particular it will enable things like (3) above.

(5) If we say that it's part of the model, then actually this whole PR does not need to exist. It can just be part of the interface of …

Problems

The obvious, huge, problem with this is that we would need to rework the whole sampling interface. E.g. instead of … we would need to write something like … In an ideal world I would actually prefer this. Of course, this would also be horribly breaking.

The other drawback is that we can't mix and match adtypes, so we can't do this: … I don't see a convincing reason why someone would want to do this, though, so I don't feel bad about removing this capability.

The final implication is that the default of ForwardDiff would have to be pushed up to DynamicPPL rather than sitting in Turing. I'm also quite happy for this to be the case because that does reflect our current position (i.e. running AD on `DynamicPPL.Model` is 'supposed' to be correct with ForwardDiff, and it's the reference against which we compare other AD backends). |
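To make points (1) and (5) concrete, here is a minimal sketch under assumed names (`GaussianModel` and its analytic gradient are illustrations, not code from this thread): the model owns the AD choice and exposes the LogDensityProblems first-order interface, so a gradient-based sampler written against that interface never sees how the gradient was produced.

```julia
using LogDensityProblems
using ADTypes: AbstractADType, AutoForwardDiff

struct GaussianModel{AD<:AbstractADType}
    adtype::AD            # the model, not the sampler, carries the backend choice
end

LogDensityProblems.dimension(::GaussianModel) = 2
LogDensityProblems.logdensity(::GaussianModel, x) = -sum(abs2, x) / 2
LogDensityProblems.capabilities(::Type{<:GaussianModel}) =
    LogDensityProblems.LogDensityOrder{1}()

# Here the gradient happens to be analytic; it could equally be computed with the
# backend stored in `adtype`. A sampler only ever calls this method either way.
LogDensityProblems.logdensity_and_gradient(m::GaussianModel, x) =
    (LogDensityProblems.logdensity(m, x), -x)

m = GaussianModel(AutoForwardDiff())
LogDensityProblems.logdensity_and_gradient(m, [1.0, 2.0])  # (-2.5, [-1.0, -2.0])
```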
I have something of a plan to gradually shift us from declaring adtype in samplers to declaring adtype in models, i.e. to go from … to … (roughly as in the sketch below). Let's discuss this at a Monday meeting; I think we should definitely try to get it right, because this is one of the areas where we have awkward entanglement between model and sampler. |
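A hypothetical before/after of that shift (the "after" call is an assumed future API and is commented out because it does not exist; `demo` is a made-up model):

```julia
using Turing
using ADTypes: AutoForwardDiff

@model demo() = x ~ Normal()

# Before: the AD backend is declared on the sampler.
chain = sample(demo(), NUTS(; adtype = AutoForwardDiff()), 1000)

# After (proposed, assumed syntax): the AD backend is declared on the model instead.
# chain = sample(demo_with_adtype, NUTS(), 1000)
```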
I agree that adtypes should be attached to models instead of inference algorithms. One idea that I had:

```julia
sample(f(), Gibbs(x=NUTS(), y=NUTS()), Autograd(x=AutoForwardDiff(), y=AutoMooncake()))
```

or, more concisely (with automatic consistency guarantees for the parameter partitioning between …):

```julia
sample(f(), Gibbs(x=(NUTS(), AutoMooncake()), y=(NUTS(), AutoMooncake())))
```

EDIT: we could lower … internally for backwards compatibility. |
I no longer think this PR is the right way to do it, so closing. But I will add this general discussion to the agenda for Monday's meeting. |
Codecov Report: All modified and coverable lines are covered by tests ✅
☔ View full report in Codecov by Sentry. |
Right now, Turing.jl contains some pretty ad hoc code to determine the underlying AD backend for a sampler:
https://github.com/TuringLang/Turing.jl/blob/ddd74b1cf4694b66698f225db549d2cfb5eb826c/src/mcmc/Inference.jl#L181-L189
In the process, it does a bunch of type piracy and this definition also makes it hard to move any of this behaviour to DynamicPPL.
In particular, it actually means that the AD backend (inside a `LogDensityProblemsAD.ADgradient` object) is only specified by code inside Turing.jl, even though DynamicPPL should have enough information to infer the AD type from the sampler it's given.

This PR proposes to move the AD type specification to the lower layer of AbstractMCMC so that this information is accessible across more layers.
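As a rough sketch of what that enables (the name `maybe_with_gradient` is made up, the `getadtype` line stands in for the accessor this PR adds, and it assumes LogDensityProblemsAD's ADTypes-based `ADgradient` constructor), a lower layer could build the gradient wrapper itself whenever a sampler declares an AD type:

```julia
using AbstractMCMC: AbstractSampler
using LogDensityProblemsAD: ADgradient

# Stand-in for the accessor proposed in this PR: `nothing` means "no AD type declared".
getadtype(::AbstractSampler) = nothing

# Wrap a log-density problem ℓ in an AD gradient object only when the sampler
# declares a backend; otherwise hand the problem to the sampler unchanged.
function maybe_with_gradient(ℓ, spl::AbstractSampler)
    adtype = getadtype(spl)
    return adtype === nothing ? ℓ : ADgradient(adtype, ℓ)
end
```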
It's not 100% obvious to me that gradient-based samplers necessarily need to declare their adtype (maybe depending on the sampler implementation, it might be declared elsewhere, e.g. in the model), but having `getadtype` be a property of the sampler does at least match what we currently do.

Followups if this is merged
`getadtype(::DynamicPPL.Sampler)`
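A hypothetical sketch of that follow-up (the `alg` field and the `adtype` property are assumptions about DynamicPPL/Turing internals, not code from this PR):

```julia
using DynamicPPL

# Forward the accessor through DynamicPPL's sampler wrapper to the wrapped
# algorithm; fall back to `nothing` when the algorithm carries no adtype.
getadtype(spl::DynamicPPL.Sampler) = getadtype(spl.alg)
getadtype(alg) = hasproperty(alg, :adtype) ? alg.adtype : nothing
```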