Improve API for AD testing #964

Open: wants to merge 7 commits into base: breaking

penelopeysm (Member) commented Jun 30, 2025

This PR makes some fairly overdue improvements to the API of DynamicPPL.TestUtils.AD.run_ad.

rng argument

An rng keyword argument is provided to make it easier to seed the parameters used for running AD. Previously, if you wanted to make sure that two calls to run_ad used the same parameters (but you didn't care what parameters they were, just that they were the same!), you had to do:

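using DynamicPPL, Random
using DynamicPPL.TestUtils.AD: run_ad

# Build a linked VarInfo from a seeded RNG and reuse its parameters in both calls.
rng = Xoshiro(468)
v = DynamicPPL.link(VarInfo(rng, model), model)
xs = v[:]

run_ad(model, adtype1; varinfo=v, params=xs)
run_ad(model, adtype2; varinfo=v, params=xs)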

Now you can do:

run_ad(model, adtype1; rng=Xoshiro(468))
run_ad(model, adtype2; rng=Xoshiro(468))

I made rng a keyword argument rather than a positional argument because I consider rng-as-first-argument to be a multiple dispatch abuse anti-pattern, i.e., it serves no purpose except to force someone to declare a new method.
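
As a minimal sketch of why two identically seeded RNGs yield identical parameters, and of the keyword-argument style described above: `sample_params` below is a hypothetical stand-in and not part of DynamicPPL; only the `Random` standard library is assumed.

```julia
using Random

# Hypothetical helper (not DynamicPPL API): draws `n` parameters, taking the
# RNG as a keyword argument with a default, so callers who do not care about
# seeding need not pass anything.
function sample_params(n::Int; rng::AbstractRNG=Random.default_rng())
    return randn(rng, n)
end

# Two freshly constructed RNGs with the same seed produce the same draws.
xs1 = sample_params(3; rng=Xoshiro(468))
xs2 = sample_params(3; rng=Xoshiro(468))
same_params = xs1 == xs2
```

Because the RNG is a keyword with a default, no extra method is needed for the unseeded case.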

Closes #962

Correctness testing

Previously the test, reference_backend, and expected_value_and_grad keyword arguments all served the same purpose and it was not clear when one would supersede the other (e.g. if you put test=false, reference_backend=AutoForwardDiff(), and expected_value_and_grad=(value, grad) it was unclear whether it would skip testing, compare against ForwardDiff, or compare against the explicitly specified values).

I originally made this design choice to avoid having to make my own types (which would be some boilerplate and annoying for downstream users to import), but over the course of using this function (especially in ADTests) I have found the annoyance of not having a clear API to be bigger than the annoyance of adding some more imports.

This PR fixes it so that you can't specify multiple of these at the same time.
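
One way to make such options mutually exclusive is to fold them into a single argument whose type selects the behaviour. The following is a rough sketch of that idea and not the PR's actual code: `NoTest` is the only name that appears in the diff under review, while `TestSetting`, `WithBackend`, `WithExpectedResult`, and `describe` are hypothetical names chosen for illustration.

```julia
# Sketch only: one argument, three mutually exclusive settings.
abstract type TestSetting end

# Skip correctness testing entirely.
struct NoTest <: TestSetting end

# Compare against the result from another backend (hypothetical name).
struct WithBackend{B} <: TestSetting
    backend::B
end

# Compare against precomputed values (hypothetical name).
struct WithExpectedResult{V,G<:AbstractVector} <: TestSetting
    value::V
    grad::G
end

# Dispatch picks exactly one behaviour, so there is no precedence to guess.
describe(::NoTest) = "no test"
describe(t::WithBackend) = "compare against $(t.backend)"
describe(t::WithExpectedResult) = "compare against value=$(t.value), grad=$(t.grad)"

settings = [NoTest(), WithBackend(:forwarddiff), WithExpectedResult(1.0, [0.5, 2.0])]
descriptions = map(describe, settings)
```

Since a single value can only be one of these types, the ambiguous combination described above cannot be expressed at all.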

Tolerances

Previously, it was only possible to specify the atol used for testing; rtol could not be configured (and would default to zero, because that's what isapprox does when given a non-zero atol). This caused problems such as TuringLang/ADTests#33. This PR fixes it such that testing happens with nonzero atol and rtol (which can both be configured).
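
The `isapprox` behaviour mentioned above can be seen with plain numbers. This is only an illustration: the values and tolerances below are picked for the example and are not the defaults used by this PR.

```julia
x = 1.0e6
y = x + 1.0e-3   # absolute difference about 1e-3, relative difference about 1e-9

# With only atol given, rtol defaults to zero, so a small absolute tolerance
# rejects values that agree to roughly nine significant digits.
atol_only = isapprox(x, y; atol=1e-6)

# With both given, the check is |x - y| <= max(atol, rtol * max(|x|, |y|)).
both = isapprox(x, y; atol=1e-6, rtol=sqrt(eps(Float64)))
```

For large values, a purely absolute tolerance is therefore far stricter than it looks, which is why a nonzero rtol matters.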

Closes #963

github-actions bot (Contributor) commented Jun 30, 2025

Benchmark Report for Commit be36626

Computer Information

Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  8.3 |                 1.6 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                660.9 |                40.7 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                305.5 |                68.1 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1200.7 |                29.0 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               6893.0 |                25.2 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1467.3 |                26.8 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |                998.8 |                 4.2 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5650.3 |                 3.9 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1017.7 |                 8.4 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              63684.5 |                 3.6 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8210.7 |                10.0 |
|               Dynamic |        10 |    mooncake |             typed |   true |                128.5 |                11.7 |
|              Submodel |         1 |    mooncake |             typed |   true |                 12.6 |                 6.3 |
|                   LDA |        12 | reversediff |             typed |   true |               1187.7 |                 2.7 |

codecov bot commented Jun 30, 2025

Codecov Report

Attention: Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.67%. Comparing base (7f20709) to head (be36626).
Report is 1 commit behind head on breaking.

| Files with missing lines | Patch % |        Lines |
|--------------------------|---------|--------------|
|     src/test_utils/ad.jl |  64.70% | 6 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           breaking     #964      +/-   ##
============================================
- Coverage     82.85%   82.67%   -0.18%     
============================================
  Files            38       38              
  Lines          4031     4018      -13     
============================================
- Hits           3340     3322      -18     
- Misses          691      696       +5     



DynamicPPL.jl documentation for PR #964 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR964/

mhauru (Member) commented Jul 3, 2025

Let me know if this is ready for review.

penelopeysm (Member, Author) commented

Oops, I forgot about this one. Yeah, I think it should be.

penelopeysm requested a review from mhauru on Jul 3, 2025, 10:17
@@ -211,6 +211,21 @@ To test and/or benchmark the performance of an AD backend on a model, DynamicPPL

```@docs
DynamicPPL.TestUtils.AD.run_ad
```

THe default test setting is to compare against ForwardDiff.
Member

Suggested change
- THe default test setting is to compare against ForwardDiff.
+ The default test setting is to compare against ForwardDiff.

Comment on lines +27 to +28
`adtype` defaults to ForwardDiff.jl, since it's the default AD backend used in
Turing.jl.
Member

Have we considered using FiniteDifferences.jl instead?


The tolerances for the value and gradient can be set using `value_atol` and
`grad_atol`. These default to 1e-6.
Note that gradients are always compared elementwise.
Member

As discussed, this should probably change.

Comment on lines +249 to +251
if test isa NoTest
    value_true = nothing
    grad_true = nothing
Member

Do these values actually ever get used?
