Improve API for AD testing #964

penelopeysm · 2025-06-30T10:06:51Z

This PR makes some fairly overdue improvements to the API of DynamicPPL.TestUtils.AD.run_ad.

rng argument

An rng keyword argument is now provided to make it easier to seed the parameters used when running AD. Previously, if you wanted to make sure that two calls to run_ad used the same parameters (without caring what those parameters were, only that they matched), you had to do:

rng = Xoshiro(468)
v = DynamicPPL.link(VarInfo(rng, model), model)
xs = v[:]

run_ad(model, adtype1; varinfo=v, params=xs)
run_ad(model, adtype2; varinfo=v, params=xs)

Now you can do:

run_ad(model, adtype1; rng=Xoshiro(468))
run_ad(model, adtype2; rng=Xoshiro(468))

I made rng a keyword argument rather than a positional argument because I consider rng-as-first-argument to be a multiple dispatch abuse anti-pattern, i.e., it serves no purpose except to force someone to declare a new method.
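To make the keyword-argument pattern concrete, here is a minimal sketch using only Julia's Random standard library. The helper `draw_params` is hypothetical and is not part of DynamicPPL; it only illustrates how an rng keyword with a default lets callers opt in to seeding without a separate method.

```julia
using Random

# Hypothetical helper (not a DynamicPPL function): draws `n` parameters from
# the supplied RNG, falling back to the default RNG when none is given.
function draw_params(n::Integer; rng::Random.AbstractRNG=Random.default_rng())
    return randn(rng, n)
end

# Two calls seeded identically see identical parameters.
xs1 = draw_params(3; rng=Xoshiro(468))
xs2 = draw_params(3; rng=Xoshiro(468))
@assert xs1 == xs2

# Omitting `rng` still works; only one method is needed.
xs3 = draw_params(3)
@assert length(xs3) == 3
```

A positional rng would instead require a second method such as `draw_params(n) = draw_params(Random.default_rng(), n)` to keep the unseeded call working.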

Closes #962

Correctness testing

Previously, the test, reference_backend, and expected_value_and_grad keyword arguments all served the same purpose, and it was not clear which one took precedence. For example, if you passed test=false, reference_backend=AutoForwardDiff(), and expected_value_and_grad=(value, grad), it was unclear whether run_ad would skip testing, compare against ForwardDiff, or compare against the explicitly specified values.

I originally made this design choice to avoid having to make my own types (which would be some boilerplate and annoying for downstream users to import), but over the course of using this function (especially in ADTests) I have found the annoyance of not having a clear API to be bigger than the annoyance of adding some more imports.

This PR changes the API so that only one of these can be specified at a time.
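One way to make such options mutually exclusive is to encode them as types passed through a single keyword, roughly as sketched below. This is an illustration under assumptions, not DynamicPPL's actual definitions: `NoTest` appears in this PR's diff, but `AbstractTestSetting`, `CompareWithBackend`, `CompareWithExpected`, `describe`, and `check` are hypothetical names invented for the example.

```julia
# Illustrative stand-ins only. Apart from `NoTest`, every name here is
# hypothetical and chosen for this sketch.
abstract type AbstractTestSetting end

struct NoTest <: AbstractTestSetting end

struct CompareWithBackend{B} <: AbstractTestSetting
    backend::B
end

struct CompareWithExpected{T,V<:AbstractVector} <: AbstractTestSetting
    value::T
    grad::V
end

# Dispatch picks exactly one behaviour, so the settings cannot conflict.
describe(::NoTest) = "skip correctness testing"
describe(t::CompareWithBackend) = "compare against $(t.backend)"
describe(::CompareWithExpected) = "compare against explicitly supplied values"

# A single `test` keyword replaces three overlapping keywords.
check(; test::AbstractTestSetting=CompareWithBackend(:ForwardDiff)) = describe(test)

@assert check() == "compare against ForwardDiff"
@assert check(; test=NoTest()) == "skip correctness testing"
```

Because the caller supplies exactly one setting object, the question of which keyword wins no longer arises.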

Tolerances

Previously, it was only possible to specify the atol used for testing; rtol could not be configured (and would default to zero, because that's what isapprox does when given a non-zero atol). This caused problems such as TuringLang/ADTests#33. This PR changes testing to use nonzero atol and rtol, both of which can be configured.
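The isapprox behaviour described above can be checked directly in Base Julia. The numbers below are arbitrary illustrative values, not the tolerances used by run_ad.

```julia
# Two large values that differ by 1.0 in absolute terms.
x, y = 1.0e6, 1.0e6 + 1.0

# With only a nonzero atol, rtol defaults to zero, so the comparison is
# purely absolute and fails for large magnitudes.
@assert !isapprox(x, y; atol=1e-6)

# Supplying rtol as well lets the tolerance scale with the values' magnitude.
@assert isapprox(x, y; atol=1e-6, rtol=1e-5)

# Near zero the relative tolerance is of little use, so atol still matters.
@assert !isapprox(0.0, 1e-9; rtol=1e-5)
@assert isapprox(0.0, 1e-9; atol=1e-6, rtol=1e-5)
```

isapprox accepts a difference up to `max(atol, rtol * max(norm(x), norm(y)))`, which is why using both tolerances covers large and near-zero values alike.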

Closes #963

github-actions · 2025-06-30T10:11:21Z

Benchmark Report for Commit `be36626`

Computer Information

Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  8.3 |                 1.6 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                660.9 |                40.7 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                305.5 |                68.1 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1200.7 |                29.0 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               6893.0 |                25.2 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1467.3 |                26.8 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |                998.8 |                 4.2 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5650.3 |                 3.9 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1017.7 |                 8.4 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              63684.5 |                 3.6 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8210.7 |                10.0 |
|               Dynamic |        10 |    mooncake |             typed |   true |                128.5 |                11.7 |
|              Submodel |         1 |    mooncake |             typed |   true |                 12.6 |                 6.3 |
|                   LDA |        12 | reversediff |             typed |   true |               1187.7 |                 2.7 |

codecov · 2025-06-30T10:30:20Z

Codecov Report

Attention: Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.67%. Comparing base (7f20709) to head (be36626).
Report is 1 commits behind head on breaking.

| Files with missing lines | Patch % |          Lines |
|--------------------------|---------|----------------|
|     src/test_utils/ad.jl |  64.70% | 6 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##           breaking     #964      +/-   ##
============================================
- Coverage     82.85%   82.67%   -0.18%     
============================================
  Files            38       38              
  Lines          4031     4018      -13     
============================================
- Hits           3340     3322      -18     
- Misses          691      696       +5


github-actions · 2025-06-30T10:36:01Z

DynamicPPL.jl documentation for PR #964 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR964/

mhauru · 2025-07-03T10:15:52Z

Let me know if this is ready for review.

penelopeysm · 2025-07-03T10:17:26Z

Oops, I forgot about this one. Yeah I think it should be

mhauru · 2025-07-03T16:14:01Z

docs/src/api.md

@@ -211,6 +211,21 @@ To test and/or benchmark the performance of an AD backend on a model, DynamicPPL

 ```@docs
 DynamicPPL.TestUtils.AD.run_ad
+```
+
+THe default test setting is to compare against ForwardDiff.


Suggested change

THe default test setting is to compare against ForwardDiff.

The default test setting is to compare against ForwardDiff.

mhauru · 2025-07-03T16:23:03Z

src/test_utils/ad.jl

+`adtype` defaults to ForwardDiff.jl, since it's the default AD backend used in
+Turing.jl.


Have we considered using FiniteDifferences.jl instead?

mhauru · 2025-07-03T16:30:07Z

src/test_utils/ad.jl


-   The tolerances for the value and gradient can be set using `value_atol` and
-   `grad_atol`. These default to 1e-6.
+   Note that gradients are always compared elementwise.


As discussed, this should probably change.

mhauru · 2025-07-03T16:33:32Z

src/test_utils/ad.jl

+    if test isa NoTest
+        value_true = nothing
+        grad_true = nothing


Do these values actually ever get used?

github-actions bot assigned penelopeysm Jun 30, 2025

penelopeysm force-pushed the py/improve-ad-api branch from bda0ea4 to 4ce84c2 Compare June 30, 2025 10:12

penelopeysm added 5 commits June 30, 2025 14:04

Rework API for AD testing

c9347b2

Fix test

2f8574e

Add rng keyword argument

48464f3

Use atol and rtol

6da8d57

remove unbound type parameter (?)

3587ce5

penelopeysm force-pushed the py/improve-ad-api branch from 20f4d56 to 3587ce5 Compare June 30, 2025 13:04

penelopeysm requested a review from mhauru July 3, 2025 10:17

penelopeysm added 2 commits July 3, 2025 17:29

Don't need to do elementwise check

e1043ae

Update changelog

be36626

mhauru requested changes Jul 3, 2025
