Add troubleshooting page #603

Merged: 5 commits (Jun 2, 2025)
Changes from 3 commits
23 changes: 13 additions & 10 deletions _quarto.yml
@@ -66,6 +66,7 @@ website:
- usage/sampler-visualisation/index.qmd
- usage/dynamichmc/index.qmd
- usage/external-samplers/index.qmd
- usage/troubleshooting/index.qmd

- section: "Tutorials"
contents:
@@ -181,17 +182,19 @@ probabilistic-pca: tutorials/11-probabilistic-pca
gplvm: tutorials/12-gplvm
seasonal-time-series: tutorials/13-seasonal-time-series
using-turing-advanced: tutorials/docs-09-using-turing-advanced
using-turing-autodiff: tutorials/docs-10-using-turing-autodiff
using-turing-dynamichmc: tutorials/docs-11-using-turing-dynamichmc
using-turing: tutorials/docs-12-using-turing-guide
using-turing-performance-tips: tutorials/docs-13-using-turing-performance-tips
using-turing-sampler-viz: tutorials/docs-15-using-turing-sampler-viz
using-turing-external-samplers: tutorials/docs-16-using-turing-external-samplers
using-turing-mode-estimation: tutorials/docs-17-mode-estimation
usage-probability-interface: tutorials/usage-probability-interface
usage-custom-distribution: tutorials/usage-custom-distribution
usage-tracking-extra-quantities: tutorials/tracking-extra-quantities
usage-modifying-logprob: tutorials/usage-modifying-logprob

usage-automatic-differentiation: usage/automatic-differentiation
usage-custom-distribution: usage/custom-distribution
usage-dynamichmc: usage/dynamichmc
usage-external-samplers: usage/external-samplers
usage-mode-estimation: usage/mode-estimation
usage-modifying-logprob: usage/modifying-logprob
usage-performance-tips: usage/performance-tips
usage-probability-interface: usage/probability-interface
usage-sampler-visualisation: usage/sampler-visualisation
usage-tracking-extra-quantities: usage/tracking-extra-quantities
usage-troubleshooting: usage/troubleshooting

contributing-guide: developers/contributing
dev-model-manual: developers/compiler/model-manual
2 changes: 1 addition & 1 deletion usage/performance-tips/index.qmd
@@ -52,7 +52,7 @@ supports several AD backends, including [ForwardDiff](https://github.com/JuliaDi

For many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you need more speed, you can try
different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g.
- `NUTS(adtype = AutoZygote())`. See [Automatic Differentiation]({{<meta using-turing-autodiff>}}) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
+ `NUTS(adtype = AutoZygote())`. See [Automatic Differentiation]({{<meta usage-automatic-differentiation>}}) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
few parameters (say, less than 20 or so), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra
operations. If in doubt, it's easy to try a few different backends to see how they compare.

112 changes: 112 additions & 0 deletions usage/troubleshooting/index.qmd
@@ -0,0 +1,112 @@
---
title: Troubleshooting
engine: julia
---

```{julia}
#| echo: false
#| output: false
using Pkg
Pkg.instantiate()
```

This page collects common error messages observed when using Turing, along with suggestions on how to fix them.

If the suggestions here do not resolve your problem, please feel free to [open an issue](https://github.com/TuringLang/Turing.jl/issues).

```{julia}
using Turing
Turing.setprogress!(false)
```

## Initial parameters

> failed to find valid initial parameters in {N} tries. This may indicate an error with the model or AD backend...

This error is seen when a Hamiltonian Monte Carlo sampler is unable to find a valid set of initial parameters for sampling.
Here, 'valid' means that both the log probability density of the model and its gradient with respect to each parameter are finite and not `NaN`.
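As a quick first check, you can evaluate the model's log density yourself at a candidate point and confirm that it is finite. Here is a minimal sketch (the `demo` model and the value `x = 0.0` are purely illustrative):

```{julia}
@model function demo()
    x ~ Normal()
end
# `logjoint` evaluates the model's log density at the given parameters;
# a finite value means this point is a valid place to start sampling.
isfinite(logjoint(demo(), (x = 0.0,)))
```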

### `NaN` gradient

One of the most common causes of this error is a `NaN` gradient.
To find out whether this is happening, you can evaluate the gradient manually.
Here is an example with a model that is known to be problematic:

```{julia}
using Turing
using DynamicPPL.TestUtils.AD: run_ad

@model function initial_bad()
    a ~ Normal()
    x ~ truncated(Normal(a), 0, Inf)
end

model = initial_bad()
adtype = AutoForwardDiff()
result = run_ad(model, adtype; test=false, benchmark=false)
result.grad_actual
```

(See [the DynamicPPL docs](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities) for more details on the `run_ad` function and its return type.)

In this case, the `NaN` gradient is caused by the `Inf` argument to `truncated`.
(See, e.g., [this issue on Distributions.jl](https://github.com/JuliaStats/Distributions.jl/issues/1910).)
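To see this in isolation, outside of Turing, you can differentiate the offending log density directly. This is a minimal sketch that assumes the ForwardDiff package itself is available in the environment; on affected versions of Distributions, it returns `NaN`:

```{julia}
using ForwardDiff
# Differentiate the truncated log density with respect to the location
# parameter a, mirroring the model above; the Inf upper bound is what
# triggers the NaN here.
ForwardDiff.derivative(a -> logpdf(truncated(Normal(a), 0, Inf), 1.0), 0.0)
```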
Here, the upper bound of `Inf` is not needed, so it can be removed:

```{julia}
@model function initial_good()
    a ~ Normal()
    x ~ truncated(Normal(a); lower=0)
end

model = initial_good()
adtype = AutoForwardDiff()
run_ad(model, adtype; test=false, benchmark=false).grad_actual
```

More generally, you could try using a different AD backend; if you don't know why a model is returning `NaN` gradients, feel free to open an issue.
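For example, here is a sketch of the same gradient check with a reverse-mode backend (this assumes the ReverseDiff package is available in the environment; whether the `NaN` persists depends on the backend and package versions):

```{julia}
# Re-run the gradient check on the problematic model with ReverseDiff.
run_ad(initial_bad(), AutoReverseDiff(); test=false, benchmark=false).grad_actual
```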

### `-Inf` log density

Another cause of this error is a model whose parameters are so extreme that the sampler's default initialisation lands in a region of zero probability density.
This example is taken from [this Turing.jl issue](https://github.com/TuringLang/Turing.jl/issues/2476):

```{julia}
@model function initial_bad2()
    x ~ Exponential(100)
    y ~ Uniform(0, x)
end
model = initial_bad2() | (y = 50.0,)

# A manually reparameterised variant, which fails for the same reason:
# the initial values chosen for x_trf always yield an x far below the
# observed y = 50.0.
@model function initial_bad3()
    x_trf ~ Uniform(0, 1)
    x := -log(x_trf) / 100
    y ~ Uniform(0, x)
end
model3 = initial_bad3() | (y = 50.0,)
```

The problem here is that HMC attempts to find initial values for parameters inside the region of `[-2, 2]`, _after_ the parameters have been transformed to unconstrained space.
For a distribution of `Exponential(100)`, the appropriate transformation is `log(x)` (see the [variable transformation docs]({{< meta dev-transforms-distributions >}}) for more info).

Thus, HMC attempts to find initial values of `log(x)` in the region of `[-2, 2]`, which corresponds to `x` in the region of `[exp(-2), exp(2)]`, i.e., approximately `[0.135, 7.39]`.
However, all of these values of `x` will give rise to a zero probability density for `y` because the value of `y = 50.0` is outside the support of `Uniform(0, x)`.
Thus, the log density of the model is `-Inf`, as can be seen with `logjoint`:

```{julia}
logjoint(model, (x = exp(-2),))
```

```{julia}
logjoint(model, (x = exp(2),))
```
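Conversely, any value of `x` above the observed `y = 50.0` gives a finite log density (the value `100.0` here is illustrative):

```{julia}
logjoint(model, (x = 100.0,))
```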

The most direct way of fixing this is to manually provide a set of initial parameters that are valid.
For example, you can obtain a set of initial parameters with `rand(Vector, model)`, and then pass this as the `initial_params` keyword argument to `sample`:

```{julia}
sample(model, NUTS(), 1000; initial_params=rand(Vector, model))
```
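Since the only requirement is that the starting point has finite log density and gradient, an explicit value also works. In this sketch, the starting value `100.0` is illustrative; any `x` above the observed `y = 50.0` would do:

```{julia}
# x = 100.0 lies inside the support constraint x > 50 imposed by the data.
sample(model, NUTS(), 1000; initial_params=[100.0])
```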

More generally, you may also consider reparameterising the model to avoid such issues.
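For instance, in this particular model the observation `y = 50.0` already forces `x > 50`, so truncating the prior at the observed value leaves the posterior unchanged while guaranteeing that every initial value has positive density. This is a sketch rather than part of the original example; the model name and argument are illustrative:

```{julia}
@model function initial_reparam(y_obs)
    # Truncating at y_obs does not change the posterior: the likelihood
    # y ~ Uniform(0, x) is already zero whenever x <= y_obs.
    x ~ truncated(Exponential(100); lower=y_obs)
    y ~ Uniform(0, x)
end
sample(initial_reparam(50.0) | (y = 50.0,), NUTS(), 1000)
```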