Issue/873 embedded laplace #874

Open · wants to merge 32 commits into base: master

Commits (32):
ba3557f
initial draft in function references manual.
charlesm93 Apr 17, 2025
e8d1813
initial doc for embedded laplace.
charlesm93 Apr 18, 2025
ff4414a
edits
avehtari Apr 28, 2025
1bd8d17
Add to build
WardBrian May 28, 2025
a6ee3a7
Fix typos
WardBrian May 28, 2025
bffbede
Start signature formatting
WardBrian May 28, 2025
53f47d2
More signature work
WardBrian May 29, 2025
6b5a35f
Add distribution statements
WardBrian May 29, 2025
7a07abd
Merge branch 'master' into issue/873-embeddedLaplace
WardBrian May 29, 2025
3996971
Duplicate for lupmfs
WardBrian May 29, 2025
48ec4e9
specify restrictions on likelihood function.
charlesm93 Jun 3, 2025
1db0c1d
start addressing the decription todo
charlesm93 Jun 3, 2025
88fa61f
description of function signatures and proofread.
charlesm93 Jun 5, 2025
b86fb23
implement Steve's changes
charlesm93 Jun 5, 2025
3841dbe
Update metadata for index
WardBrian Jun 5, 2025
40e9752
propagate renames into latex index
WardBrian Jun 6, 2025
5d07d01
Merge branch 'issue/873-embeddedLaplace' of https://github.com/stan-d…
charlesm93 Jun 6, 2025
585823e
add details on how laplace approximation provides approximation for m…
charlesm93 Jun 6, 2025
3545625
fix typos spotted by Aki.
charlesm93 Jun 10, 2025
99661ef
correct signatures in function references and add section on embedded…
charlesm93 Jun 10, 2025
229e3e4
add examples with control parameters in GP sections.
charlesm93 Jun 11, 2025
fa1735b
draft section on embedded Laplace approximation in reference manual.
charlesm93 Jun 11, 2025
70d717f
add back html comments.
charlesm93 Jun 11, 2025
5bfa0a3
Fix metadata ordering, build
WardBrian Jun 11, 2025
8dc6514
Merge branch 'issue/873-embeddedLaplace' of https://github.com/stan-d…
charlesm93 Jun 11, 2025
67c4952
fix references on GP section.
charlesm93 Jun 11, 2025
1768706
Fix pdf build
WardBrian Jun 20, 2025
5cec36e
Web build
WardBrian Jun 20, 2025
b6625e8
Merge branch 'master' into issue/873-embeddedLaplace
WardBrian Jun 20, 2025
2280b62
Merge branch 'issue/873-embeddedLaplace' of https://github.com/stan-d…
charlesm93 Jun 24, 2025
c418d08
edit doc to reflect changes in embedded laplace: (i) theta_init is op…
charlesm93 Jun 24, 2025
894f151
update GP subsection on embedded Laplace.
charlesm93 Jun 24, 2025
380 changes: 380 additions & 0 deletions src/functions-reference/embedded_laplace.qmd
@@ -0,0 +1,380 @@
---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
\phi \sim p(\phi), \\
\theta \sim \text{MultiNormal}(0, K(\phi)), \\
y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix parameterized by $\phi$.
To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:

1. sample from the *marginal posterior* $p(\phi \mid y)$,
2. sample from the *conditional posterior* $p(\theta \mid y, \phi)$.

In practice, neither the marginal posterior nor the conditional posterior
is available in closed form, and so both must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) \, \text{d}\theta$
is called the marginal likelihood. The Laplace method approximates
$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution, and the
resulting Gaussian integral can be evaluated analytically to obtain an
approximation to the log marginal likelihood,
$\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.

Combining this marginal likelihood with the prior in the `model`
block, we can then sample from the marginal posterior $p(\phi \mid y)$
using one of Stan's algorithms. The marginal posterior is lower
dimensional and likely to have a shape that is easier to sample, leading
to more efficient inference. On the other hand, each marginal likelihood
evaluation is more costly, and the net change in efficiency depends on
the application.

To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
The process of iteratively sampling from $p(\phi \mid y)$ (say, with MCMC) and
then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.

The Laplace approximation is especially useful if $p(\theta)$ is
multivariate normal and $p(y \mid \phi, \theta)$ is
log-concave. Stan's embedded Laplace approximation requires the prior
$p(\theta)$ to be multivariate normal; the operations in the likelihood
$p(y \mid \phi, \theta)$ must support higher-order automatic
differentiation, and the approximation is most reliable when the
likelihood is log-concave or nearly so.

> **Review comment (Member):** add here the restrictions for the likelihood

> **@charlesm93 (Contributor Author), May 29, 2025:** There are two kinds of restrictions:
>
> - what the user can do without breaking Stan, i.e. the operations in the likelihood need to support higher-order autodiff;
> - what the user should do to ensure the approximation is reliable.
>
> I'll assume you have the first in mind.

> **Review comment (Member):** Yes, I was thinking of the first as a restriction. For the second one, we can say which kinds of likelihood are more likely to work, i.e., log-concave and maybe near log-concave.


## Specifying the likelihood function

The first step in using the embedded Laplace approximation is to write a
function in the `functions` block which returns the log joint likelihood
$\log p(y \mid \theta, \phi)$. There are a few constraints on this function:

* The function's return type must be `real`.

* The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.

* The operations in the function must support higher-order automatic
differentiation (AD). Most functions in Stan support higher-order AD.
The exceptions are functions with specialized calls for reverse-mode AD:
the higher-order functions (algebraic solvers, differential equation
solvers, and integrators) and the suite of hidden Markov model (HMM) functions.

The signature of the function is
```
real ll_function(vector theta, ...)
```
There are no type restrictions on the variadic arguments `...`, and each
argument can be passed as data or as a parameter. As always, users should use
parameter arguments only when necessary, in order to speed up differentiation.
In general, we recommend marking data-only arguments with the keyword `data`,
for example,
```
real ll_function(vector theta, data vector x, ...)
```
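
As an illustration, here is a minimal sketch of such a likelihood function for
Poisson counts, assuming data `y` (counts) and `y_index` (group membership);
these names are illustrative, not part of the interface:
```
functions {
  // Log joint likelihood log p(y | theta): Poisson counts where observation i
  // belongs to group y_index[i], with log rate theta[y_index[i]].
  real ll_function(vector theta, data array[] int y, data array[] int y_index) {
    return poisson_log_lpmf(y | theta[y_index]);
  }
}
```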

## Specifying the covariance function

We next need to specify a function that returns the prior covariance matrix
$K$ as a function of the hyperparameters $\phi$.
The only restriction is that this function returns an $n \times n$ matrix,
where $n$ is the size of $\theta$. The signature is:
```
matrix K_function(...)
```

> **Review comment (Contributor):** Can we call this `covariance_function`?

There are no type restrictions on the variadic arguments. The hyperparameters
$\phi$ are implicitly defined as the collection of all non-data arguments
passed to `ll_function` (excluding $\theta$) and to `K_function`.

> **Review comment (Contributor):** What is phi here?
>
> **Review comment (Contributor):** I think, moreso, from the section it's not clear how this is related to the `K_function`.
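
As a sketch, a covariance function for a one-dimensional Gaussian process
might use the squared exponential kernel, with covariates `x` passed as data
and hyperparameters `alpha` and `rho` (illustrative names) constituting $\phi$:
```
functions {
  // Prior covariance K(phi); here phi = (alpha, rho), the non-data arguments.
  matrix K_function(data array[] real x, real alpha, real rho) {
    int n = size(x);
    matrix[n, n] K = gp_exp_quad_cov(x, alpha, rho);
    for (i in 1:n) {
      K[i, i] += 1e-8;  // add jitter for numerical stability
    }
    return K;
  }
}
```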


## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
This function takes in the user-specified likelihood and prior covariance
functions, as well as their arguments. These arguments must be passed as
tuples, which can be constructed on the fly using parentheses.
We also need to pass an argument $\theta_0$, which serves as an initial guess
for the optimization problem that underlies the Laplace approximation,
$$
\underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi).
$$
The size of $\theta_0$ must be consistent with the size of the $\theta$ argument
passed to `ll_function`.

The signature of the function is:
```
target += laplace_marginal(function ll_function, tuple (...), vector theta_0,
function K_function, tuple (...));
```
The tuple `(...)` after `ll_function` contains the arguments that get passed
to `ll_function`, *excluding $\theta$*. Likewise, the tuple `(...)` after
`K_function` contains the arguments that get passed to `K_function`.
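
Continuing the illustrative sketch from the previous sections (data `y`,
`y_index`, and `x`; hyperparameters `alpha` and `rho`; and a hypothetical
constant `n_groups` for the size of $\theta$), the call might look as follows:
```
transformed data {
  vector[n_groups] theta_0 = rep_vector(0, n_groups);  // initial guess
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  target += laplace_marginal(ll_function, (y, y_index), theta_0,
                             K_function, (x, alpha, rho));
}
```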

It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation. Specifically:

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
the value is $\epsilon = 10^{-6}$.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 100 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is not block diagonal, then set
`hessian_block_size=n`, where `n` is the size of $\theta$.

> **Review comment (Contributor):** We should note that if the Hessian block size is not 1 or `n`, then the size of $\theta$ needs to be divisible by the Hessian block size.

* `solver`: choice of Newton solver. The optimizer used to compute the
Laplace approximation performs one of three matrix decompositions to compute a
Newton step; the problem determines which decomposition is numerically stable.
By default (`solver=1`), the solver uses a Cholesky decomposition of the
negative Hessian, $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
If `solver=2`, the solver uses a Cholesky decomposition of the covariance
matrix $K(\phi)$.
If the Cholesky decomposition can be computed for neither the negative
Hessian nor the covariance matrix, use `solver=3`, which uses a more expensive
but less specialized approach.

> **Review comment (Contributor):** Make this a list

* `max_steps_linesearch`: maximum number of steps in the linesearch. The
linesearch method tries to ensure that the Newton step leads to a decrease in
the objective function. If the Newton step does not improve the objective
function, the step is repeatedly halved until the objective function decreases
or the maximum number of linesearch steps is reached. By default,
`max_steps_linesearch=0`, meaning no linesearch is performed.

With these arguments at hand, we can call `laplace_marginal_tol` with the
following signature:
```
target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0,
function K_function, tuple (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);
```

## Sampling from the approximate conditional $\hat{p}(\theta \mid y, \phi)$

In `generated quantities`, it is possible to sample from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature of `laplace_latent_rng` closely follows that of
`laplace_marginal`:
```
vector theta =
laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
function K_function, tuple (...));
```
Once again, it is possible to specify control parameters:
```
vector theta =
laplace_latent_tol_rng(function ll_function, tupple (...), vector theta_0,
function K_function, tupple (...),
real tol, int max_steps, int hessian_block_size,
int solver, int max_steps_linesearch);
```
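
A sketch of the corresponding `generated quantities` block, reusing the
illustrative names from the earlier examples:
```
generated quantities {
  // One draw from the normal approximation to p(theta | y, phi),
  // conditional on the current draw of the hyperparameters phi.
  vector[n_groups] theta
    = laplace_latent_rng(ll_function, (y, y_index), theta_0,
                         K_function, (x, alpha, rho));
}
```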

## Built-in Laplace marginal likelihood functions

> **Review comment (Member):** Any chance that all these wrappers could have an argument for the mean? Now it's possible to define only the covariance, but there are many cases where the latent Gaussian model is used as a component, and the mean is non-zero, and then these functions can't be used. Sorry for not noticing this before trying to use the release candidate.
>
> **@charlesm93 (Contributor Author):** I agree this makes sense and I should've pursued that in the first place. This would be straightforward to implement, but then there is all the work that goes into testing and documenting this properly. So the question is whether we want to open an issue and add the custom functions in a future release or push for this to appear in the current release.
>
> **@charlesm93 (Contributor Author):** I also like the idea of getting rid of `poisson_log` and `poisson_log_2` and just having one (set of) function(s) for each likelihood and link, which requires the user to specify the mean, potentially setting it to a vector of 0s.
>
> **Review comment (Member):** It depends on whether these would be new signatures or would require breaking the existing signatures. I would prefer to do it in a future release, but that's only possible if it would be backwards compatible.
>
> **Review comment (Member):** I would very much like to have the mean as an argument in this release, and I can help with docs, but I'm not able to help with C++ and I understand if it's too much work to get it into this release. If it were left for a future release, it might be better to not make the current built-in Laplace marginal likelihood functions public?
>
> **Review comment (Member):** It's worth starting a new thread for this @avehtari, probably in the math library issues. We can discuss options and their feasibility more easily there.

Stan supports certain built-in Laplace marginal likelihood functions. The
selection is currently narrow and is expected to grow. The built-in functions
exist for the user's convenience but are not more computationally efficient
than specifying the log likelihood in the `functions` block.

### Poisson with log link

Given count data, with each observed count $y_i$ associated with a group
$g(i)$ and a corresponding latent variable $\theta_{g(i)}$, and a Poisson
model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i\text{Poisson} (y_i \mid \exp(\theta_{g(i)})).
$$
The arguments required to compute this likelihood are:

* `y`: an array of counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to which
the $i^\text{th}$ observation belongs.

The signatures for the embedded Laplace approximation function with a Poisson
likelihood are
```
real laplace_marginal_poisson_log_lpmf(array[] int y | array[] int y_index,
vector theta0, function K_function, (...));

real laplace_marginal_tol_poisson_log_lpmf(array[] int y | array[] int y_index,
vector theta0, function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);

vector laplace_latent_poisson_log_rng(array[] int y, array[] int y_index,
vector theta0, function K_function, (...));

vector laplace_latent_tol_poisson_log_rng(array[] int y, array[] int y_index,
vector theta0, function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);
```
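
Putting the pieces together, a minimal program using this built-in likelihood
might look like the following sketch; the covariance function, data names, and
priors are illustrative assumptions, not requirements:
```
functions {
  // Squared exponential prior covariance with hyperparameters alpha and rho.
  matrix K_function(data array[] real x, real alpha, real rho) {
    int n = size(x);
    matrix[n, n] K = gp_exp_quad_cov(x, alpha, rho);
    for (i in 1:n) {
      K[i, i] += 1e-8;  // jitter for numerical stability
    }
    return K;
  }
}
data {
  int<lower=1> n_obs;
  int<lower=1> n_groups;
  array[n_obs] int<lower=0> y;
  array[n_obs] int<lower=1, upper=n_groups> y_index;
  array[n_groups] real x;
}
transformed data {
  vector[n_groups] theta0 = rep_vector(0, n_groups);  // initial guess
}
parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  target += laplace_marginal_poisson_log_lpmf(y | y_index, theta0,
                                              K_function, (x, alpha, rho));
}
generated quantities {
  vector[n_groups] theta
    = laplace_latent_poisson_log_rng(y, y_index, theta0,
                                     K_function, (x, alpha, rho));
}
```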


A similar built-in likelihood lets users specify an offset $x_i \in \mathbb{R}^+$
to the rate parameter of the Poisson distribution. The likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i\text{Poisson} (y_i \mid \exp(\theta_{g(i)}) x_i).
$$
The signatures for this function are:
```
real laplace_marginal_poisson2_log_lpmf(array[] int y | array[] int y_index,
vector x, vector theta0,
function K_function, (...));

real laplace_marginal_tol_poisson2_log_lpmf(array[] int y | array[] int y_index,
vector x, vector theta0,
function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);

vector laplace_latent_poisson2_log_rng(array[] int y, array[] int y_index,
vector x, vector theta0,
function K_function, (...));

vector laplace_latent_tol_poisson2_log_rng(array[] int y, array[] int y_index,
vector x, vector theta0,
function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);
```


### Negative Binomial with log link

The negative binomial distribution generalizes the Poisson distribution by
introducing a dispersion parameter $\eta$. The corresponding likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i\text{NegBinomial2} (y_i \mid \exp(\theta_{g(i)}), \eta).
$$
Here we use the alternative parameterization implemented in Stan, meaning that
$$
\mathbb E(y_i) = \exp (\theta_{g(i)}), \\
\text{Var}(y_i) = \mathbb E(y_i) + \frac{(\mathbb E(y_i))^2}{\eta}.
$$
The arguments for the likelihood function are:

* `y`: the observed counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to which
the $i^\text{th}$ observation belongs.
* `eta`: the overdispersion parameter.

The function signatures for the embedded Laplace approximation with a negative
binomial likelihood are
```
real laplace_marginal_neg_binomial_2_log_lpmf(array[] int y |
array[] int y_index, real eta, vector theta0,
function K_function, (...));

real laplace_marginal_tol_neg_binomial_2_log_lpmf(array[] int y |
array[] int y_index, real eta, vector theta0,
function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);

vector laplace_latent_neg_binomial_2_log_rng(array[] int y,
array[] int y_index, real eta, vector theta0,
function K_function, (...));

vector laplace_latent_tol_neg_binomial_2_log_rng(array[] int y,
array[] int y_index, real eta, vector theta0,
function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);
```

### Bernoulli with logit link

Given binary outcomes $y_i \in \{0, 1\}$ and a Bernoulli model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i\text{Bernoulli} (y_i \mid \text{logit}^{-1}(\theta_{g(i)})).
$$
The arguments of the likelihood function are:

* `y`: the binary observations.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to which
the $i^\text{th}$ observation belongs.

The function signatures for the embedded Laplace approximation with a Bernoulli likelihood are
```
real laplace_marginal_bernoulli_logit_lpmf(array[] int y |
array[] int y_index, vector theta0,
function K_function, (...));

real laplace_marginal_tol_bernoulli_logit_lpmf(array[] int y |
array[] int y_index, vector theta0,
function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);

vector laplace_latent_bernoulli_logit_rng(array[] int y,
array[] int y_index, vector theta0,
function K_function, (...));

vector laplace_latent_tol_bernoulli_logit_rng(array[] int y,
array[] int y_index, vector theta0,
function K_function, (...),
real tol, int max_num_steps, int hessian_block_size,
int solver, int max_steps_linesearch);
```

<!-- ## Draw approximate samples for out-of-sample latent variables. -->

<!-- In many applications, it is of interest to draw latent variables for -->
<!-- in-sample and out-of-sample predictions. We respectively denote these latent -->
<!-- variables $\theta$ and $\theta^*$. In a latent Gaussian model, -->
<!-- $(\theta, \theta^*)$ jointly follow a prior multivariate normal distribution: -->
<!-- $$ -->
<!-- \theta, \theta^* \sim \text{MultiNormal}(0, {\bf K}(\phi)), -->
<!-- $$ -->
<!-- where $\bf K$ designates the joint covariance matrix over $\theta, \theta^*$. -->

<!-- We can break $\bf K$ into three components, -->
<!-- $$ -->
<!-- {\bf K} = \begin{bmatrix} -->
<!-- K & \\ -->
<!-- K^* & K^{**} -->
<!-- \end{bmatrix}, -->
<!-- $$ -->
<!-- where $K$ is the prior covariance matrix for $\theta$, $K^{**}$ the prior -->
<!-- covariance matrix for $\theta^*$, and $K^*$ the covariance matrix between -->
<!-- $\theta$ and $\theta^*$. -->

<!-- Stan supports the case where $\theta$ is associated with an in-sample -->
<!-- covariate $X$ and $\theta^*$ with an out-of-sample covariate $X^*$. -->
<!-- Furthermore, the covariance function is written in such a way that -->
<!-- $$ -->
<!-- K = f(..., X, X), \ \ K^{**} = f(..., X^*, X^*), \ \ K^* = f(..., X, X^*), -->
<!-- $$ -->
<!-- as is typically the case in Gaussian process models. -->





<!-- The -->
<!-- function `laplace_latent_rng` produces samples from the Laplace approximation -->
<!-- and admits nearly the same arguments as `laplace_marginal`. A key difference -->
<!-- is that -->
<!-- ``` -->
<!-- vector laplace_latent_rng(function ll_function, tuple (...), vector theta_0, -->
<!-- function K_function, tuple (...)); -->
<!-- ``` -->