---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
\phi \sim p(\phi), \ \ \theta \sim \text{MultiNormal}(0, K(\phi)), \ \
y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix parameterized by $\phi$.
To draw samples from the posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:

1. draw samples from the *marginal posterior* $p(\phi \mid y)$,
2. draw samples from the *conditional posterior* $p(\theta \mid y, \phi)$.

In practice, neither the marginal posterior nor the conditional posterior
is available in closed form, and so they must be approximated.
It turns out that if we have an approximation of $p(\theta \mid y, \phi)$,
we immediately obtain an approximation of $p(\phi \mid y)$.
The embedded Laplace approximation returns
$\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.
Evaluating this log density in the `model` block, we can then sample from
$p(\phi \mid y)$ using one of Stan's algorithms.

To obtain posterior draws for $\theta$, we generate samples from the Laplace
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.

## Specifying the likelihood function

The first step is to write down a function in the `functions` block which
returns $\log p(y \mid \theta, \phi)$. There are a few constraints on this
function:

* The function return type must be `real`.

* The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.

* The operations in the function must support higher-order automatic
differentiation (AD). Most functions in Stan support higher-order AD.
The exceptions are functions with specialized calls for reverse-mode AD:
the higher-order functions (algebraic solvers, differential equation
solvers, and integrators) and the suite of hidden Markov model (HMM) functions.

The signature of the function is
```
real ll_function(vector theta, ...)
```
There are no type restrictions for the variadic arguments `...`, and each
argument can be passed as data or parameter. As always, users should use
parameter arguments only when necessary in order to speed up differentiation.
In general, we recommend marking data-only arguments with the keyword `data`,
for example,
```
real ll_function(vector theta, data vector x, ...)
```
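
For concreteness, here is a minimal sketch of such a likelihood function,
assuming a Poisson observation model with a log link; the function name and
the data arguments `y` (counts) and `ye` (exposures) are hypothetical:
```
// Defined in the functions block.
// theta is the latent Gaussian variable; y and ye are data.
real poisson_log_ll(vector theta, data array[] int y, data vector ye) {
  return poisson_log_lpmf(y | log(ye) + theta);
}
```
In this sketch, no hyperparameters appear in the likelihood; they enter
instead through the covariance function below.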

## Specifying the covariance function

We next need to specify a function that returns the prior covariance matrix
$K$ as a function of the hyperparameters $\phi$.
The only restriction is that this function returns a matrix with size
$n \times n$, where $n$ is the size of $\theta$. The signature is:
```
matrix K_function(...)
```
There are no type restrictions for the variadic arguments. The variable $\phi$
is implicitly defined as the collection of all non-data arguments passed to
`ll_function` (excluding $\theta$) and `K_function`.
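
As an example, here is a sketch of a squared exponential kernel over a data
covariate `x`, with hypothetical hyperparameters `alpha` (marginal standard
deviation) and `rho` (length scale), which together play the role of $\phi$:
```
// Defined in the functions block.
matrix se_K(data array[] real x, real alpha, real rho) {
  // Add a small jitter to the diagonal for numerical stability.
  return add_diag(gp_exp_quad_cov(x, alpha, rho), 1e-8);
}
```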

## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates $\log p(y \mid \phi)$. This function takes in the
user-specified likelihood and covariance functions, as well as their arguments.
These arguments must be passed as tuples, which can be generated on the fly
using parentheses.
We also need to pass an argument $\theta_0$, which serves as an initial guess for
the optimization problem that underlies the Laplace approximation,
$$
\underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi).
$$
The size of $\theta_0$ must be consistent with the size of the $\theta$ argument
passed to `ll_function`.

The signature of the function is:
```
target += laplace_marginal(function ll_function, tuple (...), vector theta_0,
                           function K_function, tuple (...));
```
The tuple `(...)` after `ll_function` contains the arguments that get passed
to `ll_function` *excluding $\theta$*. Likewise, the tuple `(...)` after
`K_function` contains the arguments that get passed to `K_function`.
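
Putting the pieces together with the hypothetical `poisson_log_ll` and `se_K`
sketched above, a `model` block might look as follows, where `theta_0` is,
say, a vector of zeros defined in `transformed data`:
```
model {
  // Priors on the hyperparameters phi = (alpha, rho); choices are illustrative.
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // Increment target with the approximate log marginal likelihood.
  target += laplace_marginal(poisson_log_ll, (y, ye), theta_0,
                             se_K, (x, alpha, rho));
}
```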

It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation. Specifically:

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
the value is $\epsilon = 10^{-6}$.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 100 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is not block diagonal, then set
`hessian_block_size=n`, where `n` is the size of $\theta$.

* `solver`: choice of Newton solver. The optimizer used to compute the
Laplace approximation does one of three matrix decompositions to compute a
Newton step. The problem determines which decomposition is numerically stable.
By default (`solver=1`), the solver makes a Cholesky decomposition of the
negative Hessian, $-\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
If `solver=2`, the solver makes a Cholesky decomposition of the covariance
matrix $K(\phi)$.
If the Cholesky decomposition can be computed for neither the negative
Hessian nor the covariance matrix, use `solver=3`, which uses a more expensive
but less specialized approach.

* `max_steps_linesearch`: maximum number of steps in the linesearch. The
linesearch method tries to ensure that the Newton step leads to a decrease in
the objective function. If the Newton step does not improve the objective
function, the step is repeatedly halved until the objective function decreases
or the maximum number of steps in the linesearch is reached. By default,
`max_steps_linesearch=0`, meaning no linesearch is performed.

With these arguments at hand, we can call `laplace_marginal_tol` with the
following signature:
```
target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0,
                               function K_function, tuple (...),
                               real tol, int max_num_steps,
                               int hessian_block_size, int solver,
                               int max_steps_linesearch);
```
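
For instance, a call that reproduces the default control values explicitly,
again using the hypothetical functions sketched above:
```
target += laplace_marginal_tol(poisson_log_ll, (y, ye), theta_0,
                               se_K, (x, alpha, rho),
                               1e-6,  // tol
                               100,   // max_num_steps
                               1,     // hessian_block_size (diagonal Hessian)
                               1,     // solver (Cholesky of negative Hessian)
                               0);    // max_steps_linesearch (no linesearch)
```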

## Draw approximate samples from the conditional $p(\theta \mid y, \phi)$

In `generated quantities`, it is possible to draw samples from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The process of iteratively drawing from $p(\phi \mid y)$ (say, with MCMC) and
then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$. The signature for `laplace_latent_rng` closely
follows the signature for `laplace_marginal`:
```
vector theta =
  laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
                     function K_function, tuple (...));
```
Once again, it is possible to specify control parameters:
```
vector theta =
  laplace_latent_tol_rng(function ll_function, tuple (...), vector theta_0,
                         function K_function, tuple (...),
                         real tol, int max_num_steps,
                         int hessian_block_size, int solver,
                         int max_steps_linesearch);
```
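
Continuing the hypothetical example from above, the corresponding
`generated quantities` block might be:
```
generated quantities {
  // One draw from the Laplace approximation to p(theta | y, phi)
  // per iteration, yielding draws from the joint posterior.
  vector[n] theta = laplace_latent_rng(poisson_log_ll, (y, ye), theta_0,
                                       se_K, (x, alpha, rho));
}
```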

## Built-in likelihood functions for the embedded Laplace

Stan supports a narrow menu of built-in likelihood functions. These wrappers
exist for the user's convenience but are not more computationally efficient
than specifying log likelihoods in the `functions` block.

[...]

## Draw approximate samples for out-of-sample latent variables

In many applications, it is of interest to draw latent variables for
in-sample and out-of-sample predictions. We respectively denote these latent
variables $\theta$ and $\theta^*$. In a latent Gaussian model,
$(\theta, \theta^*)$ jointly follow a prior multivariate normal distribution:
$$
\theta, \theta^* \sim \text{MultiNormal}(0, {\bf K}(\phi)),
$$
where $\bf K$ designates the joint covariance matrix over $\theta, \theta^*$.

We can break $\bf K$ into three components,
$$
{\bf K} = \begin{bmatrix}
K & (K^*)^\top \\
K^* & K^{**}
\end{bmatrix},
$$
where $K$ is the prior covariance matrix for $\theta$, $K^{**}$ the prior
covariance matrix for $\theta^*$, and $K^*$ the covariance matrix between
$\theta$ and $\theta^*$.

Stan supports the case where $\theta$ is associated with an in-sample
covariate $X$ and $\theta^*$ with an out-of-sample covariate $X^*$.
Furthermore, the covariance function is written in such a way that
$$
K = f(..., X, X), \ \ K^{**} = f(..., X^*, X^*), \ \ K^* = f(..., X, X^*),
$$
as is typically the case in Gaussian process models.
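
A covariance function with this structure might be sketched as follows,
extending the hypothetical squared exponential kernel from above to two
covariate arguments:
```
// Defined in the functions block; returns f(..., x1, x2).
matrix se_K2(data array[] real x1, data array[] real x2,
             real alpha, real rho) {
  return gp_exp_quad_cov(x1, x2, alpha, rho);
}
```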

The function `laplace_latent_rng` produces samples from the Laplace
approximation and admits nearly the same arguments as `laplace_marginal`.
A key difference is that [...]
```
vector laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
                          function K_function, tuple (...));
```