@@ -8,38 +8,54 @@ The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
- \phi \sim p(\phi), \ \ \theta \sim \text{MultiNormal}(0, K(\phi)), \ \
+ \phi \sim p(\phi), \\
+ \theta \sim \text{MultiNormal}(0, K(\phi)), \\
y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix parameterized by $\phi$.
- To draw samples from the posterior $p(\phi, \theta \mid y)$, we can either
+ To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:
- 1. draw samples from the *marginal likelihood* $p(\phi \mid y)$
- 2. draw samples from the *conditional posterior* $p(\theta \mid y, \phi)$.
-
- In practice, neither the marginal likelihood nor the conditional posterior
- are available in close form and so they must be approximated.
- It turns out that if we have an approximation of $p(\theta \mid y, \phi)$,
- we immediately obtain an approximation of $p(\phi \mid y)$.
- The embedded Laplace approximation returns
- $\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.
- Evaluating this log density in the `model` block, we can then sample from
- $p(\phi \mid y)$ using one of Stan's algorithms.
-
- To obtain posterior draws for $\theta$, we generate samples from the Laplace
+ 1. sample from the *marginal posterior* $p(\phi \mid y)$,
+ 2. sample from the *conditional posterior* $p(\theta \mid y, \phi)$.
+
+ In practice, neither the marginal posterior nor the conditional posterior
+ is available in closed form, so both must be approximated.
+ The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
+ where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) d\theta$
+ is called the marginal likelihood. The Laplace method approximates
+ $p(y \mid \phi, \theta) p(\theta)$ with a normal distribution in $\theta$, and the
+ resulting Gaussian integral can be evaluated analytically to obtain an
+ approximation to the log marginal likelihood,
+ $\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.
+
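Concretely, writing $\hat\theta$ for the value of $\theta$ that maximizes
$\log p(y \mid \phi, \theta) + \log p(\theta)$, and $H$ for the Hessian of that
log density at $\hat\theta$, the standard Laplace approximation takes the form
(a general textbook sketch, not a statement about implementation details)
$$
\log \hat p(y \mid \phi) = \log p(y \mid \phi, \hat\theta) + \log p(\hat\theta)
  + \frac{D}{2} \log(2\pi) - \frac{1}{2} \log \det(-H),
$$
where $D$ is the dimension of $\theta$.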

+ Combining this marginal likelihood with the prior in the `model`
+ block, we can then sample from the marginal posterior $p(\phi \mid y)$
+ using one of Stan's algorithms. The marginal posterior is lower
+ dimensional and typically has a geometry that is easier to sample,
+ which can lead to more efficient inference. On the other hand, each
+ marginal likelihood evaluation is more costly, so the overall change
+ in efficiency depends on the application.
+
+ To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
- The process of iteratively drawing from $p(\phi \mid y)$ (say, with MCMC) and
+ The process of iteratively sampling from $p(\phi \mid y)$ (say, with MCMC) and
then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.
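This two-step scheme targets the joint posterior because of the factorization
$$
p(\theta, \phi \mid y) = p(\theta \mid y, \phi) \, p(\phi \mid y),
$$
so each pair of draws $(\phi, \theta)$ obtained this way is a draw from $p(\theta, \phi \mid y)$.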

+ The Laplace approximation is especially useful if $p(\theta)$ is
+ multivariate normal and $p(y \mid \phi, \theta)$ is
+ log-concave. Stan's embedded Laplace approximation requires the prior
+ $p(\theta)$ to be multivariate normal and the likelihood
+ $p(y \mid \phi, \theta)$ to be log-concave.
+
## Specifying the likelihood function

The first step in using the embedded Laplace approximation is to write down a
- function in the ` functions ` block which returns ` \ log p(y \mid \theta, \phi) ` .
- There are a few constraints on this function:
+ function in the `functions` block which returns the log joint likelihood
+ `\log p(y \mid \theta, \phi)`. There are a few constraints on this function:

* The function return type must be `real`
@@ -82,8 +98,9 @@ is implicitly defined as the collection of all non-data arguments passed to
## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
- that approximates $\log p(y \mid \phi)$. This function takes in the
- user-specified likelihood and covariance functions, as well as their arguments.
+ that approximates the log marginal likelihood $\log p(y \mid \phi)$.
+ This function takes in the
+ user-specified likelihood and prior covariance functions, as well as their arguments.
These arguments must be passed as tuples, which can be generated on the fly
using parentheses.
We also need to pass an argument $\theta_0$ which serves as an initial guess for
@@ -148,9 +165,9 @@ target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0
int solver, int max_steps_linesearch);
```
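For orientation, a minimal `model` block using the base `laplace_marginal` call might look
like the sketch below. The names `ll_function`, `cov_function`, `y`, `y_index`, `x`, and
`theta_0` are placeholders for user-defined objects, and the argument order (likelihood
function and its tuple, $\theta_0$, then the covariance function and its tuple) is assumed
from the signature and prose above.

```stan
model {
  // Prior on the hyperparameter(s).
  phi ~ normal(0, 3);
  // Approximate log marginal likelihood log p(y | phi); all names below
  // are placeholders for objects defined elsewhere in the program.
  target += laplace_marginal(ll_function, (y, y_index), theta_0,
                             cov_function, (x, phi));
}
```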
- ## Draw approximate samples from the conditional $p (\theta \mid y, \phi)$
+ ## Sample from the approximate conditional $\hat{p}(\theta \mid y, \phi)$

- In `generated quantities`, it is possible to draw samples from the Laplace
+ In `generated quantities`, it is possible to sample from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` follows closely
the signature for `laplace_marginal`:
@@ -168,17 +185,19 @@ vector theta =
int solver, int max_steps_linesearch);
```
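Continuing the same sketch, a matching `generated quantities` block could look as follows;
the placeholder names mirror the arguments passed to `laplace_marginal` in the `model`
block, and `N` stands for the dimension of the latent vector.

```stan
generated quantities {
  // One draw from the normal (Laplace) approximation to p(theta | y, phi),
  // reusing the same placeholder arguments as in the model block.
  vector[N] theta = laplace_latent_rng(ll_function, (y, y_index), theta_0,
                                       cov_function, (x, phi));
}
```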
- ## Built-in likelihood functions
+ ## Built-in Laplace marginal likelihood functions
- Stan supports certain built-in likelihood functions. This selection is currently
+ Stan supports certain built-in Laplace marginal likelihood functions.
+ This selection is currently
narrow and expected to grow. The built-in functions exist for the user's
convenience but are not more computationally efficient than specifying log
likelihoods in the `functions` block.
- ### Poisson likelihood with log link
+ ### Poisson with log link
- Consider a count data, which each observed count $y_i$ associated with a group
- $g(i)$ and a corresponding latent variable $\theta_ {g(i)}$. The likelihood is
+ Given count data, with each observed count $y_i$ associated with a group
+ $g(i)$ and a corresponding latent variable $\theta_{g(i)}$, the Poisson
+ likelihood is
$$
p(y \mid \theta, \phi) = \prod_i\text{Poisson} (y_i \mid \exp(\theta_{g(i)})).
$$
@@ -238,16 +257,17 @@ vector laplace_latent_tol_poisson2_log_rng(array[] int y, array[] int y_index,
```
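For comparison, the same likelihood written by hand as a user-specified function in the
`functions` block might look like the sketch below; the function name and argument layout
(latent vector first, then the counts `y` and group indices `g` as data) are illustrative
rather than the built-in's actual interface.

```stan
functions {
  // Log joint likelihood: sum_i log Poisson(y_i | exp(theta[g[i]])),
  // using Stan's log-link Poisson density.
  real poisson_log_ll(vector theta, array[] int y, array[] int g) {
    return poisson_log_lpmf(y | theta[g]);
  }
}
```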
- ### Negative Binomial likelihood with log link
+ ### Negative Binomial with log link
- The negative Bionomial generalizes the Poisson likelihood function by
- introducing the dispersion parameter $\eta$. The likelihood is then
+ The negative Binomial distribution generalizes the Poisson distribution by
+ introducing the dispersion parameter $\eta$. The corresponding likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i\text{NegBinomial2} (y_i \mid \exp(\theta_{g(i)}), \eta).
$$
Here we use the alternative parameterization implemented in Stan, meaning that
$$
- \mathbb E(y_i) = \exp (\theta_{g(i)}), \ \ \text{Var}(y_i) = \mathbb E(y_i) + \frac{(\mathbb E(y_i))^2}{\eta}.
+ \mathbb E(y_i) = \exp (\theta_{g(i)}), \\
+ \text{Var}(y_i) = \mathbb E(y_i) + \frac{(\mathbb E(y_i))^2}{\eta}.
$$
The arguments for the likelihood function are:
@@ -280,9 +300,9 @@ vector laplace_latent_tol_neg_binomial_2_log_rng(array[] int y,
int solver, int max_steps_linesearch);
```
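As in the Poisson case, a hand-written version of this likelihood would carry the
dispersion parameter $\eta$ as an extra non-data argument alongside the latent vector;
a sketch with illustrative names:

```stan
functions {
  // Log joint likelihood: sum_i log NegBinomial2(y_i | exp(theta[g[i]]), eta),
  // using Stan's log-link parameterization of the negative binomial.
  real neg_binomial_2_log_ll(vector theta, real eta,
                             array[] int y, array[] int g) {
    return neg_binomial_2_log_lpmf(y | theta[g], eta);
  }
}
```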
- ### Bernoulli likelihood with logit link
+ ### Bernoulli with logit link
- For a binary outcome $y_i \in \{ 0, 1\} $, the likelihood is
+ Given a binary outcome $y_i \in \{0, 1\}$, the Bernoulli likelihood is
$$
p(y \mid \theta, \phi) = \prod_i\text{Bernoulli} (y_i \mid \text{logit}^{-1}(\theta_{g(i)})).
$$