---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
\phi \sim p(\phi), \\
\theta \sim \text{MultiNormal}(0, K(\phi)), \\
y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix parameterized by $\phi$.
To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:

1. sample from the *marginal posterior* $p(\phi \mid y)$,
2. sample from the *conditional posterior* $p(\theta \mid y, \phi)$.

In practice, neither the marginal posterior nor the conditional posterior
is available in closed form, so both must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) \text{d}\theta$
is called the marginal likelihood. The Laplace method approximates
$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution, and the
resulting Gaussian integral can be evaluated analytically to obtain an
approximation to the log marginal likelihood,
$\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.

Combining this marginal likelihood with the prior in the `model`
block, we can then sample from the marginal posterior $p(\phi \mid y)$
using one of Stan's algorithms. The marginal posterior is lower
dimensional and typically has a geometry that is easier to sample,
leading to more efficient inference. On the other hand, each evaluation
of the marginal likelihood is more costly, so the overall change in
efficiency depends on the application.

To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
The process of iteratively sampling from $p(\phi \mid y)$ (say, with MCMC) and
then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.

The Laplace approximation is especially useful if $p(\theta)$ is
multivariate normal and $p(y \mid \phi, \theta)$ is
log-concave. Stan's embedded Laplace approximation is restricted to
a multivariate normal prior $p(\theta)$, and it works best when the
likelihood $p(y \mid \phi, \theta)$ is log-concave or nearly so.

## Specifying the likelihood function

The first step in using the embedded Laplace approximation is to write down a
function in the `functions` block which returns the log joint likelihood
$\log p(y \mid \theta, \phi)$. There are a few constraints on this function:

* The function return type must be `real`.

* The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.

* The operations in the function must support higher-order automatic
differentiation (AD). Most functions in Stan support higher-order AD.
The exceptions are functions with specialized reverse-mode AD implementations,
namely the higher-order functions (algebraic solvers, differential equation
solvers, and integrators) and the suite of hidden Markov model (HMM) functions.

The signature of the function is
```
real ll_function(vector theta, ...)
```
There are no type restrictions on the variadic arguments `...`, and each
argument can be passed as data or as a parameter. As always, users should use
parameter arguments only when necessary, in order to speed up differentiation.
In general, we recommend marking data-only arguments with the keyword `data`,
for example,
```
real ll_function(vector theta, data vector x, ...)
```
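
As a concrete illustration, here is a minimal sketch of a log likelihood
function for Poisson count data with a log link, where `y_index` maps each
observation to a latent group. The function and argument names are
illustrative, not part of the interface:
```
functions {
  // Log joint likelihood: observation i depends on theta[y_index[i]].
  real ll_function(vector theta, data array[] int y, data array[] int y_index) {
    return poisson_log_lpmf(y | theta[y_index]);
  }
}
```
Here `y` and `y_index` are data, so they are marked with `data`; any
parameter arguments would be left unmarked.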

## Specifying the covariance function

We next need to specify a function that returns the prior covariance matrix
$K$ as a function of the hyperparameters $\phi$.
The only restriction is that this function returns a matrix of size
$n \times n$, where $n$ is the size of $\theta$. The signature is:
```
matrix K_function(...)
```
There are no type restrictions on the variadic arguments. The hyperparameters
$\phi$ are implicitly defined as the collection of all non-data arguments
passed to `ll_function` (excluding $\theta$) and to `K_function`.
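
For instance, a squared exponential covariance over one-dimensional
coordinates might be written as follows. The coordinates `x` and the
hyperparameters `alpha` and `rho` are illustrative assumptions, and the
diagonal jitter is a common numerical safeguard rather than a requirement:
```
functions {
  // Prior covariance: squared exponential kernel over coordinates x,
  // with marginal standard deviation alpha and length scale rho.
  matrix K_function(data array[] real x, real alpha, real rho) {
    int n = size(x);
    matrix[n, n] K = gp_exp_quad_cov(x, alpha, rho);
    for (i in 1:n) {
      K[i, i] += 1e-8;  // jitter for numerical stability
    }
    return K;
  }
}
```
In this sketch `alpha` and `rho` are the non-data arguments, so
$\phi = (\alpha, \rho)$.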

## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
This function takes in the user-specified likelihood and prior covariance
functions, as well as their arguments. These arguments must be passed as
tuples, which can be generated on the fly using parentheses.
We also need to pass an argument $\theta_0$, which serves as the initial guess for
the optimization problem that underlies the Laplace approximation,
$$
\underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi).
$$
The size of $\theta_0$ must be consistent with the size of the $\theta$ argument
passed to `ll_function`.

The signature of the function is:
```
target += laplace_marginal(function ll_function, tuple (...), vector theta_0,
                           function K_function, tuple (...));
```
The tuple `(...)` after `ll_function` contains the arguments that get passed
to `ll_function`, *excluding $\theta$*. Likewise, the tuple `(...)` after
`K_function` contains the arguments that get passed to `K_function`.
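
Putting the running example together, the `model` block could then read as
follows, where `y`, `y_index`, `x`, and `theta_0` are data, `alpha` and `rho`
are parameters, and the priors shown are placeholders:
```
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // Approximate log marginal likelihood, log p(y | alpha, rho).
  target += laplace_marginal(ll_function, (y, y_index), theta_0,
                             K_function, (x, alpha, rho));
}
```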

It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation. Specifically:

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
$\epsilon = 10^{-6}$.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 100 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is block diagonal with blocks of size
$m$, set `hessian_block_size=m`; the size of $\theta$ must then be divisible
by $m$. If the Hessian is not block diagonal, set `hessian_block_size=n`,
where `n` is the size of $\theta$.

* `solver`: the choice of Newton solver. The optimizer used to compute the
Laplace approximation does one of three matrix decompositions to compute a
Newton step. The problem determines which decomposition is numerically stable:
  1. By default (`solver=1`), the solver uses a Cholesky decomposition of the
  negative Hessian, $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
  2. If `solver=2`, the solver uses a Cholesky decomposition of the covariance
  matrix $K(\phi)$.
  3. If the Cholesky decomposition can be computed for neither the negative
  Hessian nor the covariance matrix, use `solver=3`, which relies on a more
  expensive but less specialized approach.

* `max_steps_linesearch`: the maximum number of steps in the linesearch. The
linesearch method tries to ensure that the Newton step leads to a decrease in
the objective function. If the Newton step does not improve the objective
function, the step is repeatedly halved until the objective function decreases
or the maximum number of linesearch steps is reached. By default,
`max_steps_linesearch=0`, meaning no linesearch is performed.

With these arguments at hand, we can call `laplace_marginal_tol` with the
following signature:
```
target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0,
                               function K_function, tuple (...),
                               real tol, int max_num_steps, int hessian_block_size,
                               int solver, int max_steps_linesearch);
```
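
Continuing the running example, a call that spells out the default control
values would look like:
```
target += laplace_marginal_tol(ll_function, (y, y_index), theta_0,
                               K_function, (x, alpha, rho),
                               1e-6, 100, 1, 1, 0);
```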

## Sample from the approximate conditional $\hat{p}(\theta \mid y, \phi)$

In `generated quantities`, it is possible to sample from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` closely follows
the signature for `laplace_marginal`:
```
vector theta =
  laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
                     function K_function, tuple (...));
```
Once again, it is possible to specify control parameters:
```
vector theta =
  laplace_latent_tol_rng(function ll_function, tuple (...), vector theta_0,
                         function K_function, tuple (...),
                         real tol, int max_num_steps, int hessian_block_size,
                         int solver, int max_steps_linesearch);
```
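
In the running example, the `generated quantities` block might read as
follows, with `n` the size of $\theta$:
```
generated quantities {
  // One draw per iteration from the normal approximation to p(theta | y, phi).
  vector[n] theta = laplace_latent_rng(ll_function, (y, y_index), theta_0,
                                       K_function, (x, alpha, rho));
}
```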

## Built-in Laplace marginal likelihood functions

Stan supports certain built-in Laplace marginal likelihood functions. This
selection is currently narrow and expected to grow. The built-in functions
exist for the user's convenience but are not more computationally efficient
than specifying the log likelihood in the `functions` block.

### Poisson with log link

Given count data, where each observed count $y_i$ is associated with a group
$g(i)$ and a corresponding latent variable $\theta_{g(i)}$, and a Poisson
model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)})).
$$
The arguments required to compute this likelihood are:

* `y`: an array of counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to
which the $i^\text{th}$ observation belongs.

The signatures for the embedded Laplace approximation functions with a Poisson
likelihood are
```
real laplace_marginal_poisson_log_lpmf(array[] int y | array[] int y_index,
  vector theta0, function K_function, (...));

real laplace_marginal_tol_poisson_log_lpmf(array[] int y | array[] int y_index,
  vector theta0, function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_poisson_log_rng(array[] int y, array[] int y_index,
  vector theta0, function K_function, (...));

vector laplace_latent_tol_poisson_log_rng(array[] int y, array[] int y_index,
  vector theta0, function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```
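
For example, with the built-in Poisson likelihood and the illustrative
covariance function from earlier sections, the distribution-statement form
can be used in the `model` block:
```
y ~ laplace_marginal_poisson_log(y_index, theta_0, K_function, (x, alpha, rho));
```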

A similar built-in likelihood lets users specify an offset $x_i \in \mathbb{R}^+$
to the rate parameter of the Poisson. The likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)}) x_i).
$$
The signatures for this function are:
```
real laplace_marginal_poisson2_log_lpmf(array[] int y | array[] int y_index,
  vector x, vector theta0,
  function K_function, (...));

real laplace_marginal_tol_poisson2_log_lpmf(array[] int y | array[] int y_index,
  vector x, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_poisson2_log_rng(array[] int y, array[] int y_index,
  vector x, vector theta0,
  function K_function, (...));

vector laplace_latent_tol_poisson2_log_rng(array[] int y, array[] int y_index,
  vector x, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```

### Negative Binomial with log link

The negative binomial distribution generalizes the Poisson distribution by
introducing the dispersion parameter $\eta$. The corresponding likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{NegBinomial2} (y_i \mid \exp(\theta_{g(i)}), \eta).
$$
Here we use the alternative parameterization implemented in Stan, meaning that
$$
\mathbb{E}(y_i) = \exp (\theta_{g(i)}), \\
\text{Var}(y_i) = \mathbb{E}(y_i) + \frac{(\mathbb{E}(y_i))^2}{\eta}.
$$
The arguments for the likelihood function are:

* `y`: the observed counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to
which the $i^\text{th}$ observation belongs.
* `eta`: the overdispersion parameter.

The function signatures for the embedded Laplace approximation with a negative
binomial likelihood are
```
real laplace_marginal_neg_binomial_2_log_lpmf(array[] int y |
  array[] int y_index, real eta, vector theta0,
  function K_function, (...));

real laplace_marginal_tol_neg_binomial_2_log_lpmf(array[] int y |
  array[] int y_index, real eta, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_neg_binomial_2_log_rng(array[] int y,
  array[] int y_index, real eta, vector theta0,
  function K_function, (...));

vector laplace_latent_tol_neg_binomial_2_log_rng(array[] int y,
  array[] int y_index, real eta, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```

### Bernoulli with logit link

Given a binary outcome $y_i \in \{0, 1\}$ and a Bernoulli model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Bernoulli} (y_i \mid \text{logit}^{-1}(\theta_{g(i)})).
$$
The arguments of the likelihood function are:

* `y`: the observed binary outcomes.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to
which the $i^\text{th}$ observation belongs.

The function signatures for the embedded Laplace approximation with a Bernoulli
likelihood are
```
real laplace_marginal_bernoulli_logit_lpmf(array[] int y |
  array[] int y_index, vector theta0,
  function K_function, (...));

real laplace_marginal_tol_bernoulli_logit_lpmf(array[] int y |
  array[] int y_index, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_bernoulli_logit_rng(array[] int y,
  array[] int y_index, vector theta0,
  function K_function, (...));

vector laplace_latent_tol_bernoulli_logit_rng(array[] int y,
  array[] int y_index, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```
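
As in the Poisson case, the distribution-statement form can be used with the
illustrative covariance function from earlier sections:
```
y ~ laplace_marginal_bernoulli_logit(y_index, theta_0, K_function, (x, alpha, rho));
```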

<!-- ## Draw approximate samples for out-of-sample latent variables. -->

<!-- In many applications, it is of interest to draw latent variables for -->
<!-- in-sample and out-of-sample predictions. We respectively denote these latent -->
<!-- variables $\theta$ and $\theta^*$. In a latent Gaussian model, -->
<!-- $(\theta, \theta^*)$ jointly follow a prior multivariate normal distribution: -->
<!-- $$ -->
<!-- \theta, \theta^* \sim \text{MultiNormal}(0, {\bf K}(\phi)), -->
<!-- $$ -->
<!-- where $\bf K$ designates the joint covariance matrix over $\theta, \theta^*$. -->

<!-- We can break $\bf K$ into three components, -->
<!-- $$ -->
<!-- {\bf K} = \begin{bmatrix} -->
<!-- K & \\ -->
<!-- K^* & K^{**} -->
<!-- \end{bmatrix}, -->
<!-- $$ -->
<!-- where $K$ is the prior covariance matrix for $\theta$, $K^{**}$ the prior -->
<!-- covariance matrix for $\theta^*$, and $K^*$ the covariance matrix between -->
<!-- $\theta$ and $\theta^*$. -->

<!-- Stan supports the case where $\theta$ is associated with an in-sample -->
<!-- covariate $X$ and $\theta^*$ with an out-of-sample covariate $X^*$. -->
<!-- Furthermore, the covariance function is written in such a way that -->
<!-- $$ -->
<!-- K = f(..., X, X), \ \ K^{**} = f(..., X^*, X^*), \ \ K^* = f(..., X, X^*), -->
<!-- $$ -->
<!-- as is typically the case in Gaussian process models. -->

<!-- The -->
<!-- function `laplace_latent_rng` produces samples from the Laplace approximation -->
<!-- and admits nearly the same arguments as `laplace_marginal`. A key difference -->
<!-- is that -->
<!-- ``` -->
<!-- vector laplace_latent_rng(function ll_function, tuple (...), vector theta_0, -->
<!-- function K_function, tuple (...)); -->
<!-- ``` -->