---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
\phi \sim p(\phi), \\
\theta \sim \text{MultiNormal}(0, K(\phi)), \\
y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix parameterized by $\phi$.
To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:

1. sample from the *marginal posterior* $p(\phi \mid y)$,
2. sample from the *conditional posterior* $p(\theta \mid y, \phi)$.

In practice, neither the marginal posterior nor the conditional posterior
is available in closed form, so both must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) \text{d}\theta$
is called the marginal likelihood. The Laplace method approximates
$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution, and the
resulting Gaussian integral can be evaluated analytically to obtain an
approximation to the log marginal likelihood,
$\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.

Combining this marginal likelihood with the prior in the `model`
block, we can then sample from the marginal posterior $p(\phi \mid y)$
using one of Stan's algorithms. The marginal posterior is lower
dimensional and typically has a geometry that is easier to sample,
leading to more efficient inference. On the other hand, each evaluation
of the marginal likelihood is more costly, so the overall change in
efficiency depends on the application.

To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
The process of iteratively sampling from $p(\phi \mid y)$ (say, with MCMC) and
then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.

The Laplace approximation is especially useful if $p(\theta)$ is
multivariate normal and $p(y \mid \phi, \theta)$ is
log-concave. Stan's embedded Laplace approximation is restricted to
a multivariate normal prior $p(\theta)$, and it works best when the
likelihood $p(y \mid \phi, \theta)$ is log-concave or nearly so.

## Specifying the likelihood function

The first step in using the embedded Laplace approximation is to write down a
function in the `functions` block which returns the log joint likelihood
$\log p(y \mid \theta, \phi)$. There are a few constraints on this function:

* The function return type must be `real`.

* The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.

* The operations in the function must support higher-order automatic
differentiation (AD). Most functions in Stan support higher-order AD.
The exceptions are functions with specialized reverse-mode AD implementations,
namely the higher-order functions (algebraic solvers, differential equation
solvers, and integrators) and the suite of hidden Markov model (HMM) functions.

The signature of the function is
```
real ll_function(vector theta, ...)
```
There are no type restrictions on the variadic arguments `...`, and each
argument can be passed as data or as a parameter. As always, users should use
parameter arguments only when necessary, in order to speed up differentiation.
In general, we recommend marking data-only arguments with the keyword `data`,
for example,
```
real ll_function(vector theta, data vector x, ...)
```
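
As a concrete illustration, here is a minimal sketch of a log likelihood
function for Poisson count data with a log link, where `y_index` maps each
observation to a latent group. The function and argument names are
illustrative, not part of the interface:
```
functions {
  // Log joint likelihood: observation i depends on theta[y_index[i]].
  real ll_function(vector theta, data array[] int y, data array[] int y_index) {
    return poisson_log_lpmf(y | theta[y_index]);
  }
}
```
Here `y` and `y_index` are data, so they are marked with `data`; any
parameter arguments would be left unmarked.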

## Specifying the covariance function

We next need to specify a function that returns the prior covariance matrix
$K$ as a function of the hyperparameters $\phi$.
The only restriction is that this function returns a matrix of size
$n \times n$, where $n$ is the size of $\theta$. The signature is:
```
matrix K_function(...)
```
There are no type restrictions on the variadic arguments. The hyperparameters
$\phi$ are implicitly defined as the collection of all non-data arguments
passed to `ll_function` (excluding $\theta$) and to `K_function`.
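
For instance, a squared exponential covariance over one-dimensional
coordinates might be written as follows. The coordinates `x` and the
hyperparameters `alpha` and `rho` are illustrative assumptions, and the
diagonal jitter is a common numerical safeguard rather than a requirement:
```
functions {
  // Prior covariance: squared exponential kernel over coordinates x,
  // with marginal standard deviation alpha and length scale rho.
  matrix K_function(data array[] real x, real alpha, real rho) {
    int n = size(x);
    matrix[n, n] K = gp_exp_quad_cov(x, alpha, rho);
    for (i in 1:n) {
      K[i, i] += 1e-8;  // jitter for numerical stability
    }
    return K;
  }
}
```
In this sketch `alpha` and `rho` are the non-data arguments, so
$\phi = (\alpha, \rho)$.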

## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
This function takes in the user-specified likelihood and prior covariance
functions, as well as their arguments. These arguments must be passed as
tuples, which can be generated on the fly using parentheses.
We also need to pass an argument $\theta_0$, which serves as the initial guess for
the optimization problem that underlies the Laplace approximation,
$$
\underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi).
$$
The size of $\theta_0$ must be consistent with the size of the $\theta$ argument
passed to `ll_function`.

The signature of the function is:
```
target += laplace_marginal(function ll_function, tuple (...), vector theta_0,
                           function K_function, tuple (...));
```
The tuple `(...)` after `ll_function` contains the arguments that get passed
to `ll_function`, *excluding $\theta$*. Likewise, the tuple `(...)` after
`K_function` contains the arguments that get passed to `K_function`.
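
Putting the running example together, the `model` block could then read as
follows, where `y`, `y_index`, `x`, and `theta_0` are data, `alpha` and `rho`
are parameters, and the priors shown are placeholders:
```
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // Approximate log marginal likelihood, log p(y | alpha, rho).
  target += laplace_marginal(ll_function, (y, y_index), theta_0,
                             K_function, (x, alpha, rho));
}
```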

It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation. Specifically:

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
$\epsilon = 10^{-6}$.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 100 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is block diagonal with blocks of size
$m$, set `hessian_block_size=m`; the size of $\theta$ must then be divisible
by $m$. If the Hessian is not block diagonal, set `hessian_block_size=n`,
where `n` is the size of $\theta$.

* `solver`: the choice of Newton solver. The optimizer used to compute the
Laplace approximation does one of three matrix decompositions to compute a
Newton step. The problem determines which decomposition is numerically stable:
  1. By default (`solver=1`), the solver uses a Cholesky decomposition of the
  negative Hessian, $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
  2. If `solver=2`, the solver uses a Cholesky decomposition of the covariance
  matrix $K(\phi)$.
  3. If the Cholesky decomposition can be computed for neither the negative
  Hessian nor the covariance matrix, use `solver=3`, which relies on a more
  expensive but less specialized approach.

* `max_steps_linesearch`: the maximum number of steps in the linesearch. The
linesearch method tries to ensure that the Newton step leads to a decrease in
the objective function. If the Newton step does not improve the objective
function, the step is repeatedly halved until the objective function decreases
or the maximum number of linesearch steps is reached. By default,
`max_steps_linesearch=0`, meaning no linesearch is performed.

With these arguments at hand, we can call `laplace_marginal_tol` with the
following signature:
```
target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0,
                               function K_function, tuple (...),
                               real tol, int max_num_steps, int hessian_block_size,
                               int solver, int max_steps_linesearch);
```
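
Continuing the running example, a call that spells out the default control
values would look like:
```
target += laplace_marginal_tol(ll_function, (y, y_index), theta_0,
                               K_function, (x, alpha, rho),
                               1e-6, 100, 1, 1, 0);
```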

## Sample from the approximate conditional $\hat{p}(\theta \mid y, \phi)$

In `generated quantities`, it is possible to sample from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` closely follows
the signature for `laplace_marginal`:
```
vector theta =
  laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
                     function K_function, tuple (...));
```
Once again, it is possible to specify control parameters:
```
vector theta =
  laplace_latent_tol_rng(function ll_function, tuple (...), vector theta_0,
                         function K_function, tuple (...),
                         real tol, int max_num_steps, int hessian_block_size,
                         int solver, int max_steps_linesearch);
```
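
In the running example, the `generated quantities` block might read as
follows, with `n` the size of $\theta$:
```
generated quantities {
  // One draw per iteration from the normal approximation to p(theta | y, phi).
  vector[n] theta = laplace_latent_rng(ll_function, (y, y_index), theta_0,
                                       K_function, (x, alpha, rho));
}
```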

## Built-in Laplace marginal likelihood functions

Stan supports certain built-in Laplace marginal likelihood functions. This
selection is currently narrow and expected to grow. The built-in functions
exist for the user's convenience but are not more computationally efficient
than specifying the log likelihood in the `functions` block.

### Poisson with log link

Given count data, where each observed count $y_i$ is associated with a group
$g(i)$ and a corresponding latent variable $\theta_{g(i)}$, and a Poisson
model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)})).
$$
The arguments required to compute this likelihood are:

* `y`: an array of counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to
which the $i^\text{th}$ observation belongs.

The signatures for the embedded Laplace approximation functions with a Poisson
likelihood are
```
real laplace_marginal_poisson_log_lpmf(array[] int y | array[] int y_index,
  vector theta0, function K_function, (...));

real laplace_marginal_tol_poisson_log_lpmf(array[] int y | array[] int y_index,
  vector theta0, function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_poisson_log_rng(array[] int y, array[] int y_index,
  vector theta0, function K_function, (...));

vector laplace_latent_tol_poisson_log_rng(array[] int y, array[] int y_index,
  vector theta0, function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```
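
For example, with the built-in Poisson likelihood and the illustrative
covariance function from earlier sections, the distribution-statement form
can be used in the `model` block:
```
y ~ laplace_marginal_poisson_log(y_index, theta_0, K_function, (x, alpha, rho));
```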

A similar built-in likelihood lets users specify an offset $x_i \in \mathbb{R}^+$
to the rate parameter of the Poisson. The likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)}) x_i).
$$
The signatures for this function are:
```
real laplace_marginal_poisson2_log_lpmf(array[] int y | array[] int y_index,
  vector x, vector theta0,
  function K_function, (...));

real laplace_marginal_tol_poisson2_log_lpmf(array[] int y | array[] int y_index,
  vector x, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_poisson2_log_rng(array[] int y, array[] int y_index,
  vector x, vector theta0,
  function K_function, (...));

vector laplace_latent_tol_poisson2_log_rng(array[] int y, array[] int y_index,
  vector x, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```

### Negative Binomial with log link

The negative binomial distribution generalizes the Poisson distribution by
introducing the dispersion parameter $\eta$. The corresponding likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{NegBinomial2} (y_i \mid \exp(\theta_{g(i)}), \eta).
$$
Here we use the alternative parameterization implemented in Stan, meaning that
$$
\mathbb{E}(y_i) = \exp (\theta_{g(i)}), \\
\text{Var}(y_i) = \mathbb{E}(y_i) + \frac{(\mathbb{E}(y_i))^2}{\eta}.
$$
The arguments for the likelihood function are:

* `y`: the observed counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to
which the $i^\text{th}$ observation belongs.
* `eta`: the overdispersion parameter.

The function signatures for the embedded Laplace approximation with a negative
binomial likelihood are
```
real laplace_marginal_neg_binomial_2_log_lpmf(array[] int y |
  array[] int y_index, real eta, vector theta0,
  function K_function, (...));

real laplace_marginal_tol_neg_binomial_2_log_lpmf(array[] int y |
  array[] int y_index, real eta, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_neg_binomial_2_log_rng(array[] int y,
  array[] int y_index, real eta, vector theta0,
  function K_function, (...));

vector laplace_latent_tol_neg_binomial_2_log_rng(array[] int y,
  array[] int y_index, real eta, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```

### Bernoulli with logit link

Given a binary outcome $y_i \in \{0, 1\}$ and a Bernoulli model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Bernoulli} (y_i \mid \text{logit}^{-1}(\theta_{g(i)})).
$$
The arguments of the likelihood function are:

* `y`: the observed binary outcomes.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to
which the $i^\text{th}$ observation belongs.

The function signatures for the embedded Laplace approximation with a Bernoulli
likelihood are
```
real laplace_marginal_bernoulli_logit_lpmf(array[] int y |
  array[] int y_index, vector theta0,
  function K_function, (...));

real laplace_marginal_tol_bernoulli_logit_lpmf(array[] int y |
  array[] int y_index, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);

vector laplace_latent_bernoulli_logit_rng(array[] int y,
  array[] int y_index, vector theta0,
  function K_function, (...));

vector laplace_latent_tol_bernoulli_logit_rng(array[] int y,
  array[] int y_index, vector theta0,
  function K_function, (...),
  real tol, int max_num_steps, int hessian_block_size,
  int solver, int max_steps_linesearch);
```
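
As in the Poisson case, the distribution-statement form can be used with the
illustrative covariance function from earlier sections:
```
y ~ laplace_marginal_bernoulli_logit(y_index, theta_0, K_function, (x, alpha, rho));
```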

<!-- ## Draw approximate samples for out-of-sample latent variables. -->

<!-- In many applications, it is of interest to draw latent variables for -->
<!-- in-sample and out-of-sample predictions. We respectively denote these latent -->
<!-- variables $\theta$ and $\theta^*$. In a latent Gaussian model, -->
<!-- $(\theta, \theta^*)$ jointly follow a prior multivariate normal distribution: -->
<!-- $$ -->
<!-- \theta, \theta^* \sim \text{MultiNormal}(0, {\bf K}(\phi)), -->
<!-- $$ -->
<!-- where $\bf K$ designates the joint covariance matrix over $\theta, \theta^*$. -->

<!-- We can break $\bf K$ into three components, -->
<!-- $$ -->
<!-- {\bf K} = \begin{bmatrix} -->
<!-- K & \\ -->
<!-- K^* & K^{**} -->
<!-- \end{bmatrix}, -->
<!-- $$ -->
<!-- where $K$ is the prior covariance matrix for $\theta$, $K^{**}$ the prior -->
<!-- covariance matrix for $\theta^*$, and $K^*$ the covariance matrix between -->
<!-- $\theta$ and $\theta^*$. -->

<!-- Stan supports the case where $\theta$ is associated with an in-sample -->
<!-- covariate $X$ and $\theta^*$ with an out-of-sample covariate $X^*$. -->
<!-- Furthermore, the covariance function is written in such a way that -->
<!-- $$ -->
<!-- K = f(..., X, X), \ \ K^{**} = f(..., X^*, X^*), \ \ K^* = f(..., X, X^*), -->
<!-- $$ -->
<!-- as is typically the case in Gaussian process models. -->

<!-- The -->
<!-- function `laplace_latent_rng` produces samples from the Laplace approximation -->
<!-- and admits nearly the same arguments as `laplace_marginal`. A key difference -->
<!-- is that -->
<!-- ``` -->
<!-- vector laplace_latent_rng(function ll_function, tuple (...), vector theta_0, -->
<!-- function K_function, tuple (...)); -->
<!-- ``` -->