@@ -1199,8 +1199,14 @@ should be faster.
In some cases, models can be recoded to exploit sufficient statistics
in estimation. This can lead to large efficiency gains compared to an
- expanded model. For example, consider the following Bernoulli
- sampling model.
+ expanded model. This section provides examples for the Bernoulli,
+ normal, and Poisson distributions, but the same approach can be
+ applied to other members of the exponential family.
+
+
+ ### Bernoulli sufficient statistics {-}
+
+ Consider the following Bernoulli sampling model.

``` stan
data {
@@ -1257,6 +1263,94 @@ the PMF and simply amount to an alternative, more efficient coding of
the same likelihood. For efficiency, the frequencies `f[k]`
should be counted once in the transformed data block and stored.

+
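+ As a minimal sketch of that advice (hypothetical; the indexing of `f`
+ is illustrative and assumes binary data `y` and size `N` declared as
+ in the model above):
+
+ ``` stan
+ transformed data {
+   // count the outcomes once so the likelihood can reuse the frequencies
+   array[2] int<lower=0> f = rep_array(0, 2);
+   for (n in 1:N) {
+     f[y[n] + 1] += 1;  // f[1] counts zeros, f[2] counts ones
+   }
+ }
+ ```
+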
+ The same trick works for combining multiple binomial observations.
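+
+ As a sketch of that combination (the names `J`, `K`, and `theta` are
+ illustrative, not from the original): independent binomials with a
+ shared success probability add, so `J` observations can be collapsed
+ into one.
+
+ ``` stan
+ data {
+   int<lower=0> J;           // number of binomial observations
+   array[J] int<lower=0> K;  // trials per observation
+   array[J] int<lower=0> y;  // successes per observation
+ }
+ transformed data {
+   int sum_K = sum(K);  // total trials
+   int sum_y = sum(y);  // total successes: the sufficient statistic
+ }
+ parameters {
+   real<lower=0, upper=1> theta;
+ }
+ model {
+   // equivalent, up to constants, to y[j] ~ binomial(K[j], theta)
+   sum_y ~ binomial(sum_K, theta);
+ }
+ ```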
+
+
+ ### Normal sufficient statistics {-}
+
+ Consider the following Stan model for fitting a normal distribution to data.
+
+ ``` stan
+ data {
+   int N;
+   vector[N] y;
+ }
+ parameters {
+   real mu;
+   real<lower=0> sigma;
+ }
+ model {
+   y ~ normal(mu, sigma);
+ }
+ ```
+
+ With the vectorized form used for `y`, Stan is clever enough to
+ evaluate `log(sigma)` only once, but it still has to evaluate the
+ normal for all of `y[1]` to `y[N]`, which involves adding up all the
+ squared differences from the mean and then dividing by `sigma`
+ squared.
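+
+ In symbols, the vectorized statement computes, up to an additive
+ constant that does not depend on the parameters,
+
+ $$
+ \log p(y \mid \mu, \sigma)
+   = -N \log \sigma
+     - \frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - \mu)^2
+     + \textrm{const},
+ $$
+
+ so the cost of the summation grows linearly in $N$.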
+
+ A density equivalent to the one above (up to normalizing constants
+ that do not depend on the parameters) is given in the following Stan
+ program.
+
+ ``` stan
+ data {
+   int N;
+   vector[N] y;
+ }
+ transformed data {
+   real mean_y = mean(y);
+   real<lower=0> var_y = variance(y);
+   real nm1_over2 = 0.5 * (N - 1);
+   real sqrt_N = sqrt(N);
+ }
+ parameters {
+   real mu;
+   real<lower=0> sigma;
+ }
+ model {
+   mean_y ~ normal(mu, sigma / sqrt_N);
+   var_y ~ gamma(nm1_over2, nm1_over2 / sigma^2);
+ }
+ ```
+
+ The data and parameters are the same in this program as in the first.
+ The second version adds a transformed data block to compute the mean
+ and variance of the data, which are the sufficient statistics here.
+ These are stored along with two other useful constants. The model
+ block then defines sampling distributions for the mean and variance,
+ both of which are scalars.
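+
+ The recoding rests on two standard sampling-distribution facts for
+ independent and identically distributed normal data: the sample mean
+ $\bar{y}$ and sample variance $s^2$ satisfy
+
+ $$
+ \bar{y} \sim \textsf{normal}\!\left(\mu, \frac{\sigma}{\sqrt{N}}\right),
+ \qquad
+ s^2 \sim \textsf{gamma}\!\left(\frac{N-1}{2}, \frac{N-1}{2\sigma^2}\right),
+ $$
+
+ where the gamma form is just a rescaling of
+ $(N - 1) \, s^2 / \sigma^2 \sim \chi^2_{N-1}$.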
+
+ The original Stan program and this one define the same model in the
+ sense that they define the same log density up to a constant additive
+ term that does not depend on the parameters. The priors on `mu` and
+ `sigma` are both improper, but proper priors could be added as
+ additional statements in the model block without affecting
+ sufficiency.
+
+ This transform explicitly relies on the aggregated quantities being
+ data. Applying the same trick to parameters would require more
+ computation than just evaluating the normal log density, even before
+ accounting for the nonlinear change of variables in the variance.
+
+ ### Poisson sufficient statistics {-}
+
+ The Poisson distribution is the easiest case, because the sum of
+ observations is sufficient. Specifically, we can replace
+
+ ``` stan
+ y ~ poisson(lambda);
+ ```
+
+ with
+
+ ``` stan
+ sum(y) ~ poisson(size(y) * lambda);
+ ```
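+
+ Assembled into a complete program (a minimal sketch; computing the
+ sufficient statistic once in transformed data avoids re-summing on
+ every evaluation):
+
+ ``` stan
+ data {
+   int<lower=0> N;
+   array[N] int<lower=0> y;
+ }
+ transformed data {
+   int sum_y = sum(y);  // sufficient statistic, computed once
+ }
+ parameters {
+   real<lower=0> lambda;
+ }
+ model {
+   // sum of N independent Poisson(lambda) variates is Poisson(N * lambda)
+   sum_y ~ poisson(N * lambda);
+ }
+ ```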
+
+ This will work even if `y` is a parameter vector because no Jacobian
+ adjustment is required for summation.
+
## Aggregating common subexpressions
@@ -1306,7 +1400,7 @@ unit sample variance has the following potential benefits:
* It aids in the interpretation and comparison of the importance of coefficients across different predictors.
When there are large differences between the units and scales of the predictors,
- standardizating the predictors is especially useful.
+ standardizing the predictors is especially useful.
This section illustrates the principle for a simple linear regression.
Suppose that $y = (y_1,\dotsc,y_N)$ is a vector of $N$ outcomes and