Merge pull request #795 from stan-dev/doc-fixes

bob-carpenter · web-flow · commit 17d409741aee · 2024-08-02T14:47:25.000-04:00
Doc fixes
diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib
@@ -1825,3 +1825,23 @@ @article{Riutort-Mayol:2023:HSGP
   pages={17},
   year={2023}
 }
+
+@article{Vehtari+etal:2021:Rhat,
+  title={Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of {MCMC}},
+  author={Vehtari, Aki and Gelman, Andrew and Simpson, Daniel and Carpenter, Bob and B{\"u}rkner, Paul-Christian},
+  journal={Bayesian Analysis},
+  year=2021,
+  volume=16,
+  pages={667--718}
+}
+
+@article{Timonen+etal:2023:ODE-PSIS,
+  title={An importance sampling approach for reliable and efficient inference in {Bayesian} ordinary differential equation models},
+  author={Timonen, Juho and Siccha, Nikolas and Bales, Ben and L{\"a}hdesm{\"a}ki, Harri and Vehtari, Aki},
+  journal={Stat},
+  year={2023},
+  volume = 12,
+  number = 1,
+  pages = {e614} 
+}
+
diff --git a/src/reference-manual/analysis.qmd b/src/reference-manual/analysis.qmd
@@ -298,7 +298,7 @@ and can apply the standard tests.
 
 The second technical difficulty posed by MCMC methods is that the
 samples will typically be autocorrelated (or anticorrelated) within a
-chain.  This increases the uncertainty of the estimation of posterior
+chain.  This increases (or reduces) the uncertainty of the estimation of posterior
 quantities of interest, such as means, variances, or quantiles; see
 @Geyer:2011.
 
@@ -309,19 +309,19 @@ central limit theorem (CLT).
 
 Unlike most packages, the particular calculations used by Stan follow
 those for split-$\hat{R}$, which involve both cross-chain (mean) and
-within-chain calculations (autocorrelation); see @GelmanEtAl:2013.
+within-chain calculations (autocorrelation); see @GelmanEtAl:2013 and
+@Vehtari+etal:2021:Rhat.
 
 
 ### Definition of effective sample size {-}
 
 The amount by which autocorrelation within the chains increases
 uncertainty in estimates can be measured by effective sample size (ESS).
-Given independent samples, the central limit theorem
-bounds uncertainty in estimates based on the number of samples $N$.
-Given dependent samples, the number of independent samples is replaced
-with the effective sample size $N_{\mathrm{eff}}$, which is
-the number of independent samples with the same estimation power as
-the $N$ autocorrelated samples.  For example, estimation error is
+Given an independent sample (with finite variance), the central limit theorem
+bounds uncertainty in estimates based on the sample size $N$.
+Given a dependent sample, the sample size is replaced
+with the effective sample size $N_{\mathrm{eff}}$.
+For example, Monte Carlo standard error (MCSE) is
 proportional to $1 / \sqrt{N_{\mathrm{eff}}}$ rather than
 $1/\sqrt{N}$.
 
@@ -364,16 +364,15 @@ $$
 
 
 For independent draws, the effective sample size is just the number of
-iterations.  For correlated draws, the effective sample size will be
-lower than the number of iterations.  For anticorrelated draws, the
+iterations.  For correlated draws, the effective sample size is usually
+lower than the number of iterations, but in the case of anticorrelated draws, the
 effective sample size can be larger than the number of iterations.  In
 this latter case, MCMC can work better than independent sampling for
 some estimation problems.  Hamiltonian Monte Carlo, including the
 no-U-turn sampler used by default in Stan, can produce anticorrelated
 draws if the posterior is close to Gaussian with little posterior
 correlation.
 
-
 ### Estimation of effective sample size {-}
 
 In practice, the probability function in question cannot be tractably
@@ -493,8 +492,8 @@ second approach with thinning can produce a higher effective sample
 size when the draws are positively correlated.  That's because the
 autocorrelation $\rho_t$ for the thinned sequence is equivalent to
 $\rho_{10t}$ in the unthinned sequence, so the sum of the
-autocorrelations will be lower and thus the effective sample size
-higher.
+autocorrelations will usually be lower and thus the effective sample size
+higher.
 
 Now contrast the second approach above with the unthinned alternative,
 
@@ -506,4 +505,4 @@ large.  To summarize, *the only reason to thin a sample is to reduce
 memory requirements*.
 
 If draws are anticorrelated, then thinning will increase correlation
-and reduce the overall effective sample size.
+and further reduce the overall effective sample size.
diff --git a/src/reference-manual/types.qmd b/src/reference-manual/types.qmd
@@ -837,10 +837,10 @@ definite.  Like correlation matrices, covariance matrices only need a
 single dimension in their declaration.  For instance,
 
 ```stan
-cov_matrix[K] Omega;
+cov_matrix[K] Sigma;
 ```
 
-declares `Omega` to be a $K \times K$ covariance matrix, where
+declares `Sigma` to be a $K \times K$ covariance matrix, where
 $K$ is the value of the data variable `K`.
 
 
@@ -853,10 +853,10 @@ Because correlation matrices are square, only one dimension needs
 to be declared.  For example,
 
 ```stan
-corr_matrix[3] Sigma;
+corr_matrix[3] Omega;
 ```
 
-declares `Sigma` to be a $3 \times 3$ correlation matrix.
+declares `Omega` to be a $3 \times 3$ correlation matrix.
 
 Correlation matrices may be assigned to other matrices, including
 unconstrained matrices, if their dimensions match, and vice-versa.
diff --git a/src/stan-users-guide/algebraic-equations.qmd b/src/stan-users-guide/algebraic-equations.qmd
@@ -84,7 +84,7 @@ vector for parameters if the system does not involve data or parameters.
 Let's suppose $\theta = (3, 6)$. To call the algebraic solver, we need to
 provide an initial guess. This varies on a case-by-case basis, but in general
 a good guess will speed up the solver and, in pathological cases, even determine
-whether the solver converges or not. If the solver does not converge, the metropolis
+whether the solver converges or not. If the solver does not converge, the Metropolis
 proposal gets rejected and a warning message, stating no acceptable solution was
 found, is issued.
 
@@ -107,7 +107,7 @@ transformed parameters {
   vector[2] theta = [3, 6]';
   vector[2] y;
 
-  y = algebra_solver_newton(system, y_guess, theta, x_r, x_i);
+  y = solve_newton(system, y_guess, theta, x_r, x_i);
 }
 ```
 
@@ -137,24 +137,26 @@ For instance, it might make "physical sense" for a solution to be positive or ne
 
 On the other hand, a system may not have a solution (for a given point in the parameter
 space). In that case, the solver will not converge to a solution. When the solver fails to
-do so, the current metropolis proposal gets rejected.
+do so, the current Metropolis proposal gets rejected.
 
 ## Control parameters for the algebraic solver {#algebra-control.section}
 
-The call to the algebraic solver shown previously uses the default control settings. The solver
-allows three additional parameters, all of which must be supplied if any of them is
-supplied.
+The call to the algebraic solver shown previously uses the default control settings. The `_tol` variant of the solver function
+allows three additional parameters, all of which must be supplied.
 
 ```stan
-y = algebra_solver_newton(system, y_guess, theta, x_r, x_i,
-                          rel_tol, f_tol, max_steps);
+y = solve_newton_tol(system, y_guess, theta, x_r, x_i,
+                     scaling_step, f_tol, max_steps);
 ```
 
-The three control arguments are relative tolerance, function tolerance, and maximum
-number of steps. Both tolerances need to be satisfied. If one of them is not met, the
-metropolis proposal gets rejected with a warning message explaining which criterion
-was not satisfied. The default values for the control arguments are respectively
-`rel_tol = 1e-10` ($10^{-10}$), `f_tol = 1e-6` ($10^{-6}$), and `max_steps = 1e3` ($10^3$).
+For the Newton solver, the three control arguments are scaling step, function tolerance, and maximum number of steps. For Powell's hybrid method, the three control arguments are relative tolerance, function tolerance, and maximum number of steps. If a Newton step is smaller than the scaling step tolerance, the solver stops iterating, on the assumption that it is no longer making significant progress. If the scaling step is set to 0, this check is skipped. For Powell's hybrid method, the relative tolerance is the estimated relative error of the solver and serves to test whether a satisfactory solution has been found. After either solver converges, the proposed solution
+is plugged into the algebraic system and its norm is compared to the function tolerance. If the norm is below the function tolerance, the solution is deemed acceptable.  If the solver reaches the maximum number of steps, it stops and returns an error message. If one of the criteria is not met, the
+Metropolis proposal gets rejected with a warning message explaining which criterion
+was not satisfied. 
+
+
+The default values for the control arguments are respectively
+`scaling_step = 1e-3` ($10^{-3}$), `rel_tol = 1e-10` ($10^{-10}$), `f_tol = 1e-6` ($10^{-6}$), and `max_steps = 200` ($200$).
 
 ### Tolerance {-}
 
@@ -172,12 +174,12 @@ Smaller relative tolerances produce more accurate solutions but require more com
 #### Sensitivity analysis {-}
 
 The tolerances should be set low enough that setting them lower does not change the
-statistical properties of posterior samples generated by the Stan program.
+statistical properties of posterior samples generated by the Stan program. The sensitivity can be analysed using importance sampling, without the need to re-run MCMC with different tolerances, as shown by @Timonen+etal:2023:ODE-PSIS.
 
 ### Maximum number of steps {-}
 
 The maximum number of steps can be used to stop a runaway simulation. This can arise in
 MCMC when a bad jump is taken, particularly during warmup. If the limit is hit, the
-current metropolis proposal gets rejected. Users will see a  warning message stating the
+current Metropolis proposal gets rejected. Users will see a  warning message stating the
 maximum number of steps has been exceeded.
 
diff --git a/src/stan-users-guide/decision-analysis.qmd b/src/stan-users-guide/decision-analysis.qmd
@@ -186,8 +186,8 @@ model {
 generated quantities {
   array[4] real util;
   for (k in 1:4) {
-    util[k] = U(lognormal_rng(mu[k], sigma[k]),
-                lognormal_rng(nu[k], tau[k]));
+    util[k] = U(lognormal_rng(nu[k], tau[k]),
+                lognormal_rng(mu[k], sigma[k]));
   }
 }
 ```
diff --git a/src/stan-users-guide/gaussian-processes.qmd b/src/stan-users-guide/gaussian-processes.qmd
@@ -555,7 +555,7 @@ covariance matrix called `L_cov_exp_quad_ARD`.
 
 ```stan
 functions {
-  matrix L_cov_exp_quad_ARD(vector[] x,
+  matrix L_cov_exp_quad_ARD(array[] vector x,
                             real alpha,
                             vector rho,
                             real delta) {
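
A worked illustration of the effective sample size wording changed in `analysis.qmd` above, offered as a sketch and not as part of the commit. It assumes the usual definition $N_{\mathrm{eff}} = N / (1 + 2\sum_{t \geq 1} \rho_t)$ and, purely for illustration, a hypothetical chain with geometric autocorrelation $\rho_t = \phi^t$ with $|\phi| < 1$. Neither the geometric form nor the values of $\phi$ below come from the diff.

$$
\sum_{t=1}^{\infty} \phi^t = \frac{\phi}{1 - \phi}
\quad\Longrightarrow\quad
N_{\mathrm{eff}}
= \frac{N}{1 + \frac{2\phi}{1 - \phi}}
= N \, \frac{1 - \phi}{1 + \phi}.
$$

Under these assumptions, $\phi = 0.5$ gives $N_{\mathrm{eff}} = N/3$, while $\phi = -0.5$ gives $N_{\mathrm{eff}} = 3N$, which is the anticorrelated case where the effective sample size exceeds the number of iterations. Keeping every second draw of the $\phi = -0.5$ chain leaves $N/2$ draws with autocorrelation $\rho_{2t} = 0.25^t$, so

$$
N_{\mathrm{eff}}^{\mathrm{thinned}}
= \frac{N}{2} \cdot \frac{1 - 0.25}{1 + 0.25}
= 0.3\,N,
$$

which is well below the unthinned $3N$ and consistent with the statement that thinning anticorrelated draws increases correlation and reduces the overall effective sample size.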
