Commit a443f4f

sampling statement -> distribution statement in stan-users-guide
1 parent 0e17907 commit a443f4f

13 files changed, +51 -50 lines changed

src/stan-users-guide/custom-probability.qmd

Lines changed: 4 additions & 4 deletions
@@ -93,23 +93,23 @@ be parameters, data, or one of each, or even local variables.
 
 The assignment statement in the previous paragraph generates
 C++ code that is similar to that generated by the following
-sampling statement.
+distribution statement.
 
 ```stan
 y ~ exponential(lambda);
 ```
 
-There are two notable differences. First, the sampling statement will
+There are two notable differences. First, the distribution statement will
 check the inputs to make sure both `lambda` is positive and
 `y` is non-negative (which includes checking that neither is the
 special not-a-number value).
 
 The second difference is that if `lambda` is not a parameter,
-transformed parameter, or local model variable, the sampling statement
+transformed parameter, or local model variable, the distribution statement
 is clever enough to drop the `log(lambda)` term. This results in
 the same posterior because Stan only needs the log probability up to
 an additive constant. If `lambda` and `y` are both
-constants, the sampling statement will drop both terms (but still
+constants, the distribution statement will drop both terms (but still
 check for out-of-domain errors on the inputs).
 
 ### Bivariate normal cumulative distribution function {-}
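
To make the contrast in this hunk concrete, here is a small sketch of the two equivalent ways of incrementing the log density (an illustration with assumed declarations, not the guide's own listing); the explicit `target +=` form always carries the `log(lambda)` term, while the distribution statement may drop terms that are constant with respect to the parameters.

```stan
data {
  real<lower=0> y;
}
parameters {
  real<lower=0> lambda;
}
model {
  // explicit increment: the exponential log density written out,
  // including the log(lambda) normalizing term
  target += log(lambda) - lambda * y;

  // equivalent distribution statement (use instead of the line above):
  //   y ~ exponential(lambda);
  // here lambda is a parameter, so log(lambda) is kept; if lambda were
  // data it could be dropped, and if y and lambda were both data, only
  // the domain checks (lambda > 0, y >= 0) would remain.
}
```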

src/stan-users-guide/finite-mixtures.qmd

Lines changed: 2 additions & 2 deletions
@@ -476,7 +476,7 @@ y_n \sim
 \end{cases}
 $$
 
-Stan does not support conditional sampling statements (with `~`) conditional on some parameter, and we need to consider the corresponding likelihood
+Stan does not support conditional distribution statements (with `~`) conditional on some parameter, and we need to consider the corresponding likelihood
 $$
 p(y_n \mid \theta,\lambda)
 =
@@ -485,7 +485,7 @@ p(y_n \mid \theta,\lambda)
 (1-\theta) \times \textsf{Poisson}(y_n \mid \lambda) &\quad\text{if } y_n > 0.
 \end{cases}
 $$
-The log likelihood can be implemented directly in Stan (with `target +=`) as follows.
+The log likelihood can be coded directly in Stan (with `target +=`) as follows.
 
 
 ```stan
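
The listing itself lies outside the hunk's context. Purely as an illustration of the `target +=` pattern the paragraph describes, a zero-inflated Poisson likelihood can be written along the following lines (a sketch with assumed declarations, not necessarily the guide's exact code).

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}
parameters {
  real<lower=0, upper=1> theta;   // probability of the always-zero component
  real<lower=0> lambda;           // Poisson rate
}
model {
  for (n in 1:N) {
    if (y[n] == 0) {
      // a zero can come from either mixture component
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                              + poisson_lpmf(y[n] | lambda));
    } else {
      // a positive count can only come from the Poisson component
      target += bernoulli_lpmf(0 | theta)
                + poisson_lpmf(y[n] | lambda);
    }
  }
}
```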

src/stan-users-guide/for-bugs-users.qmd

Lines changed: 2 additions & 2 deletions
@@ -342,15 +342,15 @@ you have a parameter `p` declared as
 ```stan
 real<lower=0, upper=1> p;
 ```
-and then have no sampling statement for `p` in the `model`
+and then have no distribution statement for `p` in the `model`
 block, then you are implicitly assigning a uniform $[0,1]$ prior on
 `p`.
 
 On the other hand, if you have a parameter `theta` declared with
 ```stan
 real theta;
 ```
-and have no sampling statement for `theta` in the `model` block, then
+and have no distribution statement for `theta` in the `model` block, then
 you are implicitly assigning an improper uniform prior on
 $(-\infty,\infty)$ to `theta`.
 
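A minimal sketch of the implicit-prior point made in this hunk (an illustration, not taken from the guide):

```stan
parameters {
  real<lower=0, upper=1> p;   // bounded: implied uniform(0, 1) prior
  real theta;                 // unbounded: implied improper flat prior
}
model {
  // writing nothing for p here is equivalent to writing
  //   p ~ uniform(0, 1);
  // there is no proper equivalent for theta: its implied flat prior
  // on (-infinity, infinity) is improper
}
```
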
src/stan-users-guide/gaussian-processes.qmd

Lines changed: 3 additions & 3 deletions
@@ -802,11 +802,11 @@ vector `f`, which consists of the concatenation of the conditional mean
 for known outputs `y1` and unknown outputs `y2`. Thus the
 combined output vector `f` is aligned with the combined
 input vector `x`. All that is left is to define the univariate
-normal sampling statement for `y`.
+normal distribution statement for `y`.
 
 The generated quantities block defines the quantity `y2`. We generate
-`y2` by sampling `N2` univariate normals with each mean corresponding
-to the appropriate element in `f`.
+`y2` by randomly generating `N2` values from univariate normals with
+each mean corresponding to the appropriate element in `f`.
 
 
 #### Predictive inference in non-Gaussian GPs {-}
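
A sketch of the generated quantities pattern just described; `f`, `y2`, and `N2` follow the excerpt, while `N1` (the number of known outputs) and the noise scale `sigma` are assumed names.

```stan
generated quantities {
  vector[N2] y2;
  for (n in 1:N2) {
    // draw each unknown output from a univariate normal whose mean is
    // the matching element of the combined latent vector f
    y2[n] = normal_rng(f[N1 + n], sigma);
  }
}
```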

src/stan-users-guide/measurement-error.qmd

Lines changed: 3 additions & 3 deletions
@@ -319,7 +319,7 @@ model {
 }
 ```
 
-The sampling statement for `y` is vectorized; it has the same
+The distribution statement for `y` is vectorized; it has the same
 effect as the following.
 ```stan
 for (j in 1:J) {
@@ -354,8 +354,8 @@ model {
 }
 ```
 
-Although the vectorized sampling statement for `y` appears
-unchanged, the parameter `theta` is now a vector. The sampling
+Although the vectorized distribution statement for `y` appears
+unchanged, the parameter `theta` is now a vector. The distribution
 statement for `theta` is also vectorized, with the
 hyperparameters `mu` and `tau` themselves being given wide
 priors compared to the scale of the data.
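
A compact sketch of the vectorized distribution statements these two hunks discuss; the hierarchical structure with `theta`, `mu`, and `tau` follows the excerpt, while the specific priors on `mu` and `tau` are placeholders (declarations omitted).

```stan
model {
  // one vectorized distribution statement over j = 1:J, equivalent to
  //   for (j in 1:J) y[j] ~ normal(theta[j], sigma[j]);
  y ~ normal(theta, sigma);

  // theta is now a vector, so its distribution statement is also
  // vectorized; mu and tau get wide (placeholder) priors
  theta ~ normal(mu, tau);
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 5);
}
```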

src/stan-users-guide/missing-data.qmd

Lines changed: 2 additions & 2 deletions
@@ -255,7 +255,7 @@ for (n in 1:N) {
 ```
 
 It's a bit more work, but much more efficient to vectorize these
-sampling statements. In transformed data, build up three vectors of
+distribution statements. In transformed data, build up three vectors of
 indices, for the three cases above:
 
 ```stan
@@ -267,7 +267,7 @@ transformed data {
 ```
 
 You will need to write functions that pull out the count of
-observations in each of the three sampling situations. This must be
+observations in each of the three situations. This must be
 done with functions because the result needs to go in top-level block
 variable size declaration. Then the rest of transformed data just
 fills in the values using three counters.
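
To illustrate why the counts must come from functions, here is a sketch of the pattern (all names are hypothetical; the real conditions depend on the three missingness cases): a user-defined function may appear in the size of a top-level transformed data declaration, and a counter then fills in the indices.

```stan
functions {
  // count how many flags are set; a stand-in for the count of
  // observations falling in one of the three cases
  int num_flagged(array[] int flag) {
    int k = 0;
    for (n in 1:size(flag)) {
      if (flag[n] == 1) {
        k += 1;
      }
    }
    return k;
  }
}
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> x_obs;   // hypothetical missingness flag
}
transformed data {
  // the function call sizes the index vector at declaration time
  array[num_flagged(x_obs)] int ii_x_obs;
  int pos = 1;
  for (n in 1:N) {
    if (x_obs[n] == 1) {
      ii_x_obs[pos] = n;
      pos += 1;
    }
  }
}
```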

src/stan-users-guide/multi-indexing.qmd

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ in speed to the clunky assignment to a local variable.
 ```
 
 The boost in speed compared to the original version is because the
-single call to the normal log density in the sampling statement will
+single call to the normal log density in the distribution statement will
 be much more memory efficient than the original version.
 
 
src/stan-users-guide/regression.qmd

Lines changed: 10 additions & 10 deletions
@@ -55,7 +55,7 @@ improper priors for the two regression coefficients.
 
 ### Matrix notation and vectorization {- #vectorization.section}
 
-The sampling statement in the previous model is vectorized, with
+The distribution statement in the previous model is vectorized, with
 
 ```stan
 y ~ normal(alpha + beta * x, sigma);
@@ -114,7 +114,7 @@ the entire model may be written using matrix arithmetic as shown. It
 would be possible to include a column of ones in the data matrix `x` to
 remove the `alpha` parameter.
 
-The sampling statement in the model above is just a more efficient,
+The distribution statement in the model above is just a more efficient,
 vector-based approach to coding the model with a loop, as in the
 following statistically equivalent model.
 
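
The statistically equivalent looped model sits outside the hunk's context; as a sketch of the equivalence being described (with `N` and the declarations assumed):

```stan
model {
  // statistically equivalent loop form of the vectorized statement
  //   y ~ normal(alpha + beta * x, sigma);
  for (n in 1:N) {
    y[n] ~ normal(alpha + beta * x[n], sigma);
  }
}
```
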
@@ -259,7 +259,7 @@ term $\epsilon$ as having a normal distribution. From Stan's
 perspective, there is nothing special about normally distributed
 noise. For instance, robust regression can be accommodated by giving
 the noise term a Student-$t$ distribution. To code this in Stan, the
-sampling distribution is changed to the following.
+distribution statement is changed to the following.
 
 
 ```stan
@@ -357,7 +357,7 @@ $$
 
 The cumulative standard normal distribution function $\Phi$ is implemented
 in Stan as the function `Phi`. The probit regression model
-may be coded in Stan by replacing the logistic model's sampling
+may be coded in Stan by replacing the logistic model's distribution
 statement with the following.
 
 
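The replacement statement itself falls outside the hunk. As a sketch of a probit distribution statement of the kind described, reusing the chapter's running regression names `y`, `x`, `alpha`, and `beta` as assumptions:

```stan
model {
  // probit link: Phi is the standard normal cumulative distribution function
  y ~ bernoulli(Phi(alpha + beta * x));
}
```
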
@@ -806,7 +806,7 @@ recommendations on priors for regression coefficients and scales.
 
 #### Optimizing the model {-}
 
-Where possible, vectorizing sampling statements leads to faster log
+Where possible, vectorizing distribution statements leads to faster log
 probability and derivative evaluations. The speed boost is not
 because loops are eliminated, but because vectorization allows sharing
 subcomputations in the log probability and gradient calculations and
@@ -847,7 +847,7 @@ Stan because they are translated directly to C++. In most cases, the
 cost of allocating and assigning to a container is more than made up
 for by the increased efficiency due to vectorizing the log probability
 and gradient calculations. Thus the following version is faster than
-the original formulation as a loop over a sampling statement.
+the original formulation as a loop over a distribution statement.
 
 
 ```stan
@@ -1062,7 +1062,7 @@ $$
 \textsf{normal}\left(y \mid 0, 1\right).
 $$
 
-The sampling statement is also vectorized using elementwise
+The distribution statement is also vectorized using elementwise
 multiplication; it is equivalent to
 
 ```stan
@@ -1409,13 +1409,13 @@ with the vectorized form:
 
 The outer brackets create a local scope in which to define the
 variable `x_beta_jj`, which is then filled in a loop and used
-to define a vectorized sampling statement. The reason this is such a
+to define a vectorized distribution statement. The reason this is such a
 big win is that it allows us to take the log of sigma only once and it
 greatly reduces the size of the resulting expression graph by packing
 all of the work into a single density function.
 
 Although it is tempting to redeclare `beta` and include a revised
-model block sampling statement,
+model block distribution statement,
 
 ```stan
 parameters {
@@ -1428,7 +1428,7 @@ model {
 }
 ```
 
-this fails because it breaks the vectorization of sampling for
+this fails because it breaks the vectorization for
 `beta`,^[Thanks to Mike Lawrence for pointing this out in the GitHub issue for the manual.]
 
 ```stan
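
A sketch of the local-scope pattern described in the 1409 hunk; `x_beta_jj` follows the excerpt, while the sizes, the group index `jj`, and the remaining declarations are assumed.

```stan
model {
  {
    // local scope: x_beta_jj is a temporary, never saved or given a prior
    vector[N] x_beta_jj;
    for (n in 1:N) {
      x_beta_jj[n] = x[n] * beta[jj[n]];
    }
    // one vectorized distribution statement over all observations
    y ~ normal(x_beta_jj, sigma);
  }
}
```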

src/stan-users-guide/reparameterization.qmd

Lines changed: 3 additions & 3 deletions
@@ -396,7 +396,7 @@ transforms a parameter, then samples it. Only the latter requires a
 Jacobian adjustment.
 
 It does not matter whether the probability function is
-expressed using a sampling statement, such as
+expressed using a distribution statement, such as
 
 ```stan
 log(y) ~ normal(mu, sigma);
@@ -415,7 +415,7 @@ of variables whose inverse has a gamma distribution. This section
 contrasts two approaches, first with a transform, then with a change
 of variables.
 
-The transform based approach to sampling `y_inv` with an inverse
+The transform based approach to defining `y_inv` to have an inverse
 gamma distribution can be coded as follows.
 
 ```stan
@@ -431,7 +431,7 @@ model {
 }
 ```
 
-The change-of-variables approach to sampling `y_inv` with an
+The change-of-variables approach to defining `y_inv` to have an
 inverse gamma distribution can be coded as follows.
 
 ```stan
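
Tying back to the `log(y) ~ normal(mu, sigma)` example in the first hunk: when `y` is a parameter and the statement is meant as a change of variables, a Jacobian adjustment is required. A minimal sketch, treating `mu` and `sigma` as parameters only to keep it self-contained; the adjustment is the log absolute derivative of the transform, $-\log y$.

```stan
parameters {
  real<lower=0> y;
  real mu;
  real<lower=0> sigma;
}
model {
  // distribution statement on a transform of the parameter y ...
  log(y) ~ normal(mu, sigma);
  // ... so, as a change of variables, the log Jacobian of the transform,
  // log |d/dy log(y)| = -log(y), must be added
  target += -log(y);
}
```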

src/stan-users-guide/time-series.qmd

Lines changed: 11 additions & 10 deletions
@@ -256,7 +256,7 @@ scale squared. Finally, the whole regression is inside the
 variance parameters) for the normal distribution.
 
 With the regression in the transformed parameters block, the model
-reduces a single vectorized sampling statement. Because `r` and
+reduces to a single vectorized distribution statement. Because `r` and
 `sigma` are of length `T`, all of the data are modeled
 directly.
 
@@ -319,20 +319,21 @@ model {
 
 The error terms $\epsilon_t$ are defined as transformed parameters in
 terms of the observations and parameters. The definition of the
-sampling statement (defining the likelihood) follows the definition,
+distribution statement (which also defines the likelihood) follows the
+definition,
 which can only be applied to $y_n$ for $n > Q$. In this example, the
 parameters are all given Cauchy (half-Cauchy for $\sigma$) priors,
 although other priors can be used just as easily.
 
 This model could be improved in terms of speed by vectorizing the
-sampling statement in the model block. Vectorizing the calculation of
+distribution statement in the model block. Vectorizing the calculation of
 the $\epsilon_t$ could also be sped up by using a dot product instead
 of a loop.
 
 
 ### Vectorized MA(Q) model {-}
 
-A general $\mbox{MA}(Q)$ model with a vectorized sampling probability
+A general $\mbox{MA}(Q)$ model with a vectorized distribution statement
 may be defined as follows.
 
 ```stan
@@ -510,12 +511,12 @@ h_1 &\sim \textsf{normal}\left( \mu, \frac{\sigma}{\sqrt{1 - \phi^2}} \ri
 \end{align*}
 
 Rearranging the first line, $\epsilon_t = y_t \exp(-h_t / 2)$,
-allowing the sampling distribution for $y_t$ to be written as
+allowing the distribution for $y_t$ to be written as
 $$
 y_t \sim \textsf{normal}(0,\exp(h_t/2)).
 $$
 The recurrence equation for $h_{t+1}$ may be combined with the
-scaling and sampling of $\delta_t$ to yield the sampling distribution
+scaling of $\delta_t$ to yield the distribution
 $$
 h_t \sim \mathsf{normal}(\mu + \phi(h_{t-1} - \mu), \sigma).
 $$
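
The two distributions displayed above translate directly into distribution statements; a sketch, assuming `h` is declared as a vector of `T` log-volatility parameters and omitting priors on `mu`, `phi`, and `sigma`:

```stan
model {
  // vectorized statement for the returns, as shown later in this file's diff
  y ~ normal(0, exp(h / 2));

  // the latent log-volatility recurrence from the displayed equations
  h[1] ~ normal(mu, sigma / sqrt(1 - square(phi)));
  for (t in 2:T) {
    h[t] ~ normal(mu + phi * (h[t - 1] - mu), sigma);
  }
}
```
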
@@ -569,7 +570,7 @@ diagonal mass matrix, but will not scale to large values of $T$.
 
 It is relatively straightforward to speed up the effective samples per
 second generated by this model by one or more orders of magnitude.
-First, the sampling statements for return $y$ is easily vectorized to
+First, the distribution statement for return $y$ is easily vectorized to
 
 ```stan
 y ~ normal(0, exp(h / 2));
@@ -613,9 +614,9 @@ final loop adds in the moving average so that `h[2]` through
 `h[T]` are appropriately modeled relative to `phi` and
 `mu`.
 
-As a final improvement, the sampling statement for `h[1]` and
-loop for sampling `h[2]` to `h[T]` are replaced with a
-single vectorized standard normal sampling statement.
+As a final improvement, the distribution statements for `h[1]` to
+`h[T]` are replaced with a
+single vectorized standard normal distribution statement.
 
 ```stan
 model {

0 commit comments
