# Document Author
# Yuta Nakahara <yuta.nakahara@aoni.waseda.jp>
# Koki Kazama <kokikazama@aoni.waseda.jp>
r"""
The linear autoregressive model with the normal-gamma prior distribution.

* :math:`\boldsymbol{\theta} \in \mathbb{R}^{d+1}`: a regression coefficient parameter
* :math:`\tau \in \mathbb{R}_{>0}`: a precision parameter of noise

.. math::
    p(x_n | \boldsymbol{x}'_{n-1}, \boldsymbol{\theta}, \tau) &= \mathcal{N}(x_n|\boldsymbol{\theta}^\top \boldsymbol{x}'_{n-1}, \tau^{-1}) \\
    &= \sqrt{\frac{\tau}{2 \pi}} \exp \left\{ -\frac{\tau}{2} (x_n - \boldsymbol{\theta}^\top \boldsymbol{x}'_{n-1})^2 \right\}.

.. math::
    &\mathbb{E}[ x_n | \boldsymbol{x}'_{n-1},\boldsymbol{\theta},\tau] = \boldsymbol{\theta}^{\top} \boldsymbol{x}'_{n-1}, \\
    &\mathbb{V}[ x_n | \boldsymbol{x}'_{n-1},\boldsymbol{\theta},\tau ] = \tau^{-1}.

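For illustration, the data generative model can be simulated with a few lines of NumPy. The sketch below is not this module's implementation; it assumes :math:`\boldsymbol{x}'_{n-1}` consists of a constant term followed by the :math:`d` most recent data points and that the initial values are zero.

.. code-block:: python

    # Illustration only: sample x_1, ..., x_N from
    # p(x_n | x'_{n-1}, theta, tau) = N(theta^T x'_{n-1}, tau^{-1}),
    # assuming x'_{n-1} = [1, x_{n-d}, ..., x_{n-1}] and zero initial values.
    import numpy as np

    def gen_data(theta, tau, sample_size, seed=0):
        rng = np.random.default_rng(seed)
        d = theta.shape[0] - 1
        x = np.zeros(d + sample_size)  # the first d entries are the (zero) initial values
        for n in range(d, d + sample_size):
            x_prime = np.concatenate(([1.0], x[n - d:n]))            # x'_{n-1}
            x[n] = rng.normal(theta @ x_prime, 1.0 / np.sqrt(tau))   # std = tau^{-1/2}
        return x[d:]

    x = gen_data(theta=np.array([0.0, 0.2, 0.5]), tau=4.0, sample_size=100)
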
The prior distribution is as follows:

* :math:`\boldsymbol{\mu}_0 \in \mathbb{R}^{d+1}`: a hyperparameter for :math:`\boldsymbol{\theta}`
* :math:`\boldsymbol{\Lambda}_0 \in \mathbb{R}^{(d+1) \times (d+1)}`: a hyperparameter for :math:`\boldsymbol{\theta}` (a positive definite matrix)
* :math:`| \boldsymbol{\Lambda}_0 | \in \mathbb{R}`: the determinant of :math:`\boldsymbol{\Lambda}_0`
* :math:`\alpha_0 \in \mathbb{R}_{>0}`: a hyperparameter for :math:`\tau`
* :math:`\beta_0 \in \mathbb{R}_{>0}`: a hyperparameter for :math:`\tau`
* :math:`\Gamma(\cdot): \mathbb{R}_{>0} \to \mathbb{R}`: the Gamma function

.. math::
    p(\boldsymbol{\theta}, \tau) &= \mathcal{N}(\boldsymbol{\theta}|\boldsymbol{\mu}_0, (\tau \boldsymbol{\Lambda}_0)^{-1}) \mathrm{Gam}(\tau|\alpha_0,\beta_0)\\
    &= \frac{|\tau \boldsymbol{\Lambda}_0|^{1/2}}{(2 \pi)^{(d+1)/2}}
    \exp \left\{ -\frac{\tau}{2} (\boldsymbol{\theta} - \boldsymbol{\mu}_0)^\top
    \boldsymbol{\Lambda}_0 (\boldsymbol{\theta} - \boldsymbol{\mu}_0) \right\}
    \frac{\beta_0^{\alpha_0}}{\Gamma (\alpha_0)} \tau^{\alpha_0 - 1} \exp \{ -\beta_0 \tau \} .

.. math::
    \mathbb{E}[\boldsymbol{\theta}] &= \boldsymbol{\mu}_0 & \left( \alpha_0 > \frac{1}{2} \right), \\
    \mathrm{Cov}[\boldsymbol{\theta}] &= \frac{\beta_0}{\alpha_0 - 1} \boldsymbol{\Lambda}_0^{-1} & (\alpha_0 > 1), \\
    \mathbb{E}[\tau] &= \frac{\alpha_0}{\beta_0}, \\
    \mathbb{V}[\tau] &= \frac{\alpha_0}{\beta_0^2}.

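The prior can be simulated directly. The sketch below (illustration only, with hypothetical hyperparameter values) draws :math:`\tau` from the gamma distribution and then :math:`\boldsymbol{\theta}` from the conditional multivariate normal distribution.

.. code-block:: python

    # Illustration only: draw (theta, tau) from the normal-gamma prior.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 2
    mu_0 = np.zeros(d + 1)
    lambda_0 = np.eye(d + 1)      # positive definite hyperparameter
    alpha_0, beta_0 = 2.0, 1.0

    # Gam(tau | alpha_0, beta_0); note E[tau] = alpha_0 / beta_0.
    tau = rng.gamma(shape=alpha_0, scale=1.0 / beta_0)
    # N(theta | mu_0, (tau * Lambda_0)^{-1})
    theta = rng.multivariate_normal(mu_0, np.linalg.inv(tau * lambda_0))
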
The posterior distribution is as follows:

* :math:`x^n := [x_1, x_2, \dots , x_n]^\top \in \mathbb{R}^n`: given data
* :math:`\boldsymbol{X}_n = [\boldsymbol{x}'_1, \boldsymbol{x}'_2, \dots , \boldsymbol{x}'_n]^\top \in \mathbb{R}^{n \times (d+1)}`
* :math:`\boldsymbol{\mu}_n \in \mathbb{R}^{d+1}`: a hyperparameter for :math:`\boldsymbol{\theta}`
* :math:`\boldsymbol{\Lambda}_n \in \mathbb{R}^{(d+1) \times (d+1)}`: a hyperparameter for :math:`\boldsymbol{\theta}` (a positive definite matrix)
* :math:`\alpha_n \in \mathbb{R}_{>0}`: a hyperparameter for :math:`\tau`
* :math:`\beta_n \in \mathbb{R}_{>0}`: a hyperparameter for :math:`\tau`

.. math::
    p(\boldsymbol{\theta}, \tau | x^n) &= \mathcal{N}(\boldsymbol{\theta}|\boldsymbol{\mu}_n, (\tau \boldsymbol{\Lambda}_n)^{-1}) \mathrm{Gam}(\tau|\alpha_n,\beta_n)\\
    &= \frac{|\tau \boldsymbol{\Lambda}_n|^{1/2}}{(2 \pi)^{(d+1)/2}}
    \exp \left\{ -\frac{\tau}{2} (\boldsymbol{\theta} - \boldsymbol{\mu}_n)^\top
    \boldsymbol{\Lambda}_n (\boldsymbol{\theta} - \boldsymbol{\mu}_n) \right\}
    \frac{\beta_n^{\alpha_n}}{\Gamma (\alpha_n)} \tau^{\alpha_n - 1} \exp \{ -\beta_n \tau \} .

.. math::
    \mathbb{E}[\boldsymbol{\theta} | x^n] &= \boldsymbol{\mu}_n & \left( \alpha_n > \frac{1}{2} \right), \\
    \mathrm{Cov}[\boldsymbol{\theta} | x^n] &= \frac{\beta_n}{\alpha_n - 1} \boldsymbol{\Lambda}_n^{-1} & (\alpha_n > 1), \\
    \mathbb{E}[\tau | x^n] &= \frac{\alpha_n}{\beta_n}, \\
    \mathbb{V}[\tau | x^n] &= \frac{\alpha_n}{\beta_n^2},

where the updating rules of the hyperparameters are

.. math::
    \boldsymbol{\Lambda}_n &= \boldsymbol{\Lambda}_0 + \boldsymbol{X}_n^\top \boldsymbol{X}_n,\\
    \boldsymbol{\mu}_n &= \boldsymbol{\Lambda}_n^{-1} (\boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0 + \boldsymbol{X}_n^\top x^n),\\
    \alpha_n &= \alpha_0 + \frac{n}{2},\\
    \beta_n &= \beta_0 + \frac{1}{2} \left( -\boldsymbol{\mu}_n^\top \boldsymbol{\Lambda}_n \boldsymbol{\mu}_n
    + (x^n)^\top x^n + \boldsymbol{\mu}_0^\top \boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0 \right).

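These updating rules translate directly into NumPy. The following sketch is for illustration (the helper name ``update_posterior`` is hypothetical, not this module's API) and assumes the rows of :math:`\boldsymbol{X}_n` are the explanatory vectors paired with :math:`x_1, \dots, x_n`.

.. code-block:: python

    # Illustration only: hyperparameter update of the normal-gamma posterior.
    import numpy as np

    def update_posterior(mu_0, lambda_0, alpha_0, beta_0, X_n, x):
        # X_n: (n, d+1) matrix of explanatory vectors, x: length-n data vector x^n
        n = x.shape[0]
        lambda_n = lambda_0 + X_n.T @ X_n
        mu_n = np.linalg.solve(lambda_n, lambda_0 @ mu_0 + X_n.T @ x)
        alpha_n = alpha_0 + n / 2
        beta_n = beta_0 + 0.5 * (x @ x + mu_0 @ lambda_0 @ mu_0 - mu_n @ lambda_n @ mu_n)
        return mu_n, lambda_n, alpha_n, beta_n

Here ``np.linalg.solve`` is used instead of forming :math:`\boldsymbol{\Lambda}_n^{-1}` explicitly, which is the usual numerically stable choice.
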
The predictive distribution is as follows:

* :math:`\nu_\mathrm{p} \in \mathbb{R}_{>0}`: a parameter

.. math::
    \mathrm{St}(x_{n+1}|m_\mathrm{p}, \lambda_\mathrm{p}, \nu_\mathrm{p})
    = \frac{\Gamma (\nu_\mathrm{p}/2 + 1/2)}{\Gamma (\nu_\mathrm{p}/2)}
    \left( \frac{\lambda_\mathrm{p}}{\pi \nu_\mathrm{p}} \right)^{1/2}
    \left[ 1 + \frac{\lambda_\mathrm{p}(x_{n+1}-m_\mathrm{p})^2}{\nu_\mathrm{p}} \right]^{-\nu_\mathrm{p}/2 - 1/2}.

.. math::
    \mathbb{E}[x_{n+1} | x^n] &= m_\mathrm{p} & (\nu_\mathrm{p} > 1), \\
    \mathbb{V}[x_{n+1} | x^n] &= \frac{1}{\lambda_\mathrm{p}} \frac{\nu_\mathrm{p}}{\nu_\mathrm{p}-2} & (\nu_\mathrm{p} > 2),

where the parameters are obtained from the hyperparameters of the posterior distribution as follows.

.. math::
    m_\mathrm{p} &= \boldsymbol{\mu}_n^\top \boldsymbol{x}'_n,\\
    \lambda_\mathrm{p} &= \frac{\alpha_n}{\beta_n} (1 + (\boldsymbol{x}'_n)^\top \boldsymbol{\Lambda}_n^{-1} \boldsymbol{x}'_n)^{-1},\\
    \nu_\mathrm{p} &= 2 \alpha_n.

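A sketch of how these predictive parameters can be computed and the predictive density evaluated with SciPy is given below (illustration only; the helper and the example values are hypothetical). SciPy's ``t`` distribution is parameterized by a scale, so the precision-type parameter :math:`\lambda_\mathrm{p}` enters as ``scale = 1 / sqrt(lambda_p)``.

.. code-block:: python

    # Illustration only: predictive parameters and the Student's t density.
    import numpy as np
    from scipy.stats import t as student_t

    def predictive_params(mu_n, lambda_n, alpha_n, beta_n, x_prime_n):
        m_p = mu_n @ x_prime_n
        lambda_p = (alpha_n / beta_n) / (1.0 + x_prime_n @ np.linalg.solve(lambda_n, x_prime_n))
        nu_p = 2.0 * alpha_n
        return m_p, lambda_p, nu_p

    # Hypothetical posterior hyperparameters and explanatory vector x'_n
    mu_n = np.zeros(3)
    lambda_n = np.eye(3)
    alpha_n, beta_n = 3.0, 1.5
    x_prime_n = np.array([1.0, 0.3, -0.2])

    m_p, lambda_p, nu_p = predictive_params(mu_n, lambda_n, alpha_n, beta_n, x_prime_n)
    pdf_at_zero = student_t.pdf(0.0, df=nu_p, loc=m_p, scale=1.0 / np.sqrt(lambda_p))
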
"""

from ._autoregressive import GenModel