# Document Author
# Koshi Shimada <shimada.koshi.re@gmail.com>
r"""
The stochastic data generative model is as follows:

* :math:`\mathcal{X}=\{1,2,\ldots,K\}` : a space of a source symbol
* :math:`x^n = x_1 x_2 \cdots x_n \in \mathcal{X}^n~(n\in\mathbb{N})` : a source sequence
* :math:`D_\mathrm{max} \in \mathbb{N}` : the maximum depth of context tree models
* :math:`T` : a context tree model, i.e., a :math:`K`-ary regular tree whose depth is smaller than or equal to :math:`D_\mathrm{max}`, where "regular" means that all inner nodes have :math:`K` child nodes
* :math:`\mathcal{T}` : a set of :math:`T`
* :math:`s` : a node of a context tree model
* :math:`\mathcal{I}(T)` : a set of inner nodes of :math:`T`
* :math:`\mathcal{L}(T)` : a set of leaf nodes of :math:`T`
* :math:`\mathcal{S}(T)` : a set of all nodes of :math:`T`, i.e., :math:`\mathcal{S}(T) = \mathcal{I}(T) \cup \mathcal{L}(T)`
* :math:`s_T(x^{n-1}) \in \mathcal{L}(T)` : a leaf node of :math:`T` corresponding to :math:`x^{n-1} = x_1 x_2 \cdots x_{n-1}`
* :math:`\boldsymbol{\theta}_s = (\theta_{1|s}, \theta_{2|s}, \ldots, \theta_{K|s})` : a parameter on a leaf node, where :math:`\theta_{k|s}` denotes the occurrence probability of :math:`k\in\mathcal{X}`

.. math::
    p(x_n | x^{n-1}, \boldsymbol{\theta}_T, T)=\theta_{x_n|s_T(x^{n-1})}.

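The generative model above can be illustrated with a minimal sketch (not the package's `GenModel` API): with :math:`K=2` and a fixed tree :math:`T` whose leaves are the depth-1 contexts, the next symbol is drawn from the parameter vector on the leaf reached by the previous symbol. All names and the choice of tree here are illustrative only.

```python
import random

# theta[s] is (theta_{1|s}, ..., theta_{K|s}) on leaf s; here s_T(x^{n-1})
# is determined by the previous symbol alone (an assumed depth-1 tree).
theta = {
    (1,): [0.9, 0.1],  # after symbol 1, emit 1 with probability 0.9
    (2,): [0.2, 0.8],  # after symbol 2, emit 2 with probability 0.8
}

def generate(n, seed=0, x0=1):
    """Draw x^n with p(x_n | x^{n-1}, theta, T) = theta_{x_n | s_T(x^{n-1})}."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n - 1):
        leaf = (x[-1],)  # the leaf of T reached by the context
        x.append(rng.choices([1, 2], weights=theta[leaf])[0])
    return x

sample = generate(1000)
```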
The prior distribution is as follows:

* :math:`g_{0,s} \in [0,1]` : a hyperparameter assigned to :math:`s \in \mathcal{S}(T)`
* :math:`\beta_0(k|s) \in\mathbb{R}_{>0}` : a hyperparameter of the Dirichlet distribution
* :math:`\boldsymbol{\beta}_0(s) = (\beta_0(1|s), \beta_0(2|s), \ldots, \beta_0(K|s)) \in\mathbb{R}^{K}_{>0}`
* :math:`C(\boldsymbol{\beta}_0(s)) = \frac{\Gamma\left(\sum_{k=1}^{K} \beta_0(k|s)\right)}{\prod_{k=1}^{K} \Gamma\left(\beta_0(k|s)\right)}`

For :math:`\boldsymbol{\theta}_s` on :math:`s\in\mathcal{L}(T)`, the Dirichlet distribution is assumed as the prior distribution as follows:

.. math::
    p(\boldsymbol{\theta}_s|T) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_0(s)) = C(\boldsymbol{\beta}_0(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_0(k|s)-1}.

For :math:`T \in \mathcal{T}`,

.. math::
    p(T)=\prod_{s \in \mathcal{I}(T)} g_{0,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{0,s'}),

where :math:`g_{0,s}=0` if the depth of :math:`s` is :math:`D_\mathrm{max}`.

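The prior :math:`p(T)` can be sampled by a simple recursion: starting from the root, each node of depth less than :math:`D_\mathrm{max}` becomes an inner node with probability :math:`g_{0,s}` (a single constant in this sketch), and :math:`g_{0,s}=0` at depth :math:`D_\mathrm{max}` guarantees termination. This is an assumed helper, not part of the package.

```python
import random

K, D_MAX, G0 = 2, 3, 0.5  # assumed constants for the sketch

def sample_tree(depth=0, rng=None):
    """Draw a context tree from p(T); inner nodes are lists of K subtrees."""
    rng = rng or random.Random(0)
    g = G0 if depth < D_MAX else 0.0      # g_{0,s} = 0 at depth D_max
    if rng.random() < g:                  # s becomes an inner node
        return [sample_tree(depth + 1, rng) for _ in range(K)]
    return None                           # s becomes a leaf

def max_depth(t):
    return 0 if t is None else 1 + max(max_depth(c) for c in t)

t = sample_tree()
```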
The posterior distribution is as follows:

* :math:`g_{n,s} \in [0,1]` : the updated hyperparameter
* :math:`T_\mathrm{max}` : a superposed context tree, i.e., the :math:`K`-ary perfect tree whose depth is :math:`D_\mathrm{max}`
* :math:`s_\lambda` : the root node
* :math:`\beta_n(k|s) \in\mathbb{R}_{>0}` : a hyperparameter of the posterior Dirichlet distribution
* :math:`\boldsymbol{\beta}_n(s) = (\beta_n(1|s), \beta_n(2|s), \ldots, \beta_n(K|s)) \in\mathbb{R}^{K}_{>0}`
* :math:`I \{ \cdot \}` : the indicator function

For :math:`\boldsymbol{\theta}_s` on :math:`s \in \mathcal{L}(T_\mathrm{max})`,

.. math::
    p(\boldsymbol{\theta}_s|x^n) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_n(s)) = C(\boldsymbol{\beta}_n(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_n(k|s)-1},

where the updating rule of the hyperparameter is as follows:

.. math::
    \beta_n(k|s) = \beta_0(k|s) + \sum_{i=1}^n I \left\{ s \text{ is the ancestor of } s_{T_\mathrm{max}}(x^{i-1}) \text{ and } x_i=k \right\}.

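This update is a simple counting scheme: every node :math:`s` on the path from the root toward the deepest context of :math:`x^{i-1}` in :math:`T_\mathrm{max}` adds one to its count of the observed symbol :math:`x_i`. A minimal sketch, with contexts as tuples of the most recent symbols and illustrative names only:

```python
from collections import defaultdict

K, D_MAX = 2, 2
BETA0 = 0.5  # beta_0(k|s), taken uniform across k and s for the sketch

def posterior_counts(x):
    """Return {s: [beta_n(1|s), ..., beta_n(K|s)]} for all visited nodes s."""
    beta = defaultdict(lambda: [BETA0] * K)
    for i, xi in enumerate(x):
        context = x[max(0, i - D_MAX):i][::-1]   # most recent symbol first
        for d in range(len(context) + 1):        # root s_lambda, ..., depth d
            s = tuple(context[:d])
            beta[s][xi - 1] += 1                 # count symbol x_i at node s
    return dict(beta)

beta = posterior_counts([1, 2, 1, 1])
```

At the root the added counts always sum to :math:`n`, since every :math:`x_i` is counted there.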
For :math:`T \in \mathcal{T}`,

.. math::
    p(T|x^n)=\prod_{s \in \mathcal{I}(T)} g_{n,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{n,s'}),

where the updating rule of the hyperparameter is as follows:

.. math::
    g_{n,s} =
    \begin{cases}
        g_{0,s} & \text{if } n=0, \\
        \frac{ g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}}(x_n|x^{n-1}) }{ \tilde{q}_s(x_n|x^{n-1}) } & \text{otherwise},
    \end{cases}

where :math:`s_{\mathrm{child}}` is the child node of :math:`s` on the path from :math:`s_\lambda` to :math:`s_{T_\mathrm{max}}(x^n)` and

.. math::
    \tilde{q}_s(x_n|x^{n-1}) =
    \begin{cases}
        q_s(x_n|x^{n-1}) & \text{if } s\in\mathcal{L}(T_\mathrm{max}), \\
        (1-g_{n-1,s}) q_s(x_n|x^{n-1}) + g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}}(x_n|x^{n-1}) & \text{otherwise}.
    \end{cases}

Here,

.. math::
    q_s(x_n|x^{n-1}) = \frac{ \beta_{n-1}(x_n|s) }{ \sum_{k'=1}^{K} \beta_{n-1}(k'|s) }.

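The recursion for :math:`\tilde{q}_s` can be sketched directly: at a leaf of :math:`T_\mathrm{max}` it equals :math:`q_s`, and at an inner node it mixes :math:`q_s` with the child's value using the weight :math:`g_{n-1,s}`. Here `beta` and `g` are supplied as plain dicts keyed by context tuples; the names are illustrative, not the package's API.

```python
def q(beta_s, k):
    """q_s(k) = beta_{n-1}(k|s) / sum_{k'} beta_{n-1}(k'|s)."""
    return beta_s[k - 1] / sum(beta_s)

def tilde_q(s, k, context, beta, g, d_max):
    if len(s) == d_max:                   # s in L(T_max): leaf case
        return q(beta[s], k)
    child = s + (context[len(s)],)        # s_child on the path to s_Tmax(x^n)
    return (1 - g[s]) * q(beta[s], k) + g[s] * tilde_q(
        child, k, context, beta, g, d_max)

# K = 2, D_max = 1: root plus two depth-1 leaves.
beta = {(): [1.0, 1.0], (1,): [3.0, 1.0], (2,): [1.0, 3.0]}
g = {(): 0.5}
p = tilde_q((), 1, context=(1,), beta=beta, g=g, d_max=1)
# p = 0.5 * (1/2) + 0.5 * (3/4) = 0.625
```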
The predictive distribution is as follows:

* :math:`\boldsymbol{\theta}_\mathrm{p} = (\theta_{\mathrm{p},1}, \theta_{\mathrm{p},2}, \ldots, \theta_{\mathrm{p},K})` : a parameter of the predictive distribution, where :math:`\theta_{\mathrm{p},k}` denotes the occurrence probability of :math:`k\in\mathcal{X}`

.. math::
    p(x_n|x^{n-1}) = \theta_{\mathrm{p},x_n},

where the updating rule of the parameters of the predictive distribution is as follows:

.. math::
    \theta_{\mathrm{p},k} = \tilde{q}_{s_\lambda}(k|x^{n-1}).

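Unrolling the :math:`\tilde{q}` recursion along the single root-to-leaf path of the current context shows that :math:`\theta_{\mathrm{p},k}` is a mixture of the per-node estimates :math:`q_s(k)`, with weights determined by the :math:`g` values of the ancestors. A minimal iterative sketch under that reading, with illustrative names only:

```python
def predictive(context, beta, g, K):
    """Compute (theta_{p,1}, ..., theta_{p,K}) = tilde{q}_{s_lambda}(.)"""
    theta_p = [0.0] * K
    weight = 1.0                       # product of g over the ancestors so far
    path = [context[:d] for d in range(len(context) + 1)]  # root, ..., leaf
    for s in path:
        q_s = [beta[s][k] / sum(beta[s]) for k in range(K)]
        g_s = g.get(s, 0.0)            # g = 0 at the leaf of T_max
        for k in range(K):
            theta_p[k] += weight * (1 - g_s) * q_s[k]
        weight *= g_s
    return theta_p

beta = {(): [1.0, 1.0], (1,): [3.0, 1.0]}
theta_p = predictive((1,), beta, g={(): 0.5}, K=2)
```

Because each :math:`q_s(\cdot)` sums to one and the mixture weights sum to one, :math:`\boldsymbol{\theta}_\mathrm{p}` is automatically a probability vector.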
References

* Matsushima, T.; Hirasawa, S. Reducing the Space Complexity of a Bayes Coding Algorithm Using an Expanded Context Tree. In *2009 IEEE International Symposium on Information Theory*, 2009; pp. 719-723. https://doi.org/10.1109/ISIT.2009.5205677
* Nakahara, Y.; Saito, S.; Kamatsuka, A.; Matsushima, T. Probability Distribution on Full Rooted Trees. *Entropy* 2022, 24, 328. https://doi.org/10.3390/e24030328
"""
from ._contexttree import GenModel
from ._contexttree import LearnModel