Commit cb49b4c
Merge pull request #46 from yuta-nakahara/develop-contexttree
Develop contexttree
2 parents 2c8d809 + 5e55d55 commit cb49b4c

File tree

5 files changed: +1264 -0 lines changed

bayesml/contexttree/__init__.py

Lines changed: 109 additions & 0 deletions

@@ -0,0 +1,109 @@
# Document Author
# Koshi Shimada <shimada.koshi.re@gmail.com>
r"""
The stochastic data generative model is as follows:

* :math:`\mathcal{X}=\{1,2,\ldots,K\}` : a space of a source symbol
* :math:`x^n = x_1 x_2 \cdots x_n \in \mathcal{X}^n~(n\in\mathbb{N})` : a source sequence
* :math:`D_\mathrm{max} \in \mathbb{N}` : the maximum depth of context tree models
* :math:`T` : a context tree model, a :math:`K`-ary regular tree whose depth is smaller than or equal to :math:`D_\mathrm{max}`, where "regular" means that all inner nodes have :math:`K` child nodes
* :math:`\mathcal{T}` : a set of :math:`T`
* :math:`s` : a node of a context tree model
* :math:`\mathcal{I}(T)` : a set of inner nodes of :math:`T`
* :math:`\mathcal{L}(T)` : a set of leaf nodes of :math:`T`
* :math:`\mathcal{S}(T)` : a set of all nodes of :math:`T`, i.e., :math:`\mathcal{S}(T) = \mathcal{I}(T) \cup \mathcal{L}(T)`
* :math:`s_T(x^{n-1}) \in \mathcal{L}(T)` : a leaf node of :math:`T` corresponding to :math:`x^{n-1} = x_1 x_2 \cdots x_{n-1}`
* :math:`\boldsymbol{\theta}_s = (\theta_{1|s}, \theta_{2|s}, \ldots, \theta_{K|s})` : a parameter on a leaf node, where :math:`\theta_{k|s}` denotes the occurrence probability of :math:`k\in\mathcal{X}`

.. math::
    p(x_n | x^{n-1}, \boldsymbol{\theta}_T, T)=\theta_{x_n|s_T(x^{n-1})}.

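As a plain-Python illustration of this generative model (a sketch, not part of the module; the alphabet, tree, and probabilities below are made-up examples, with symbols 0-indexed for convenience), a context tree can be represented as a map from leaf contexts to parameter vectors, and each symbol is drawn from the leaf reached by the most recent symbols:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 2       # alphabet size; symbols are 0 and 1 here
D_MAX = 2   # maximum depth

# Leaf contexts (most recent symbol first) mapped to theta_s.
# The root's child 0 is a leaf; child 1 splits once more, so the tree
# is K-ary regular with depth <= D_MAX.
LEAVES = {
    (0,): np.array([0.9, 0.1]),
    (1, 0): np.array([0.2, 0.8]),
    (1, 1): np.array([0.5, 0.5]),
}

def leaf_of(past):
    """Find s_T(x^{n-1}) by walking down from the root along recent symbols."""
    ctx = ()
    for sym in reversed(past):
        ctx = ctx + (sym,)
        if ctx in LEAVES:
            return ctx
    raise ValueError("history too short to reach a leaf")

def generate(n, warmup=(0, 0)):
    """Draw x_1 ... x_n, seeding the history so every step reaches a leaf."""
    x = list(warmup)
    for _ in range(n):
        theta = LEAVES[leaf_of(x)]
        x.append(int(rng.choice(K, p=theta)))
    return x[len(warmup):]

sample = generate(20)
```
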
The prior distribution is as follows:

* :math:`g_{0,s} \in [0,1]` : a hyperparameter assigned to :math:`s \in \mathcal{S}(T)`
* :math:`\beta_0(k|s) \in\mathbb{R}_{>0}` : a hyperparameter of the Dirichlet distribution
* :math:`\boldsymbol{\beta}_0(s) = (\beta_0(1|s), \beta_0(2|s), \ldots, \beta_0(K|s)) \in\mathbb{R}^{K}_{>0}`
* :math:`C(\boldsymbol{\beta}_0(s)) = \frac{\Gamma\left(\sum_{k=1}^{K} \beta_0(k|s)\right)}{\prod_{k=1}^{K} \Gamma\left(\beta_0(k|s)\right)}`

For :math:`\boldsymbol{\theta}_s` on :math:`s\in\mathcal{L}(T)`, the Dirichlet distribution is assumed as the prior distribution as follows:

.. math::
    p(\boldsymbol{\theta}_s|T) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_0(s)) = C(\boldsymbol{\beta}_0(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_0(k|s)-1}.

For :math:`T \in \mathcal{T}`,

.. math::
    p(T)=\prod_{s \in \mathcal{I}(T)} g_{0,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{0,s'}),

where :math:`g_{0,s}=0` if the depth of :math:`s` is :math:`D_\mathrm{max}`.

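As a sanity check on this tree prior (an illustrative sketch, not part of the module; the uniform choice :math:`g_{0,s}=1/2` and the small depth are assumptions for the example), one can enumerate every binary context tree up to depth 2 and verify that the prior probabilities sum to one:

```python
from itertools import product

K, D_MAX = 2, 2
G0 = 0.5   # example g_{0,s}, shared by all nodes above depth D_MAX

def trees(depth=0):
    """Enumerate K-ary regular trees of depth <= D_MAX rooted at `depth`."""
    yield ("leaf",)
    if depth < D_MAX:
        for kids in product(list(trees(depth + 1)), repeat=K):
            yield ("inner", kids)

def prior(tree, depth=0):
    """p(T): product of g over inner nodes and (1 - g) over leaf nodes."""
    g = 0.0 if depth == D_MAX else G0   # g forced to 0 at the maximum depth
    if tree[0] == "leaf":
        return 1.0 - g
    p = g
    for child in tree[1]:
        p *= prior(child, depth + 1)
    return p

all_trees = list(trees())
total = sum(prior(t) for t in all_trees)   # should be 1.0 up to rounding
```

With :math:`K=2` and :math:`D_\mathrm{max}=2` there are 5 such trees, and the computed probabilities indeed sum to one.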
The posterior distribution is as follows:

* :math:`g_{n,s} \in [0,1]` : the updated hyperparameter
* :math:`T_\mathrm{max}` : a superposed context tree, the :math:`K`-ary perfect tree whose depth is :math:`D_\mathrm{max}`
* :math:`s_\lambda` : the root node
* :math:`\beta_n(k|s) \in\mathbb{R}_{>0}` : a hyperparameter of the posterior Dirichlet distribution
* :math:`\boldsymbol{\beta}_n(s) = (\beta_n(1|s), \beta_n(2|s), \ldots, \beta_n(K|s)) \in\mathbb{R}^{K}_{>0}`
* :math:`I \{ \cdot \}` : the indicator function

For :math:`\boldsymbol{\theta}_s` on :math:`s \in\mathcal{L}(T_\mathrm{max})`,

.. math::
    p(\boldsymbol{\theta}_s|x^n) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_n(s)) = C(\boldsymbol{\beta}_n(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_n(k|s)-1},

where the updating rule of the hyperparameter is as follows:

.. math::
    \beta_n(k|s) = \beta_0(k|s) + \sum_{i=1}^n I \left\{ s \text{ is an ancestor of } s_{T_\mathrm{max}}(x^{i-1}) \text{ and } x_i=k \right\}.

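The counting in this updating rule can be sketched as follows (an illustration, not the module's own implementation; the tuple node keys and the symmetric choice :math:`\beta_0(k|s)=1/2` are assumptions for the example). Each observed symbol :math:`x_i` increments :math:`\beta(x_i|s)` at every node on the path from the root toward :math:`s_{T_\mathrm{max}}(x^{i-1})`:

```python
from collections import defaultdict
import numpy as np

K, D_MAX = 2, 2
BETA0 = 0.5   # example symmetric beta_0(k|s)

# beta[s] holds the K-vector beta_n(.|s); a node s is a tuple of recent
# symbols, most recent first, with () denoting the root s_lambda.
beta = defaultdict(lambda: np.full(K, BETA0))

def update(x):
    """Accumulate the counts of the beta_n updating rule over a sequence x."""
    for i, sym in enumerate(x):
        past = x[:i]
        node = ()
        beta[node][sym] += 1                    # the root is always on the path
        for prev in reversed(past[-D_MAX:]):    # descend toward s_{Tmax}(x^{i-1})
            node = node + (prev,)
            beta[node][sym] += 1

update([0, 1, 1, 0, 1])
```
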
For :math:`T \in \mathcal{T}`,

.. math::
    p(T|x^{n-1})=\prod_{s \in \mathcal{I}(T)} g_{n,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{n,s'}),

where the updating rules of the hyperparameter are as follows:

.. math::
    g_{n,s} =
    \begin{cases}
        g_{0,s} & \text{if } n=0, \\
        \frac{ g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}} (x_n|x^{n-1}) }{ \tilde{q}_s(x_n|x^{n-1}) } & \text{otherwise},
    \end{cases}

where :math:`s_{\mathrm{child}}` is the child node of :math:`s` on the path from :math:`s_\lambda` to :math:`s_{T_\mathrm{max}}(x^n)` and

.. math::
    \tilde{q}_s(x_n|x^{n-1}) =
    \begin{cases}
        q_s(x_n|x^{n-1}) & \text{if } s\in\mathcal{L}(T_\mathrm{max}), \\
        (1-g_{n-1,s}) q_s(x_n|x^{n-1}) + g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}}(x_n|x^{n-1}) & \text{otherwise}.
    \end{cases}

Here,

.. math::
    q_s(x_n|x^{n-1}) = \frac{ \beta_{n-1}(x_n|s) }{\sum_{k'=1}^{K} \beta_{n-1}(k'|s)}.

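Putting these formulas together, one sequential step can be sketched as below (an illustrative reimplementation under assumed conventions, not the module's own code: nodes are tuples of recent symbols, :math:`g_{0,s}=1/2`, :math:`\beta_0(k|s)=1/2`, and histories shorter than :math:`D_\mathrm{max}` are handled by simply truncating the path, a simplification of the boundary behavior). It computes :math:`\tilde{q}` bottom-up along the current path and then updates :math:`g` and :math:`\beta` top-down on that same path:

```python
import numpy as np

K, D_MAX = 2, 2
nodes = {}   # node -> {"beta": K-vector, "g": float}

def get(node, depth):
    """Lazily create a node of T_max with its prior hyperparameters."""
    if node not in nodes:
        nodes[node] = {"beta": np.full(K, 0.5),
                       "g": 0.0 if depth == D_MAX else 0.5}
    return nodes[node]

def step(past, x_n):
    """Return q~_{s_lambda}(x_n|x^{n-1}) and update g and beta on the path."""
    path = [()]                              # s_lambda, then deeper contexts
    for sym in reversed(past[-D_MAX:]):
        path.append(path[-1] + (sym,))
    qt = {}                                  # q~_s for each depth on the path
    for depth in range(len(path) - 1, -1, -1):
        nd = get(path[depth], depth)
        q = nd["beta"][x_n] / nd["beta"].sum()
        if depth == len(path) - 1:           # deepest node: treated as a leaf
            qt[depth] = q
        else:                                # mixture with the child's q~
            qt[depth] = (1 - nd["g"]) * q + nd["g"] * qt[depth + 1]
    for depth, node in enumerate(path):      # posterior updates along the path
        nd = get(node, depth)
        if depth < len(path) - 1:
            nd["g"] = nd["g"] * qt[depth + 1] / qt[depth]
        nd["beta"][x_n] += 1
    return qt[0]

x = [0, 1, 1, 0]
probs = [step(x[:i], sym) for i, sym in enumerate(x)]
```

With symmetric hyperparameters the first symbol is predicted with probability 1/2, and every later prediction is a proper probability.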
The predictive distribution is as follows:

* :math:`\boldsymbol{\theta}_\mathrm{p} = (\theta_{\mathrm{p},1}, \theta_{\mathrm{p},2}, \ldots, \theta_{\mathrm{p},K})` : a parameter of the predictive distribution, where :math:`\theta_{\mathrm{p},k}` denotes the occurrence probability of :math:`k\in\mathcal{X}`

.. math::
    p(x_n|x^{n-1}) = \theta_{\mathrm{p},x_n},

where the updating rule of the parameters of the predictive distribution is as follows:

.. math::
    \theta_{\mathrm{p}, k} = \tilde{q}_{s_\lambda}(k|x^{n-1}).

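As a minimal worked example (assuming the degenerate case :math:`D_\mathrm{max}=0`, where :math:`T_\mathrm{max}` is only the root leaf), :math:`\tilde{q}_{s_\lambda}` reduces to :math:`q_{s_\lambda}`, so the predictive parameters are just the normalized posterior Dirichlet hyperparameters:

```python
import numpy as np

# Example posterior hyperparameters beta_n(.|s_lambda) for K = 3.
beta_n = np.array([2.0, 1.0, 4.0])

# With only the root leaf:
#   theta_p,k = q_{s_lambda}(k|x^{n-1}) = beta_n(k|s_lambda) / sum_k' beta_n(k'|s_lambda)
theta_p = beta_n / beta_n.sum()
```
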
References

* Matsushima, T.; Hirasawa, S. Reducing the Space Complexity of a Bayes Coding Algorithm Using an Expanded Context Tree. *2009 IEEE International Symposium on Information Theory*, 2009, pp. 719-723. https://doi.org/10.1109/ISIT.2009.5205677
* Nakahara, Y.; Saito, S.; Kamatsuka, A.; Matsushima, T. Probability Distribution on Full Rooted Trees. *Entropy* 2022, 24, 328. https://doi.org/10.3390/e24030328
"""
from ._contexttree import GenModel
from ._contexttree import LearnModel

__all__ = ["GenModel", "LearnModel"]
