Commit 0109468

Merge pull request #40 from yuta-nakahara/develop-contexttree-resume
Develop contexttree resume
2 parents 0ff39a8 + 8e1f232

File tree

1 file changed: +118 -0 lines changed

bayesml/contexttree/contexttree.md

@@ -0,0 +1,118 @@
<!--
Document Author
Koshi Shimada <shimada.koshi.re@gmail.com>
-->

The stochastic data generative model is as follows:

* $\mathcal{X}=\{1,2,\ldots,K\}$ : a space of source symbols
* $x^n = x_1 x_2 \cdots x_n \in \mathcal{X}^n~(n\in\mathbb{N})$ : a source sequence
* $D_\mathrm{max} \in \mathbb{N}$ : the maximum depth of context tree models
* $T$ : a context tree model, i.e., a $K$-ary regular tree whose depth is smaller than or equal to $D_\mathrm{max}$, where "regular" means that every inner node has $K$ child nodes
* $\mathcal{T}$ : the set of all context tree models
* $s$ : a node of a context tree model
* $\mathcal{I}(T)$ : the set of inner nodes of $T$
* $\mathcal{L}(T)$ : the set of leaf nodes of $T$
* $\mathcal{S}(T)$ : the set of all nodes of $T$, i.e., $\mathcal{S}(T) = \mathcal{I}(T) \cup \mathcal{L}(T)$
* $s_T(x^{n-1}) \in \mathcal{L}(T)$ : the leaf node of $T$ corresponding to the context $x^{n-1} = x_1 x_2 \cdots x_{n-1}$
* $\boldsymbol{\theta}_s = (\theta_{1|s}, \theta_{2|s}, \ldots, \theta_{K|s})$ : a parameter on a leaf node, where $\theta_{k|s}$ denotes the occurrence probability of $k\in\mathcal{X}$
* $\boldsymbol{\theta}_T = (\boldsymbol{\theta}_s)_{s\in\mathcal{L}(T)}$ : the tuple of parameters on the leaf nodes of $T$

$$
\begin{align}
p(x_n | x^{n-1}, \boldsymbol{\theta}_T, T)=\theta_{x_n|s_T(x^{n-1})}.
\end{align}
$$
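
For illustration, the following is a minimal NumPy sketch of sampling from this generative model, assuming $K=2$, $D_\mathrm{max}=2$, a hand-picked tree $T$ encoded as a nested dict, and hand-picked leaf parameters; the `leaf_of` and `gen_sample` helpers and the dict keys are illustrative choices, not the BayesML API.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # alphabet size; symbols are 0-indexed here as {0, ..., K-1}

# A hypothetical context tree T with D_max = 2: an inner node has K children
# indexed by a past symbol, a leaf node carries its parameter theta_s.
T = {"children": [
        {"theta": [0.9, 0.1]},      # context "...0": the previous symbol was 0
        {"children": [
            {"theta": [0.2, 0.8]},  # context "...01"
            {"theta": [0.5, 0.5]},  # context "...11"
        ]},
]}

def leaf_of(tree, past):
    """Follow the past symbols (newest first) down to the leaf s_T(x^{n-1})."""
    node, depth = tree, 1
    while "children" in node:
        node = node["children"][past[-depth]]
        depth += 1
    return node

def gen_sample(tree, n, d_max=2):
    """Draw x^n with p(x_n | x^{n-1}, theta_T, T) = theta_{x_n | s_T(x^{n-1})}."""
    x = [0] * d_max  # dummy initial context
    for _ in range(n):
        x.append(int(rng.choice(K, p=leaf_of(tree, x)["theta"])))
    return x[d_max:]

print(gen_sample(T, 20))
```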

The prior distribution is as follows:

* $g_{0,s} \in [0,1]$ : a hyperparameter assigned to each node $s \in \mathcal{S}(T)$
* $\beta_0(k|s) \in\mathbb{R}_{>0}$ : a hyperparameter of the Dirichlet distribution
* $\boldsymbol{\beta}_0(s) = (\beta_0(1|s), \beta_0(2|s), \ldots, \beta_0(K|s)) \in\mathbb{R}^{K}_{>0}$
* $C(\boldsymbol{\beta}_0(s)) = \frac{\Gamma\left(\sum_{k=1}^{K} \beta_0(k|s)\right)}{\prod_{k=1}^{K} \Gamma\left(\beta_0(k|s)\right)}$ : the normalizing constant of the Dirichlet distribution

For $\boldsymbol{\theta}_s$ on $s\in\mathcal{L}(T)$, a Dirichlet distribution is assumed as the prior distribution:

$$
\begin{align}
p(\boldsymbol{\theta}_s|T) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_0(s)) = C(\boldsymbol{\beta}_0(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_0(k|s)-1}.
\end{align}
$$
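
For example, a draw of $\boldsymbol{\theta}_s$ from this prior and its density can be computed directly from the formula above. The sketch below assumes $K=3$ and an illustrative hyperparameter $\boldsymbol{\beta}_0(s)=(0.5,0.5,0.5)$, not a library default.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
beta_0 = np.array([0.5, 0.5, 0.5])   # illustrative beta_0(s) for K = 3
theta_s = rng.dirichlet(beta_0)      # a draw theta_s ~ Dir( . | beta_0(s))

# log C(beta_0(s)) = log Gamma(sum_k beta_0(k|s)) - sum_k log Gamma(beta_0(k|s))
log_C = gammaln(beta_0.sum()) - gammaln(beta_0).sum()
# log p(theta_s | T) = log C(beta_0(s)) + sum_k (beta_0(k|s) - 1) log theta_{k|s}
log_pdf = log_C + ((beta_0 - 1.0) * np.log(theta_s)).sum()
print(theta_s, np.exp(log_pdf))
```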

For $T \in \mathcal{T}$,

$$
\begin{align}
p(T)=\prod_{s \in \mathcal{I}(T)} g_{0,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{0,s'}),
\end{align}
$$

where $g_{0,s}=0$ if the depth of $s$ is $D_\mathrm{max}$.
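
The product form of $p(T)$ can be read as growing the tree from the root: each node independently becomes an inner node with probability $g_{0,s}$ and a leaf with probability $1-g_{0,s}$, and $g_{0,s}=0$ forces every node at depth $D_\mathrm{max}$ to be a leaf. The following sketch samples a tree from this prior under the simplifying assumption of a single common value $g_0$ for all nodes (an illustrative choice, not something the model requires).

```python
import numpy as np

rng = np.random.default_rng(2)
K, D_MAX, G_0 = 2, 3, 0.5  # illustrative alphabet size, maximum depth, and common g_{0,s}

def sample_tree(depth=0):
    """Sample T from p(T) = prod_{inner s} g_{0,s} * prod_{leaf s'} (1 - g_{0,s'})."""
    # g_{0,s} = 0 at depth D_max, so a node at the maximum depth is always a leaf.
    g = G_0 if depth < D_MAX else 0.0
    if rng.random() < g:  # inner node with probability g_{0,s}
        return {"children": [sample_tree(depth + 1) for _ in range(K)]}
    return {"leaf_depth": depth}  # leaf node with probability 1 - g_{0,s}

print(sample_tree())
```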

The posterior distribution is as follows:

* $g_{n,s} \in [0,1]$ : the updated hyperparameter corresponding to $g_{0,s}$
* $T_\mathrm{max}$ : the superposed context tree, i.e., the $K$-ary perfect tree whose depth is $D_\mathrm{max}$
* $s_\lambda$ : the root node
* $\beta_n(k|s) \in\mathbb{R}_{>0}$ : a hyperparameter of the posterior Dirichlet distribution
* $\boldsymbol{\beta}_n(s) = (\beta_n(1|s), \beta_n(2|s), \ldots, \beta_n(K|s)) \in\mathbb{R}^{K}_{>0}$
* $I \{ \cdot \}$ : the indicator function

For $\boldsymbol{\theta}_s$ on $s\in\mathcal{L}(T_\mathrm{max})$,

$$
\begin{align}
p(\boldsymbol{\theta}_s|x^n) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_n(s)) = C(\boldsymbol{\beta}_n(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_n(k|s)-1},
\end{align}
$$

where the updating rule of the hyperparameter is as follows:

$$
\begin{align}
\beta_n(k|s) = \beta_0(k|s) + \sum_{i=1}^n I \left\{ \text{$x_i = k$ and $s$ lies on the path from $s_\lambda$ to $s_{T_\mathrm{max}}(x^{i-1})$} \right\}.
\end{align}
$$
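
In other words, observing $x_i$ adds one count for the symbol $x_i$ at every node on the path from $s_\lambda$ to the leaf of $T_\mathrm{max}$ selected by the context $x^{i-1}$. A minimal counting sketch, assuming a binary alphabet, a common illustrative value $\beta_0(k|s)=0.5$, and nodes of $T_\mathrm{max}$ keyed by their context strings (these data structures are illustrative, not the BayesML ones):

```python
from collections import defaultdict

K, D_MAX = 2, 2
BETA_0 = 0.5                                  # illustrative beta_0(k|s), common to all k and s

x = [0, 1, 1, 0, 1, 1, 1, 0]                  # a toy source sequence over {0, 1}

# beta[s][k] holds beta_n(k|s); a node s is identified by its context string,
# read from the most recent past symbol outward ("" is the root s_lambda).
beta = defaultdict(lambda: [BETA_0] * K)

for i, x_i in enumerate(x):
    context = x[max(0, i - D_MAX):i][::-1]    # up to D_max most recent past symbols
    beta[""][x_i] += 1                        # the root lies on every context path
    s = ""
    for sym in context:                       # descend toward s_{T_max}(x^{i-1});
        s += str(sym)                         # shorter initial contexts give shorter paths
        beta[s][x_i] += 1

for s, b in sorted(beta.items()):
    print(f"node '{s or 's_lambda'}': beta_n = {b}")
```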

For $T \in \mathcal{T}$,

$$
\begin{align}
p(T|x^n)=\prod_{s \in \mathcal{I}(T)} g_{n,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{n,s'}),
\end{align}
$$

where the updating rule of the hyperparameter is as follows:

$$
\begin{align}
g_{n,s} =
\begin{cases}
g_{0,s} & \text{if $n=0$}, \\
\frac{ g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}} (x_n|x^{n-1}) }{ \tilde{q}_s(x_n|x^{n-1}) } & \text{otherwise},
\end{cases}
\end{align}
$$

where $s_{\mathrm{child}}$ is the child node of $s$ on the path from $s_\lambda$ to $s_{T_\mathrm{max}}(x^{n-1})$ (for any node $s$ that does not lie on this path, $g_{n,s} = g_{n-1,s}$) and

$$
\begin{align}
\tilde{q}_s(x_n|x^{n-1}) =
\begin{cases}
q_s(x_n|x^{n-1}) & \text{if $s\in\mathcal{L}(T_\mathrm{max})$}, \\
(1-g_{n-1,s}) q_s(x_n|x^{n-1}) + g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}}(x_n|x^{n-1}) & \text{otherwise}.
\end{cases}
\end{align}
$$

Here,

$$
\begin{align}
q_s(x_n|x^{n-1}) = \frac{ \beta_{n-1}(x_n|s) }{\sum_{k'=1}^{K} \beta_{n-1}(k'|s)}.
\end{align}
$$

The predictive distribution is as follows:

$$
\begin{align}
p(x_n|x^{n-1}) = \tilde{q}_{s_\lambda}(x_n|x^{n-1}).
\end{align}
$$
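
Putting the recursions together, one sequential step evaluates $q_s$ and $\tilde{q}_s$ bottom-up along the path from $s_{T_\mathrm{max}}(x^{n-1})$ to $s_\lambda$, reads off $p(x_n|x^{n-1})=\tilde{q}_{s_\lambda}(x_n|x^{n-1})$, and then updates $\beta$ and $g$ on that path. The following self-contained sketch illustrates this under simplifying assumptions (binary alphabet, common illustrative $g_0$ and $\beta_0$, and initial contexts shorter than $D_\mathrm{max}$ handled by truncating the path); it mirrors the formulas above and is not the BayesML implementation.

```python
from collections import defaultdict

K, D_MAX, G_0, BETA_0 = 2, 2, 0.5, 0.5  # illustrative settings

# Nodes of T_max keyed by their context string (most recent symbol first); "" is s_lambda.
beta = defaultdict(lambda: [BETA_0] * K)  # beta_n(k|s)
g = defaultdict(lambda: G_0)              # g_{n,s}

def q(s, k):
    """q_s(x_n = k | x^{n-1}) = beta_{n-1}(k|s) / sum_k' beta_{n-1}(k'|s)."""
    return beta[s][k] / sum(beta[s])

def predict_and_update(past, x_n):
    """Return p(x_n | x^{n-1}) = tilde_q_{s_lambda}(x_n | x^{n-1}), then update beta and g."""
    context = past[-D_MAX:][::-1]
    path = [""] + ["".join(map(str, context[:d + 1])) for d in range(len(context))]

    # Bottom-up: tilde_q at the deepest node on the path equals q, then mix upward.
    q_tilde = {path[-1]: q(path[-1], x_n)}
    for s, child in zip(reversed(path[:-1]), reversed(path[1:])):
        q_tilde[s] = (1.0 - g[s]) * q(s, x_n) + g[s] * q_tilde[child]
    pred = q_tilde[""]  # predictive probability at the root s_lambda

    # Update g_{n,s} on the path; the deepest node's g is neither used nor updated here
    # (for a leaf of T_max it would be fixed at 0). Then add the count of x_n on the path.
    for s, child in zip(path[:-1], path[1:]):
        g[s] = g[s] * q_tilde[child] / q_tilde[s]
    for s in path:
        beta[s][x_n] += 1
    return pred

# Usage: sequentially predict each symbol of a toy sequence, then update.
x = [0, 1, 1, 0, 1, 1, 1, 0]
for n in range(len(x)):
    p = predict_and_update(x[:n], x[n])
    print(f"p(x_{n + 1} = {x[n]} | past) = {p:.3f}")
```

Note that each step visits only the at most $D_\mathrm{max}+1$ nodes on the current context path.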

References

* T. Matsushima and S. Hirasawa, "A Class of Prior Distributions on Context Tree Models and an Efficient Algorithm of the Bayes Codes Assuming It," *2007 IEEE International Symposium on Signal Processing and Information Technology*, 2007, pp. 938-941, doi: 10.1109/ISSPIT.2007.4458049.
