<!--
Document Author
Koshi Shimada <shimada.koshi.re@gmail.com>
-->

The stochastic data generative model is as follows:

* $\mathcal{X}=\{1,2,\ldots,K\}$ : the space of source symbols
* $x^n = x_1 x_2 \cdots x_n \in \mathcal{X}^n~(n\in\mathbb{N})$ : a source sequence
* $D_\mathrm{max} \in \mathbb{N}$ : the maximum depth of context tree models
* $T$ : a context tree model, i.e., a $K$-ary regular tree whose depth is at most $D_\mathrm{max}$, where "regular" means that every inner node has $K$ child nodes
* $\mathcal{T}$ : the set of all context tree models
* $s$ : a node of a context tree model
* $\mathcal{I}(T)$ : the set of inner nodes of $T$
* $\mathcal{L}(T)$ : the set of leaf nodes of $T$
* $\mathcal{S}(T)$ : the set of all nodes of $T$, i.e., $\mathcal{S}(T) = \mathcal{I}(T) \cup \mathcal{L}(T)$
* $s_T(x^{n-1}) \in \mathcal{L}(T)$ : the leaf node of $T$ corresponding to the context $x^{n-1} = x_1 x_2\cdots x_{n-1}$
* $\boldsymbol{\theta}_s = (\theta_{1|s}, \theta_{2|s}, \ldots, \theta_{K|s})$ : a parameter vector on a leaf node, where $\theta_{k|s}$ denotes the occurrence probability of $k\in\mathcal{X}$
* $\boldsymbol{\theta}_T = (\boldsymbol{\theta}_s)_{s \in \mathcal{L}(T)}$ : the tuple of parameter vectors on the leaf nodes of $T$

$$
\begin{align}
    p(x_n | x^{n-1}, \boldsymbol{\theta}_T, T)=\theta_{x_n|s_T(x^{n-1})}.
\end{align}
$$
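As a toy illustration of this generative model (not part of any library API), a context tree can be represented as a dictionary keyed by context suffixes, with the most recent symbol first; the tree shape, probabilities, and helper names below are all illustrative assumptions:

```python
import random

# Hypothetical toy model with K = 2 and D_max = 2.
# Leaves are keyed by the context read backwards from x_{n-1};
# e.g. the key (1,) means "the previous symbol was 1".
# theta[s] holds (theta_{1|s}, ..., theta_{K|s}).
theta = {
    (1,): [0.9, 0.1],    # leaf at depth 1
    (2, 1): [0.2, 0.8],  # leaves at depth 2 under the child "2"
    (2, 2): [0.5, 0.5],
}

def leaf_of(context):
    """Return s_T(x^{n-1}): walk back through the context until a leaf is hit."""
    suffix = ()
    for sym in reversed(context):
        suffix = suffix + (sym,)
        if suffix in theta:
            return suffix
    raise KeyError("context not covered by the tree")

def generate(n, seed=0):
    rng = random.Random(seed)
    x = [1, 1]  # initial context so that a leaf is always reachable
    for _ in range(n):
        # Draw x_n in {1, ..., K} according to theta_{.|s_T(x^{n-1})}.
        x.append(rng.choices([1, 2], weights=theta[leaf_of(x)])[0])
    return x[2:]

seq = generate(10)
```

Each symbol thus depends on the past only through the leaf reached by reading the context backwards, which is exactly the role of $s_T(x^{n-1})$ above.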

The prior distribution is as follows:

* $g_{0,s} \in [0,1]$ : a hyperparameter assigned to $s \in \mathcal{S}(T)$
* $\beta_0(k|s) \in\mathbb{R}_{>0}$ : a hyperparameter of the Dirichlet distribution
* $\boldsymbol{\beta}_0(s) = (\beta_0(1|s), \beta_0(2|s), \ldots, \beta_0(K|s)) \in\mathbb{R}^{K}_{>0}$
* $C(\boldsymbol{\beta}_0(s)) = \frac{\Gamma\left(\sum_{k=1}^{K} \beta_0(k|s)\right)}{\prod_{k=1}^{K} \Gamma\left(\beta_0(k|s)\right)}$

For $\boldsymbol{\theta}_s$ on each $s\in\mathcal{L}(T)$, a Dirichlet prior distribution is assumed:
$$
\begin{align}
    p(\boldsymbol{\theta}_s|T) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_0(s)) = C(\boldsymbol{\beta}_0(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_0(k|s)-1}.
\end{align}
$$
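For instance, a single draw of $\boldsymbol{\theta}_s$ from this prior can be sketched with NumPy; the values of $K$ and $\boldsymbol{\beta}_0(s)$ below are illustrative choices, not defaults of any library:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
beta_0 = np.full(K, 0.5)         # beta_0(k|s) = 1/2 for all k, an illustrative choice
theta_s = rng.dirichlet(beta_0)  # one draw of (theta_{1|s}, ..., theta_{K|s})
# The components are nonnegative and sum to one, as a probability vector must.
```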

For $T \in \mathcal{T}$,
$$
\begin{align}
    p(T)=\prod_{s \in \mathcal{I}(T)} g_{0,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{0,s'}),
\end{align}
$$
where $g_{0,s}=0$ if the depth of $s$ is $D_\mathrm{max}$.
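This prior can equivalently be sampled by a branching process: starting from the root, each node becomes an inner node with probability $g_{0,s}$ (forced to $0$ at depth $D_\mathrm{max}$) and a leaf otherwise. A minimal sketch, assuming for simplicity a depth-independent value $g_0$:

```python
import random

def sample_tree(K, D_max, g0, rng, depth=0):
    """Sample a context tree model from p(T).

    A node at depth < D_max becomes an inner node with probability g0
    and then recursively spawns K children; otherwise it is a leaf.
    Since g_{0,s} = 0 at depth D_max, the recursion always terminates.
    Returns a nested dict for an inner node and None for a leaf.
    """
    if depth < D_max and rng.random() < g0:
        return {k: sample_tree(K, D_max, g0, rng, depth + 1)
                for k in range(1, K + 1)}
    return None  # leaf

rng = random.Random(1)
T = sample_tree(K=2, D_max=3, g0=0.5, rng=rng)
```

Multiplying $g_{0,s}$ over the inner nodes visited and $(1-g_{0,s'})$ over the leaves recovers exactly the product form of $p(T)$ above.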

The posterior distribution is as follows:

* $g_{n,s} \in [0,1]$ : the updated hyperparameter corresponding to $g_{0,s}$
* $T_\mathrm{max}$ : the superposed context tree, i.e., a $K$-ary perfect tree whose depth is $D_\mathrm{max}$
* $s_\lambda$ : the root node
* $\beta_n(k|s) \in\mathbb{R}_{>0}$ : a hyperparameter of the posterior Dirichlet distribution
* $\boldsymbol{\beta}_n(s) = (\beta_n(1|s), \beta_n(2|s), \ldots, \beta_n(K|s)) \in\mathbb{R}^{K}_{>0}$
* $I \{ \cdot \}$ : the indicator function

For $\boldsymbol{\theta}_s$ on $s \in \mathcal{L}(T_\mathrm{max})$,

$$
\begin{align}
    p(\boldsymbol{\theta}_s|x^n) = \mathrm{Dir}(\boldsymbol{\theta}_s|\,\boldsymbol{\beta}_n(s)) = C(\boldsymbol{\beta}_n(s)) \prod_{k=1}^{K} \theta_{k|s}^{\beta_n(k|s)-1},
\end{align}
$$

where the updating rule of the hyperparameters is as follows:

$$
\begin{align}
    \beta_n(k|s) = \beta_0(k|s) + \sum_{i=1}^n I \left\{ x_i = k \text{ and } s \text{ lies on the path from } s_\lambda \text{ to } s_{T_\mathrm{max}}(x^{i-1}) \right\}.
\end{align}
$$
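With nodes stored as context-suffix tuples (most recent symbol first, `()` for $s_\lambda$), this count update can be sketched as follows; the representation and helper are illustrative, not library code:

```python
from collections import defaultdict

K, D_max = 2, 2
beta0 = 0.5  # beta_0(k|s) for all k and s, an illustrative choice

# beta[s][k] holds beta_n(k|s); index 0 is unused so symbols are 1-based.
beta = defaultdict(lambda: [beta0] * (K + 1))

def update(beta, x):
    """One pass of the updating rule over the sequence x (symbols in {1..K})."""
    for i, xi in enumerate(x):
        context = x[:i]
        s = ()
        beta[s][xi] += 1  # the root lies on every context path
        for sym in reversed(context[-D_max:]):
            s = s + (sym,)
            beta[s][xi] += 1  # each deeper node on the path of x^{i-1}
    return beta

update(beta, [1, 2, 1, 1])
```

After this pass, the root's counts are $\beta_0 + 3$ for symbol 1 and $\beta_0 + 1$ for symbol 2, since every symbol's context path passes through $s_\lambda$.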

For $T \in \mathcal{T}$,

$$
\begin{align}
    p(T|x^n)=\prod_{s \in \mathcal{I}(T)} g_{n,s} \prod_{s' \in \mathcal{L}(T)} (1-g_{n,s'}),
\end{align}
$$

where the updating rule of the hyperparameters is as follows:

$$
\begin{align}
    g_{n,s} =
    \begin{cases}
        g_{0,s} & \text{if $n=0$}, \\
        \frac{ g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}} (x_n|x^{n-1}) }
        { \tilde{q}_s(x_n|x^{n-1}) } & \text{otherwise},
    \end{cases}
\end{align}
$$

where $s_{\mathrm{child}}$ is the child node of $s$ on the path from $s_\lambda$ to $s_{T_\mathrm{max}}(x^{n-1})$, and

$$
\begin{align}
    \tilde{q}_s(x_n|x^{n-1}) =
    \begin{cases}
        q_s(x_n|x^{n-1}) & \text{if $s\in\mathcal{L}(T_\mathrm{max})$}, \\
        (1-g_{n-1,s}) q_s(x_n|x^{n-1}) + g_{n-1,s} \tilde{q}_{s_{\mathrm{child}}}(x_n|x^{n-1}) & \text{otherwise}.
    \end{cases}
\end{align}
$$

Here,

$$
\begin{align}
    q_s(x_n|x^{n-1}) = \frac{ \beta_{n-1}(x_n|s) }
    {\sum_{k'=1}^{K} \beta_{n-1}(k'|s)}.
\end{align}
$$
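Putting the three formulas together, one sequential step computes $q_s$ for every node on the context path, folds them bottom-up into $\tilde{q}_s$, updates $g$ and $\boldsymbol{\beta}$, and returns the predictive probability. The dictionary-based sketch below assumes $K=2$, $g_{0,s}=1/2$, and $\beta_0(k|s)=1/2$, and truncates the path while the context is still shorter than $D_\mathrm{max}$; it is an illustration, not the BayesML implementation:

```python
from collections import defaultdict

K, D_max = 2, 2
g = defaultdict(lambda: 0.5)                 # g_{n,s}, initialized to g_{0,s} = 1/2
beta = defaultdict(lambda: [0.5] * (K + 1))  # beta_n(k|s); index 0 unused

def step(x_next, context):
    """Return p(x_n|x^{n-1}) = tilde_q_{s_lambda} and update g, beta in place."""
    # Path s_lambda = path[0], ..., s_{T_max}(x^{n-1}) = path[-1];
    # short early contexts are simply truncated in this sketch.
    path = [()]
    for sym in reversed(context[-D_max:]):
        path.append(path[-1] + (sym,))
    # q_s(x_n|x^{n-1}) for each node on the path.
    q = {s: beta[s][x_next] / sum(beta[s][1:]) for s in path}
    # tilde_q, computed bottom-up; the deepest node plays the leaf role.
    tq = {path[-1]: q[path[-1]]}
    for s, child in zip(reversed(path[:-1]), reversed(path[1:])):
        tq[s] = (1 - g[s]) * q[s] + g[s] * tq[child]
    # g_{n,s} = g_{n-1,s} * tilde_q_child / tilde_q_s along the path,
    # then the Dirichlet hyperparameters are incremented.
    for s, child in zip(path[:-1], path[1:]):
        g[s] = g[s] * tq[child] / tq[s]
    for s in path:
        beta[s][x_next] += 1
    return tq[()]

x = [1, 1, 2, 1]
probs = [step(x[i], x[:i]) for i in range(len(x))]
```

Note that $\tilde{q}$ is computed with the *previous* hyperparameters $g_{n-1,s}$ and $\beta_{n-1}(\cdot|s)$ before either is updated, matching the conditioning on $x^{n-1}$ in the formulas.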

The predictive distribution is as follows:

$$
\begin{align}
    p(x_n|x^{n-1}) = \tilde{q}_{s_\lambda}(x_n|x^{n-1}).
\end{align}
$$

References

* T. Matsushima and S. Hirasawa, "A Class of Prior Distributions on Context Tree Models and an Efficient Algorithm of the Bayes Codes Assuming It," *2007 IEEE International Symposium on Signal Processing and Information Technology*, 2007, pp. 938-941, doi: 10.1109/ISSPIT.2007.4458049.