TST parameter fixup #897

Open · wants to merge 3 commits into main
9 changes: 4 additions & 5 deletions nbs/049_models.TST.ipynb
@@ -71,8 +71,8 @@
"* max_seq_len: useful to control the temporal resolution in long time series to avoid memory issues. Default. None.\n",
"* d_model: total dimension of the model (number of features created by the model). Usual values: 128-1024. Default: 128.\n",
"* n_heads: parallel attention heads. Usual values: 8-16. Default: 16.\n",
"* d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.\n",
"* d_v: size of the learned linear projection of values in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.\n",
"* d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 8-64. Default: None -> (d_model/n_heads) = 8.\n",
"* d_v: size of the learned linear projection of values in the MHA. Usual values: 8-64. Default: None -> (d_model/n_heads) = 8.\n",
"* d_ff: the dimension of the feedforward network model. Usual values: 256-4096. Default: 256.\n",
"* dropout: amount of residual dropout applied in the encoder. Usual values: 0.-0.3. Default: 0.1.\n",
"* activation: the activation function of intermediate layer, relu or gelu. Default: 'gelu'.\n",
@@ -218,7 +218,6 @@
" def __init__(self, q_len:int, d_model:int, n_heads:int, d_k:Optional[int]=None, d_v:Optional[int]=None, d_ff:int=256, dropout:float=0.1, \n",
" activation:str=\"gelu\"):\n",
"\n",
" assert d_model // n_heads, f\"d_model ({d_model}) must be divisible by n_heads ({n_heads})\"\n",
" d_k = ifnone(d_k, d_model // n_heads)\n",
" d_v = ifnone(d_v, d_model // n_heads)\n",
"\n",
@@ -320,8 +319,8 @@
" max_seq_len: useful to control the temporal resolution in long time series to avoid memory issues.\n",
" d_model: total dimension of the model (number of features created by the model)\n",
" n_heads: parallel attention heads.\n",
" d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.\n",
" d_v: size of the learned linear projection of values in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.\n",
" d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 8-64. Default: None -> (d_model/n_heads) = 8.\n",
" d_v: size of the learned linear projection of values in the MHA. Usual values: 8-64. Default: None -> (d_model/n_heads) = 8.\n",
" d_ff: the dimension of the feedforward network model.\n",
" dropout: amount of residual dropout applied in the encoder.\n",
" act: the activation function of intermediate layer, relu or gelu.\n",
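The corrected numbers follow directly from the defaults documented above (d_model=128, n_heads=16): when d_k or d_v is left as None, the encoder layer falls back to d_model // n_heads, which is 128 // 16 = 8, not 32. A minimal sketch of that resolution, with fastai's `ifnone` helper re-implemented here so the snippet runs standalone:

```python
# Standalone illustration of how the d_k / d_v defaults resolve.
# `ifnone` is re-implemented here so the snippet does not need fastai installed.
def ifnone(a, b):
    "Return `a` if it is not None, otherwise `b`."
    return b if a is None else a

d_model, n_heads = 128, 16                 # TST defaults documented in the notebook
d_k = ifnone(None, d_model // n_heads)     # None -> falls back to d_model // n_heads
d_v = ifnone(None, d_model // n_heads)
print(d_k, d_v)                            # 8 8, matching the corrected docstring
```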
5 changes: 2 additions & 3 deletions tsai/models/TST.py
@@ -76,7 +76,6 @@ class _TSTEncoderLayer(Module):
 def __init__(self, q_len:int, d_model:int, n_heads:int, d_k:Optional[int]=None, d_v:Optional[int]=None, d_ff:int=256, dropout:float=0.1,
 activation:str="gelu"):

-assert d_model // n_heads, f"d_model ({d_model}) must be divisible by n_heads ({n_heads})"
 d_k = ifnone(d_k, d_model // n_heads)
 d_v = ifnone(d_v, d_model // n_heads)

@@ -142,8 +141,8 @@ def __init__(self, c_in:int, c_out:int, seq_len:int, max_seq_len:Optional[int]=N
 max_seq_len: useful to control the temporal resolution in long time series to avoid memory issues.
 d_model: total dimension of the model (number of features created by the model)
 n_heads: parallel attention heads.
-d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.
-d_v: size of the learned linear projection of values in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.
+d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 8-64. Default: None -> (d_model/n_heads) = 8.
+d_v: size of the learned linear projection of values in the MHA. Usual values: 8-64. Default: None -> (d_model/n_heads) = 8.
 d_ff: the dimension of the feedforward network model.
 dropout: amount of residual dropout applied in the encoder.
 act: the activation function of intermediate layer, relu or gelu.
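The deleted assert never enforced the divisibility it described: `d_model // n_heads` is truthy for any d_model >= n_heads, so the message could only fire when d_model was smaller than n_heads. This PR simply drops the line. If divisibility were to be enforced instead (not what this PR does), a remainder test would be the usual form; a sketch with a hypothetical helper name:

```python
def check_divisible(d_model: int, n_heads: int) -> None:
    # Hypothetical helper, not part of this PR: the removed line used floor
    # division (`d_model // n_heads`), which is non-zero whenever d_model >= n_heads,
    # so it never caught a non-divisible combination. A real check uses the remainder.
    assert d_model % n_heads == 0, (
        f"d_model ({d_model}) must be divisible by n_heads ({n_heads})")

check_divisible(128, 16)    # passes: 128 = 16 * 8
# check_divisible(130, 16)  # would raise AssertionError (130 % 16 != 0)
```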