where is transformer decoder #2866
Replies: 2 comments 4 replies
-
This might be a bit misleading: the example uses the `TransformerEncoder` module, but the model is still decoder-only. Decoder-only models used for text generation have no separate encoder, since they only rely on previous tokens. The text is generated autoregressively by predicting one token at a time, using masked self-attention to restrict each position to past tokens (causal). Although this is called decoder-only, the transformer block used does not have cross-attention like the original transformer decoder block, because there is no encoder output to attend to. So in this case, the `TransformerEncoder` block with an autoregressive (causal) mask is exactly the decoder-only block the example needs. There also exist encoder-only models like BERT, typically used for full-context tasks, which do not use autoregressive masking. Hopefully this helps 😅
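To make that concrete, here is a minimal sketch of a decoder-only forward pass built from burn's `TransformerEncoder` plus an autoregressive mask, loosely following the text-generation example (the function name and dimension comments are made up for illustration, and exact API details may vary across burn versions):

```rust
use burn::nn::attention::generate_autoregressive_mask;
use burn::nn::transformer::{TransformerEncoder, TransformerEncoderInput};
use burn::tensor::{backend::Backend, Tensor};

/// "Decoder-only" forward pass: an encoder block plus a causal mask.
/// There is no cross-attention because there is no encoder output to attend to.
fn decoder_only_forward<B: Backend>(
    transformer: &TransformerEncoder<B>,
    embedding: Tensor<B, 3>, // [batch_size, seq_length, d_model]
) -> Tensor<B, 3> {
    let [batch_size, seq_length, _d_model] = embedding.dims();
    let device = embedding.device();

    // Causal mask: each position may only attend to itself and earlier positions,
    // which is what makes autoregressive next-token prediction work.
    let mask_attn = generate_autoregressive_mask::<B>(batch_size, seq_length, &device);

    transformer.forward(TransformerEncoderInput::new(embedding).mask_attn(mask_attn))
}
```

Training this with a next-token prediction objective gives GPT-style generation, even though the module is named `TransformerEncoder`.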
-
I think I finally understand what you said. So if I want to use bidirectional self-attention in the `TransformerEncoder`, I just leave out the autoregressive mask?
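Something like the sketch below is what I have in mind, assuming the same hypothetical `transformer` and `embedding` as in the earlier sketch: simply skip the `mask_attn` call.

```rust
use burn::nn::transformer::{TransformerEncoder, TransformerEncoderInput};
use burn::tensor::{backend::Backend, Tensor};

/// Bidirectional (BERT-style) forward pass: no autoregressive mask, so every
/// position can attend to every other position in the sequence.
fn bidirectional_forward<B: Backend>(
    transformer: &TransformerEncoder<B>,
    embedding: Tensor<B, 3>, // [batch_size, seq_length, d_model]
) -> Tensor<B, 3> {
    transformer.forward(TransformerEncoderInput::new(embedding))
}
```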
-
Hi,
Only the transformer encoder is used in burn's text-generation example (https://github.com/tracel-ai/burn/tree/main/examples/text-generation). Don't we need a decoder here? If so, when and where should the decoder be used?