A simple TensorFlow/Keras implementation of the Transformer architecture from "Attention Is All You Need" (Vaswani et al., 2017).
- Fully functional Encoder-Decoder architecture
- Masked self-attention and encoder-decoder (cross-) attention (see the attention sketch below)
- Feed-forward networks, embeddings, and positional encodings (see the positional-encoding sketch below)
- Unit tests for verifying layer outputs
Note: this implementation is a work in progress and still under development.
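As a reference for the attention bullet above, here is a minimal sketch of masked scaled dot-product attention as defined in the paper. The names `scaled_dot_product_attention` and `look_ahead_mask` are illustrative, not necessarily this repository's API; in the decoder's self-attention, the look-ahead mask prevents a position from attending to later positions.

```python
import tensorflow as tf

def look_ahead_mask(size: int) -> tf.Tensor:
    # 1s strictly above the diagonal mark future positions to hide.
    return 1.0 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = tf.matmul(q, k, transpose_b=True)     # (..., len_q, len_k)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = scores / tf.math.sqrt(d_k)
    if mask is not None:
        scores += mask * -1e9                      # push masked logits toward -inf
    weights = tf.nn.softmax(scores, axis=-1)       # attention distribution
    return tf.matmul(weights, v), weights
```

For decoder self-attention, the mask would be built from the target length, e.g. `mask = look_ahead_mask(tf.shape(q)[1])`, and broadcast across the batch and head dimensions.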
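Likewise, a minimal sketch of the positional encoding, assuming the fixed sinusoidal variant from the paper; `positional_encoding` is an illustrative name, not necessarily this repository's API.

```python
import numpy as np
import tensorflow as tf

def positional_encoding(max_len: int, d_model: int) -> tf.Tensor:
    # Angle rates follow 1 / 10000^(2i / d_model) from the paper.
    positions = np.arange(max_len)[:, np.newaxis]   # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / np.float32(d_model))
    angles[:, 0::2] = np.sin(angles[:, 0::2])       # even dimensions use sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])       # odd dimensions use cosine
    return tf.cast(angles[np.newaxis, ...], tf.float32)  # (1, max_len, d_model)
```

In the paper, token embeddings are scaled by sqrt(d_model) before this encoding is added to the embedded sequence.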