A minimal implementation of a Transformer-based language model, designed for learning and experimentation.
This project provides a simple implementation of a decoder-only Transformer language model (causal LM), with the goals of:
- Understanding the core mechanisms of the Transformer architecture.
- Serving as a basis for rapid experimentation and modification.
The model implements the essential components of a Transformer decoder-based language model (a minimal sketch follows this list):
- Token embedding
- Rotary position embeddings (RoPE)
- Self-attention
- Feed-forward
- RMS normalization (RMSNorm)
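
To make these pieces concrete, here is a minimal PyTorch sketch of how they can fit together in one decoder block. It is illustrative only: the names (`RMSNorm`, `DecoderBlock`, `d_model`, `n_heads`), the GPT-NeoX-style rotary formulation, and the GELU feed-forward are assumptions for the sketch, not this project's actual API.

```python
# A minimal sketch of one decoder block; all names and hyperparameters
# here are illustrative assumptions, not this repository's API.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """RMS normalization: scale features by their inverse root-mean-square."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


def rotary_embed(x, base: float = 10000.0):
    """Apply rotary position embeddings to x of shape (batch, heads, seq, head_dim)."""
    _, _, seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-torch.arange(half, device=x.device).float() / half)
    angles = torch.arange(seq_len, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()  # each (seq, half), broadcast over batch/heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class DecoderBlock(nn.Module):
    """Pre-norm block: RMSNorm -> causal self-attention -> RMSNorm -> feed-forward."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.attn_norm = RMSNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.ffn_norm = RMSNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x):
        b, t, d = x.shape
        h = self.attn_norm(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) for multi-head attention.
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        # Positions enter only through rotation of queries and keys.
        q, k = rotary_embed(q), rotary_embed(k)
        # is_causal=True masks attention to future tokens (causal LM).
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(out.transpose(1, 2).reshape(b, t, d))
        return x + self.ffn(self.ffn_norm(x))
```

In a full model, a stack of such blocks would sit between the token embedding and a final projection over the vocabulary; the pre-norm arrangement shown here (normalizing before each sublayer, with residual additions outside) is a common choice for training stability.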
This project is licensed under the GPL v3 License - see the LICENSE file for details.
Note: The tokenizer used in this project is based on GPT-2 and is licensed under the MIT License. Please refer to the tokenizer/LICENSE file for the tokenizer’s license information.