Welcome to playAttention! To see the full documentation, please visit the Wiki.
Hi! This is a playground for understanding Attention and Transformers. This repository is my way of getting hands-on experience by building my first Transformer-based language models: GPT and GPT-2. I hope it will also be helpful to others who want to explore this fascinating field.
Disclaimer: This is an ongoing project—constantly evolving, growing, and being reviewed. As such, there may be mistakes, incomplete sections, or incorrect assumptions. Feedback and corrections are always welcome!
This is a list of videos, tutorials, and posts that have helped me throughout my learning journey. I recommend taking your time to go through them—they're worth a careful look.
- Let's build GPT: from scratch, in code, spelled out [video]
- Let's reproduce GPT-2 (124M) [video]
- build-nanogpt [repo]
- nanoGPT [repo]
Other
- DeepLearning.AI - How Transformer LLMs Work [course]
- Jalammar - Illustrated Transformer [post]
- Borealis - Tutorial #14: Transformers I: Introduction [post]
- Borealis - Tutorial #16: Transformers II: Extensions [post] (review again after training with nanoGPT)
- Borealis - Tutorial #17: Transformers III: Training [post] (review again after training with nanoGPT)
The code repository includes the implementation of both the GPT and GPT-2 models, as well as the training scripts. The code is organized into several folders:

- `models/`: the scripts `model_GPT.py` and `model_GPT2.py` contain the full architecture of the DIY-GPT models. They were built by following Karpathy's tutorials step by step, though you'll notice some differences in variable names, comments, refactoring, etc. I adapted the code to what felt most intuitive for me; feel free to modify it or build your own version as well. A minimal sketch of the attention mechanism at the core of these models is shown right after this list.
- `train/`: the scripts `train_GPT.py` and `train_GPT2.py` load the configuration and the GPT models and launch the training loop. After training, an example of text generation is executed, and log files detailing the training process are saved in the `results/` folder. For example, you can find the train/val loss plot that is generated during training:
- `Config.py`: defines the data model for the GPT models' configuration, including hyperparameters and design choices related to the architecture. This configuration is required for loading and training the model (see the configuration sketch after this list).
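Since attention is the whole point of this playground, here is a minimal sketch of the causal self-attention computation at the heart of these models. It follows the standard GPT formulation from Karpathy's tutorials; the class and variable names are illustrative, not necessarily the exact ones used in `model_GPT.py`:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal GPT-style multi-head causal self-attention (illustrative sketch)."""
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask: token i may only attend to tokens <= i
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape                       # batch, sequence length, embedding dim
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) so each head attends independently
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```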
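And here is a rough sketch of how the configuration and model loading fit together. The field names and defaults below are illustrative (they match the published GPT-2 124M hyperparameters), not the exact data model defined in `Config.py`:

```python
# Illustrative configuration sketch; not the exact API of Config.py.
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024   # maximum context length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_layer: int = 12        # number of Transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding (hidden) dimension
    dropout: float = 0.0     # dropout probability

# A training script would then build the model from the config, e.g.:
# config = GPTConfig()
# model = GPT(config)   # GPT being the model class from models/model_GPT.py
```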
Environment Setup
```bash
python -m venv venv
source venv/bin/activate && pip install -r requirements.txt
```
Training GPT / GPT-2 with DDP (single-process runs are also supported):

```bash
torchrun --standalone --nproc_per_node=1 train/train_GPT2.py
torchrun --standalone --nproc_per_node=1 train/train_GPT.py
```
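For reference, this is roughly the pattern a training script follows to work both under `torchrun` and as a plain single-process run. It is a sketch of standard PyTorch DDP boilerplate, not the repo's exact code, and `build_model()` is a hypothetical helper standing in for the model construction step:

```python
import os
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment
ddp = int(os.environ.get("RANK", -1)) != -1
if ddp:
    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = f"cuda:{local_rank}"
    torch.cuda.set_device(device)
else:
    local_rank = 0
    device = "cuda" if torch.cuda.is_available() else "cpu"

model = build_model().to(device)   # hypothetical helper, not in the repo
if ddp:
    # wrap the model so gradients are synchronized across processes
    model = DDP(model, device_ids=[local_rank])
```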
Training GPT + GPT-2 (sequentially):

```bash
make
```