Hands-on reconstruction of GPT/GPT-2 — architecture, tokenizer, and training loop built from the ground up.

iriacardiel/playAttention

Welcome to playAttention! For the full documentation, please visit the Wiki.

What is playAttention?

Hi! This is a playground for understanding Attention and Transformers. This repository is my way of getting hands-on experience by building my first Transformer-based language models: GPT and GPT-2. I hope it will also be helpful to others who want to explore this fascinating field.

Disclaimer: This is an ongoing project—constantly evolving, growing, and being reviewed. As such, there may be mistakes, incomplete sections, or incorrect assumptions. Feedback and corrections are always welcome!

Resources

This is a list of videos, tutorials, and posts that have helped me throughout my learning journey. I recommend taking your time to go through them—they're worth a careful look.

Alfredo Canziani

Andrej Karpathy

Other

Transformer Architecture (Decoder Only)
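
In a decoder-only Transformer, token embeddings flow through a stack of identical blocks, each combining masked (causal) self-attention with a position-wise MLP, both wrapped in residual connections. The sketch below is a simplified, self-contained PyTorch illustration of one such block; the names and layer sizes are my own and do not match the repository's model_GPT.py / model_GPT2.py.

# Minimal decoder-only Transformer block: causal self-attention + MLP,
# each in a pre-LayerNorm residual connection (GPT-2 style).
# A simplified illustration, not the repository's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value in one projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # Lower-triangular mask: position t may only attend to positions <= t.
        mask = torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size)
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) for multi-head attention.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)   # scaled dot-product scores
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                              # weighted sum of values
        y = y.transpose(1, 2).contiguous().view(B, T, C)         # merge heads back
        return self.proj(y)

class Block(nn.Module):
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around feed-forward
        return x

# Quick shape check: (batch, time, channels) in and out.
x = torch.randn(2, 16, 64)
print(Block(n_embd=64, n_head=4, block_size=32)(x).shape)  # torch.Size([2, 16, 64])

The causal mask is what makes this a decoder: each position can only attend to itself and earlier positions, so the model can be trained to predict the next token.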

The code repository

The repository includes the implementations of both the GPT and GPT-2 models, as well as their training scripts. The code is organized into several folders:

  • models/: the scripts model_GPT.py and model_GPT2.py contain the full architecture of the DIY-GPT models. They were built by following Karpathy's tutorials step by step, though you'll notice some differences in variable names, comments, refactoring, etc. I adapted the code to what felt most intuitive for me; feel free to modify it or build your own version as well.

  • train/: the scripts train_GPT.py and train_GPT2.py load the configuration and the GPT models and launch the training loop. After training, an example of text generation is executed, and log files detailing the training process are saved in the results/ folder, including the train/val loss plot generated during training.

  • Config.py: Defines the configuration data model for the GPT models, including hyperparameters and architecture-related design choices. This configuration is required to load and train the models; a minimal sketch of how it is consumed follows this list.
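
To make the split between Config.py and the training scripts concrete, below is a minimal, self-contained sketch of how a dataclass-style configuration and a training loop with periodic train/val loss logging could fit together. All field names, the toy stand-in model, and the random batches are illustrative assumptions, not the repository's actual code.

# Illustrative sketch only: a dataclass-style config driving a tiny training loop.
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F

@dataclass
class GPTConfig:
    block_size: int = 64        # maximum context length
    vocab_size: int = 256       # tokenizer vocabulary size
    n_layer: int = 2            # number of Transformer blocks
    n_head: int = 4             # attention heads per block
    n_embd: int = 128           # embedding / model width
    learning_rate: float = 3e-4
    max_steps: int = 100

def get_batch(split: str, cfg: GPTConfig, batch_size: int = 8):
    # Stand-in for a real data loader: random token ids shaped (B, T).
    x = torch.randint(0, cfg.vocab_size, (batch_size, cfg.block_size))
    y = torch.randint(0, cfg.vocab_size, (batch_size, cfg.block_size))
    return x, y

cfg = GPTConfig()
# A real run would instantiate the DIY-GPT model from models/ here; a single
# embedding + linear head keeps the sketch self-contained and runnable.
model = nn.Sequential(nn.Embedding(cfg.vocab_size, cfg.n_embd),
                      nn.Linear(cfg.n_embd, cfg.vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)

for step in range(cfg.max_steps):
    xb, yb = get_batch("train", cfg)
    logits = model(xb)                                           # (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, cfg.vocab_size), yb.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if step % 25 == 0:
        with torch.no_grad():
            xv, yv = get_batch("val", cfg)
            val_loss = F.cross_entropy(model(xv).view(-1, cfg.vocab_size), yv.view(-1))
        print(f"step {step}: train loss {loss.item():.4f}, val loss {val_loss.item():.4f}")

In the real scripts, the stand-in model and random batches would be replaced by the DIY-GPT model from models/ and a properly tokenized dataset, and the logged losses are what feed the train/val loss plot mentioned above.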

Environment Setup

python -m venv venv
source venv/bin/activate && pip install -r requirements.txt

Training GPT / GPT2 with DDP (single-process runs are also supported):

torchrun --standalone --nproc_per_node=1 train/train_GPT2.py
torchrun --standalone --nproc_per_node=1 train/train_GPT.py
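
On a machine with several GPUs, the same scripts can be launched with one torchrun process per GPU by raising --nproc_per_node (this assumes the DDP setup in the training scripts handles the multi-process case, as the heading above suggests), e.g. for 4 GPUs:

torchrun --standalone --nproc_per_node=4 train/train_GPT2.py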

Training GPT + GPT2 (sequentially):

make
