GPT-From-Scratch: Building My Own GPT-Like Model

🚀 Project Overview

GPT-From-Scratch is my custom implementation of a Decoder-Only Transformer (GPT) architecture, trained end-to-end on a classic childhood story: Winnie-the-Pooh 🧸🍯. Built entirely from scratch in PyTorch, the project walks through the full lifecycle of a modern generative language model, from hand-building attention mechanisms to generating character-level text.

What This Project Offers

  • โœ๏ธ Custom-built transformer architecture and self-attention mechanisms (no shortcuts or pretrained libraries)
  • โš™๏ธ Clean separation of logic via a modular Trainer class and reusable generation pipeline
  • ๐Ÿง  Tokenized at the character-level, enabling creative and flexible text generation

🔗 Try it out in Google Colab - generate novel text with your own prompts and settings.

Whether you're technical, non-technical, or just someone nostalgic about storybooks, this repo aims to teach, inspire, and demonstrate the power of transformer models in an intuitive and accessible way.

✨ Motivation

As a kid, Winnie-the-Pooh wasn't just a story to me; it was my childhood and a whole world of stories. Recreating that world using artificial intelligence felt like the perfect blend of nostalgia and innovation.

This project began as a challenge to build my own GPT (Generative Pretrained Transformer) entirely from scratch. I wanted to understand the mechanics of self-attention, positional embeddings, and autoregressive generation by implementing every component manually.

Beyond technical curiosity, this was also about having fun with machine learning. By training on the first 6 chapters of Winnie-the-Pooh, my model captures the whimsical, rhythmic tone of A. A. Milne's writing.

🧸 Building something personal made the entire learning process more joyful and reinforced that machine learning can be both rigorous and fun.

🛠 Features

This project was built completely from scratch using PyTorch. It is a minimal, educational implementation of a GPT-like model. Below are some of the key features and components:

✅ Core Functionality

  • Character-level Tokenization
    Each character is treated as a token, allowing for greater flexibility and creativity in generation (see the sketch after this list).

  • Custom GPT Architecture
    Built entirely from scratch, including:

    • Multi-head self-attention
    • Feedforward layers
    • Residual connections and layer normalization
  • Manual Training Loop (GPTTrainer Class)
    Includes a reusable and modular training loop with support for:

    • Checkpointing
    • Custom batch sizes
    • Optional validation split
    • Plotting of training/validation loss
  • Text Generation Pipeline
    Supports prompt-based generation (also illustrated in the sketch after this list) with:

    • Adjustable max_length
    • Configurable temperature for sampling randomness
    • Autoregressive one-token-at-a-time prediction
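
To make the tokenization and generation bullets above concrete, here is a minimal sketch of character-level encoding/decoding and temperature-controlled autoregressive sampling. The names (stoi, itos, generate) and the assumption that the model returns logits of shape (batch, time, vocab) are illustrative, not the exact code in this repo:

import torch

# Character-level tokenization: every unique character becomes a token id.
text = open("data/training_text.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> int
itos = {i: ch for ch, i in stoi.items()}       # int -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

@torch.no_grad()
def generate(model, prompt, max_length=200, temperature=1.0, context_length=64):
    # Autoregressive loop: feed the current context, sample one character, append, repeat.
    ids = torch.tensor([encode(prompt)], dtype=torch.long)
    for _ in range(max_length):
        context = ids[:, -context_length:]                    # keep only the most recent tokens
        logits = model(context)[:, -1, :]                     # logits for the next character
        probs = torch.softmax(logits / temperature, dim=-1)   # higher temperature = more randomness
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return decode(ids[0].tolist())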

📦 Configurable Training

  • Fully customizable via a TrainingConfig class (an illustrative sketch follows this list):
    • Batch size, learning rate, context length, model dimensions
    • CPU/GPU support
    • Save paths for checkpoints, vocab, and final model
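
For illustration, a TrainingConfig along these lines could look like the dataclass below. The field names and the batch size / learning rate defaults are assumptions (only the architecture values, epoch count, device, and save paths come from this README), so treat it as a sketch rather than the contents of training/config.py:

from dataclasses import dataclass

@dataclass
class TrainingConfig:
    # Model shape (values mirror the architecture table further down)
    context_length: int = 64
    embedding_dim: int = 128
    num_blocks: int = 4
    num_heads: int = 4

    # Optimization (batch size and learning rate here are illustrative defaults)
    batch_size: int = 32
    learning_rate: float = 3e-4
    epochs: int = 60
    val_split: float = 0.0        # optional validation split (0 = train on all data)

    # Hardware and output paths
    device: str = "cpu"
    checkpoint_dir: str = "weights"
    vocab_path: str = "vocab/vocab.pkl"
    final_model_path: str = "weights/final_model.pt"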

📈 Visualization

  • Automatic loss tracking and plot generation after training
  • Clean training summaries saved to files

๐ŸŒ Colab-Ready Deployment

  • Interactive Colab notebook included
  • Users can load the trained model and generate new text live with adjustable settings

This makes the project both a learning tool and a demo platform.

🧠 Model Architecture

This GPT-like model was designed to be lightweight enough to train on my laptop CPU, while still demonstrating the core principles of transformer-based language models.

🧩 Key Design Choices

Component            Description
-------------------  ---------------------------------------
Tokenization         Character-level (1 token = 1 character)
Context Length       64 tokens
Embedding Size       128 dimensions
Transformer Blocks   4 (each with self-attention + MLP)
Attention Heads      4 (multi-head self-attention)
Training Epochs      60
Device               CPU (no GPU used)

🔧 Modules Built From Scratch

  • Multi-Head Self-Attention: Includes custom causal masking and output projection (see the sketch after this list)
  • Feedforward Block: A feedforward neural network with ReLU activation
  • Residual Connections + LayerNorm: Stabilize training across blocks
  • Autoregressive Output Head: Predicts next character based on context
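
As a rough illustration of the attention module listed above (a sketch of the standard approach, not the exact code in model/gpt.py), a causally masked multi-head self-attention layer in PyTorch can look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    """Causal (masked) multi-head self-attention for a decoder-only transformer."""

    def __init__(self, embed_dim=128, num_heads=4, context_length=64):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)   # project input to queries, keys, values
        self.out_proj = nn.Linear(embed_dim, embed_dim)  # final output projection
        # Lower-triangular mask: each position may only attend to itself and earlier positions.
        self.register_buffer("mask", torch.tril(torch.ones(context_length, context_length)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split the embedding into heads: (B, num_heads, T, head_dim)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied before softmax.
        att = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.out_proj(out)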

๐Ÿ‹๏ธโ€โ™‚๏ธ Training Details

This project was trained entirely on my CPU, using only the first 6 chapters of Winnie-the-Pooh. The aim was not only to build a working GPT model from scratch but also to generate rich, stylistically consistent text from a story that's warm, whimsical, and character-driven.

Note: My model was trained using all available data (no validation split), and checkpoints were saved periodically to allow recovery in case of interruption.
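
The heart of GPTTrainer is a standard PyTorch training loop. The simplified sketch below (hypothetical names, not the exact class in training/trainer.py) shows the overall shape, including the periodic checkpointing mentioned above:

import os
import torch

def train(model, dataloader, config):
    # Next-character language modeling: predict targets from inputs, minimize cross-entropy.
    optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
    criterion = torch.nn.CrossEntropyLoss()
    losses = []

    for epoch in range(config.epochs):
        for inputs, targets in dataloader:            # batches of (context, next-char) index tensors
            logits = model(inputs)                     # (batch, context, vocab_size)
            loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        losses.append(loss.item())                     # record the last batch's loss for this epoch

        # Save a checkpoint periodically so an interrupted run can be resumed.
        if (epoch + 1) % 10 == 0:
            torch.save(model.state_dict(),
                       os.path.join(config.checkpoint_dir, f"checkpoint_epoch_{epoch + 1}.pt"))

    torch.save(model.state_dict(), config.final_model_path)
    return losses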

📊 You can view the final loss curve and training summary here:
👉 training_outputs/

✨ Results & Sample Outputs

After training for 60 epochs on just the first six chapters of Winnie-the-Pooh, my model learned to generate consistent and surprisingly coherent text, even on a CPU with a limited dataset.

Here are a few sample generations from my trained model:

๐Ÿป Prompt: "Here is Edward Bear, coming"
๐Ÿ“ Output: "Here is Edward Bear, coming downstairs now, bump, bump, on the back of his head, behind Christopher Robin. It is, as far as he..."

๐Ÿป Prompt: "Pooh said"
๐Ÿ“ Output: "Pooh said, "Yes, but it isn't quite a full jar," and he threw it down to Piglet, and Piglet said, "No, it isn't..."

๐Ÿป Prompt: "Christopher Robin"
๐Ÿ“ Output: "Christopher Robin and Pooh went home to breakfast together. "Oh, Bear!" said Christopher Robin. "How I do love you!"

These examples demonstrate:

  • My model is able to capture the tone, rhythm, and whimsical charm of the original story.
  • It handles character dialogue and scene-setting with surprising ability for such a small model.
  • Even custom prompts result in creative continuations that feel true to the world of the story.

🚀 Try It Yourself

There are two ways to play around with my trained GPT model:

🌐 1. Google Colab Notebook

Recommended for non-technical users or anyone who wants to quickly test the model without setup.

  • No installation required
  • Run everything from your browser
  • Modify prompts, generation length, and temperature

๐Ÿ› ๏ธ 2. Run Locally

For more details, see the Installation & Setup section below.

🧱 Project Structure

The project is cleanly organized to keep the codebase readable, extensible, and easy to navigate. Here's how I structure everything:

GPT-FROM-SCRATCH/
├── data/
│   └── training_text.txt            # Raw training text (first 6 chapters of Winnie the Pooh)
├── model/
│   ├── __init__.py
│   └── gpt.py                       # Custom GPT architecture (self-attention, MLP, etc.)
├── notebooks/
│   └── GPT_From_Scratch_Text_Generation.ipynb  # Interactive Colab demo for generation
├── training/
│   ├── __init__.py
│   ├── config.py                    # Centralized config for training (context, lr, etc.)
│   ├── dataset.py                   # PyTorch Dataset with char-level tokenization
│   └── trainer.py                   # GPTTrainer class handling training loop and plots
├── training_outputs/
│   ├── Training_Progress.png        # Plot of training + validation loss
│   └── training_summary.txt         # Summary of training run and hyperparameters
├── utils/
│   ├── __init__.py
│   └── text_generation.py           # Logic for text generation
├── vocab/
│   └── vocab.pkl                    # Serialized vocab (char-to-int, int-to-char)
├── weights/                         # Final trained model + periodic checkpoints
├── .gitignore
├── generate.py                      # CLI for generating text with trained model
├── LICENSE
└── train.py                         # Main script to train the model from scratch

📦 Installation & Setup

To get started with this project locally, follow the steps below.

🔧 Requirements

This project was developed using Python 3.12 and the following core libraries:

  • PyTorch (2.7.1)
  • matplotlib
  • numpy
  • and others (see requirements.txt)

✅ Although GPU support is available, no GPU is required.

๐Ÿ› ๏ธ Installation

Clone the repository and install dependencies:

git clone https://github.com/Akhan521/GPT-From-Scratch.git
cd GPT-From-Scratch
pip install -r requirements.txt

๐Ÿ“ Make sure your folder structure looks like this:

  • data/: contains the training text.
  • model/: GPT model implementation.
  • training/: dataset, trainer, config.
  • utils/: text generation logic.
  • training_outputs/: training plots + summary.
  • weights/: saved model checkpoint.
  • notebooks/: interactive Colab demo.

🧪 Quick Test (Text Generation)

To test the trained model locally after installing:

python generate.py

This will load the trained model from weights/final_model.pt and generate text using the stored vocabulary.
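
If you would rather do this from your own script than through generate.py, the loading step might look roughly like the snippet below. The GPT class name, its constructor arguments, and the exact contents of vocab.pkl are assumptions here; generate.py is the authoritative reference:

import pickle
import torch

from model.gpt import GPT   # module path from the project tree; the class name is an assumption

# vocab.pkl is assumed to hold the character-to-int / int-to-character mappings.
with open("vocab/vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

# The constructor arguments and the checkpoint format (a state_dict) are assumptions.
model = GPT(vocab_size=len(vocab))
model.load_state_dict(torch.load("weights/final_model.pt", map_location="cpu"))
model.eval()   # switch to inference mode before generating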

🧠 Reflections & Future Work

Building this GPT model from scratch was both a technical deep dive and a nostalgic journey. It pushed me to internalize the inner workings of self-attention, transformer blocks, and the training loop, not just at a conceptual level, but at the implementation level.

💡 What I Learned

  • How to implement a Decoder-Only Transformer without relying on pre-built transformer libraries.
  • The importance of proper tokenization, training checkpoints, and text generation loops.
  • How temperature and context length influence the creativity and coherence of generated text.
  • How to structure a clean and modular ML codebase using model, training, and utils modules for scalability.
  • The benefits and trade-offs of character-level tokenization, especially for small datasets and creative generation tasks.
  • How to track training progress through visualizations and summaries, and how to interpret model loss curves.

🔮 What's Next?

This project is just the beginning. Some next steps I'd love to explore:

  • Scaling up: Training on larger datasets (e.g., full books or larger corpora).
  • Sampling strategies: Implement top-k sampling for better generation quality (a rough sketch follows this list).
  • Interactive UI: Build a simple web or terminal-based app for live prompt input and generation.
  • Model Evaluation: Quantify performance using a held-out validation split.
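
As a rough sketch of how top-k sampling could slot into the existing generation loop (not something the repo implements today), the per-step sampling could be swapped for something like:

import torch

def sample_top_k(logits, k=10, temperature=1.0):
    # Keep only the k most likely next characters, renormalize, then sample from them.
    logits = logits / temperature
    top_values, top_indices = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(top_values, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)   # index into the top-k set
    return top_indices.gather(-1, choice)              # map back to the full vocabulary id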

If you've made it this far into my README, thank you for reading and giving me your time!
I'm always open to feedback, collaboration, and discussion.

🔗 Connect with me: GitHub + LinkedIn

✨ View my portfolio: Portfolio

โญ If you liked this project, consider starring the repo!
