snnclsr/chatgpt-from-scratch

🌟 Overview

A full-stack ChatGPT-like application built (almost) from scratch, featuring real-time conversation capabilities, multi-modal support, and a modern web interface. This project demonstrates the implementation of various components of a production-ready LLM application, from model training to deployment.

✨ Features

  • 🤖 Multiple LLM Architecture Support
    • MyGPT (Instruction-tuned GPT2, details below)
    • Gemma-3-1b-it
    • Qwen2.5-0.5B-Instruct
    • SmolVLM-256M-Instruct (Multi-modal)
  • 💬 Real-time Conversation
    • WebSocket-based streaming responses
    • Token-by-token generation (see the sketch after this list)
  • 🎨 Modern Web Interface (vibe coded)
    • React + TypeScript
    • Tailwind CSS for styling
  • 🖼️ Multi-modal Capabilities
    • Image upload and processing
    • Vision-language model integration
  • 💾 Persistent Storage
    • SQLite database
    • Message and conversation history
  • 🐳 Containerization
    • Docker support for both frontend and backend
    • Easy deployment and scaling
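
To give a sense of how the real-time conversation feature fits together, here is a minimal sketch of a FastAPI WebSocket endpoint that streams tokens one by one. The endpoint path and the generate_tokens helper are placeholders for illustration, not necessarily the names used in this repo.

from fastapi import FastAPI, WebSocket

app = FastAPI()

def generate_tokens(prompt: str):
    # Placeholder: a real implementation would run the selected LLM and
    # yield each newly decoded token as soon as it is generated.
    for token in ["Hello", ",", " world", "!"]:
        yield token

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    prompt = await websocket.receive_text()
    for token in generate_tokens(prompt):
        # Token-by-token streaming: each piece is pushed to the client
        # over the WebSocket as soon as it is available.
        await websocket.send_text(token)
    await websocket.close()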

🛠️ Technical Stack

Frontend

  • React 18
  • TypeScript
  • Tailwind CSS
  • WebSocket for real-time communication

Backend

  • FastAPI
  • SQLAlchemy with SQLite
  • PyTorch
  • Transformers
  • WebSockets
  • Docker
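
As an illustration of the persistence layer, below is a minimal sketch of SQLAlchemy models for conversations and messages backed by SQLite. Table and column names here are assumptions for the sake of the example, not necessarily the ones used in the backend.

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Text, create_engine, func
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Conversation(Base):
    __tablename__ = "conversations"
    id = Column(Integer, primary_key=True)
    title = Column(String, default="New chat")
    created_at = Column(DateTime, server_default=func.now())
    messages = relationship("Message", back_populates="conversation")

class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    conversation_id = Column(Integer, ForeignKey("conversations.id"))
    role = Column(String)    # "user" or "assistant"
    content = Column(Text)
    created_at = Column(DateTime, server_default=func.now())
    conversation = relationship("Conversation", back_populates="messages")

# SQLite file on disk; sessions are used by the API to read and write history.
engine = create_engine("sqlite:///chat.db")
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)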

🚀 Getting Started

Installation

  1. Clone the repository

git clone https://github.com/snnclsr/chatgpt-from-scratch.git
cd chatgpt-from-scratch

  2. Run everything with docker-compose

docker-compose up --build

  3. Or run the backend and frontend separately, without Docker

Backend:

uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

Frontend:

cd frontend
npm start

📚 Model Training

The training code lives under the modelling directory, building upon Sebastian Raschka's "Build a Large Language Model (From Scratch)" book and its companion code, and implements:

  • The GPT2 model architecture
  • Instruction tuning on the Alpaca dataset, starting from the pretrained GPT2 weights

To run the training:

python -m modelling.train
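
Alpaca-style instruction data is usually arranged with a prompt template like the following rough sketch; the exact template used in the modelling code may differ slightly.

def format_alpaca(example: dict) -> str:
    # Sketch of the Alpaca prompt format: instruction, optional input, response.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}"
    )
    if example.get("input"):
        prompt += f"\n\n### Input:\n{example['input']}"
    prompt += f"\n\n### Response:\n{example['output']}"
    return prompt

print(format_alpaca({
    "instruction": "Rewrite the sentence in passive voice.",
    "input": "The cat chased the mouse.",
    "output": "The mouse was chased by the cat.",
}))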

I also applied the following changes to the training code and model to make training and inference faster, following https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/10_llm-training-speed (a condensed sketch of these tweaks appears below):

  1. Create the causal mask on the fly
  2. Use tensor cores
  3. Use the fused AdamW optimizer
  4. Replace from-scratch code with built-in PyTorch classes
  5. Use FlashAttention
  6. Use torch.compile
  7. Increase the batch size

As Sebastian also outlines, these updates combine to make training roughly 6-7 times faster.
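
Here is a condensed, illustrative sketch of how several of these tweaks look in plain PyTorch; it is not the exact training script from modelling/.

import torch

# Stand-in model; in the repo this would be the GPT2 model from modelling/.
model = torch.nn.TransformerEncoderLayer(d_model=768, nhead=12)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# 2. Use tensor cores for float32 matmuls on supported GPUs
torch.set_float32_matmul_precision("high")

# 3. Fused AdamW optimizer (CUDA only)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=(device == "cuda"))

# 1 + 5. FlashAttention via scaled_dot_product_attention, with the causal
# mask created on the fly instead of stored as a precomputed buffer
q = k = v = torch.randn(1, 12, 128, 64, device=device)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)

# 6. Compile the model
model = torch.compile(model)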

🙏 Acknowledgments

This project builds on Sebastian Raschka's "Build a Large Language Model (From Scratch)" book and its companion LLMs-from-scratch repository.

Future Enhancements

  • Markdown support
