A full-stack ChatGPT-like application built (almost) from scratch, featuring real-time conversation capabilities, multi-modal support, and a modern web interface. This project demonstrates the implementation of various components of a production-ready LLM application, from model training to deployment.
- 🤖 Multiple LLM Architecture Support
  - MyGPT (instruction-tuned GPT-2; details below)
  - Gemma-3-1b-it
  - Qwen2.5-0.5B-Instruct
  - SmolVLM-256M-Instruct (multi-modal)
- 💬 Real-time Conversation
  - WebSocket-based streaming responses
  - Token-by-token generation
- 🎨 Modern Web Interface (vibe coded)
  - React + TypeScript
  - Tailwind CSS for styling
  - Markdown support
- 🖼️ Multi-modal Capabilities
  - Image upload and processing
  - Vision-language model integration
- 💾 Persistent Storage
  - SQLite database
  - Message and conversation history
- 🐳 Containerization
  - Docker support for both frontend and backend
  - Easy deployment and scaling
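The token-by-token streaming flow can be sketched independently of the web layer as an async generator that a WebSocket handler would iterate over. The names below are illustrative, not the project's actual API; a real handler would pull tokens from the model's generate loop instead of a list:

```python
import asyncio

async def stream_tokens(tokens):
    """Yield tokens one at a time, as a WebSocket handler would send them.

    In the real app the tokens come from the model's generation loop;
    here a pre-tokenized list stands in for it.
    """
    for tok in tokens:
        await asyncio.sleep(0)  # yield control, mimicking per-token latency
        yield tok

async def collect(tokens):
    # A stand-in for the client: append each streamed token as it arrives,
    # so the UI can render partial responses immediately.
    received = []
    async for tok in stream_tokens(tokens):
        received.append(tok)
    return received

received = asyncio.run(collect(["Hello", ",", " world", "!"]))
print("".join(received))  # prints "Hello, world!"
```

In the actual app, each yielded token would be passed to `websocket.send_text` so the frontend can append it to the message as it arrives.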
Frontend:
- React 18
- TypeScript
- Tailwind CSS
- WebSocket for real-time communication

Backend:
- FastAPI
- SQLAlchemy with SQLite
- PyTorch
- Transformers
- WebSockets
- Docker
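As a rough illustration of the persistence layer: the project itself uses SQLAlchemy, but the conversation/message history can be sketched with the standard-library `sqlite3` module and a hypothetical two-table schema (table and column names are assumptions, not the project's actual models):

```python
import sqlite3

# Hypothetical schema mirroring "message and conversation history".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (
    id    INTEGER PRIMARY KEY,
    title TEXT
);
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    role            TEXT,   -- "user" or "assistant"
    content         TEXT
);
""")

conn.execute("INSERT INTO conversations (id, title) VALUES (1, 'demo')")
conn.executemany(
    "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
    [(1, "user", "Hi"), (1, "assistant", "Hello!")],
)

# Reload a conversation in order, as the chat UI would on page load.
history = conn.execute(
    "SELECT role, content FROM messages WHERE conversation_id = 1 ORDER BY id"
).fetchall()
print(history)
```

SQLAlchemy would express the same two tables as mapped classes with a one-to-many relationship, but the stored shape is the same.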
- Clone the repository:

```bash
git clone https://github.com/snnclsr/chatgpt-from-scratch.git
cd chatgpt-from-scratch
```

- Run with `docker-compose`:

```bash
cd chatgpt-from-scratch
docker-compose up --build
```

- Or run the services separately, without Docker:

Backend:

```bash
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
```

Frontend:

```bash
cd frontend
npm start
```
The training procedure lives under the `modelling` directory. It builds upon Sebastian Raschka's "Build a Large Language Model (From Scratch)" book, adapted from here, and implements:
- GPT-2 model architecture
- Instruction tuning on the Alpaca dataset, starting from the pretrained weights
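Alpaca-style instruction tuning formats each example into a fixed prompt template before tokenization. A minimal sketch of the standard Alpaca template follows; the project's actual formatting code may differ in details:

```python
def format_alpaca(instruction: str, input_text: str = "") -> str:
    # Standard Alpaca prompt template; examples that carry an input field
    # get an extra "### Input:" section between instruction and response.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}"
    )
    if input_text:
        prompt += f"\n\n### Input:\n{input_text}"
    prompt += "\n\n### Response:\n"
    return prompt

print(format_alpaca("Name the capital of France."))
```

During fine-tuning, the model is trained to continue the prompt after `### Response:` with the target answer.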
To run the training:
```bash
python -m modelling.train
```
I also applied the following changes to the training code/model to make training and inference faster (see https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/10_llm-training-speed):
- Creating the causal mask on the fly
- Using tensor cores
- Using the fused AdamW optimizer
- Replacing from-scratch code with PyTorch classes
- Using FlashAttention
- Using `torch.compile`
- Increasing the batch size
As Sebastian also outlines, these updates combine to make training roughly 6-7x faster.
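A sketch of what those speed-ups look like in PyTorch. These are illustrative settings, not the exact modelling code, and the fused optimizer and tensor cores only take effect on CUDA hardware:

```python
import torch
import torch.nn.functional as F

# Tensor cores: allow TF32 matmuls on Ampere+ GPUs (no-op on CPU).
torch.set_float32_matmul_precision("high")

# A toy model standing in for the GPT architecture.
model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.GELU())

# Fused AdamW: runs the optimizer step as fused CUDA kernels
# (only supported on CUDA, so fall back to the default on CPU).
optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, fused=torch.cuda.is_available()
)

# FlashAttention with an on-the-fly causal mask: scaled_dot_product_attention
# selects the flash kernel when available, and is_causal=True avoids
# materializing an explicit mask tensor.
q = k = v = torch.randn(1, 4, 16, 8)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# torch.compile: fuses kernels and cuts Python overhead; compilation
# happens lazily on the first forward pass.
compiled_model = torch.compile(model)
```

Batch size is the remaining knob from the list above: with the per-step overhead reduced, larger batches keep the GPU saturated.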
Acknowledgements:
- Sebastian Raschka's "Build a Large Language Model (From Scratch)" book
- Alpaca dataset (CC BY-NC 4.0)
- Open-source model providers (Gemma-3-1b-it, Qwen2.5-0.5B-Instruct, SmolVLM-256M-Instruct)