This project is a complete, from-scratch implementation of a GPT-like transformer model, trained to generate text in the fictional High Valyrian language (from the Game of Thrones universe). It was created as a learning project to deeply understand:
- The inner workings of Large Language Models (LLMs)
- Every important component: tokenizer, dataloader, transformer, training tricks, inference techniques
- How GPT models are really built under the hood, without relying on high-level libraries like HuggingFace
✅ Goal: Build everything by hand, and learn by building.
✅ Target Audience: Anyone curious about LLM internals, researchers, students, and engineers.
| Tool/Library | Why we used it |
|---|---|
| Python | For clean, flexible model prototyping |
| PyTorch | To manually implement the Transformer architecture with full control over layers |
| Weights & Biases (wandb) | For tracking training metrics and loss curves and debugging models easily |
| TorchScript | To export the final trained model for optimized production/inference use |
| Matplotlib (optional) | For visualization (loss curves, attention maps in future extensions) |
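As an illustration of how wandb fits into the workflow, here is a minimal logging sketch. The project name, run name, and metric keys are placeholders, not necessarily what this repository uses:

```python
import math

import wandb

# Hypothetical sketch: project name, run name, and metric keys are placeholders.
run = wandb.init(project="high-valyrian-minigpt", name="readme-example")

for step in range(100):
    fake_loss = 5.0 * math.exp(-step / 30)  # stand-in for a real training loss
    wandb.log({"train/loss": fake_loss}, step=step)

run.finish()
```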
- Byte Pair Encoding (BPE) tokenizer (custom)
- Causal Self-Attention layer
- Multi-Head Attention with masking
- Transformer Block with pre-layernorm and MLP
- Positional Embeddings
- GPT Decoder Head (Linear layer to predict next token)
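To make the components above concrete, here is a minimal sketch of a pre-layernorm Transformer block with causal multi-head attention, in the spirit of this project. Class and variable names are illustrative, not the repository's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (no peeking at future tokens)."""

    def __init__(self, n_embd: int, n_head: int, block_size: int, dropout: float):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.drop = nn.Dropout(dropout)
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, time, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = self.drop(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.drop(self.proj(y))


class Block(nn.Module):
    """Pre-layernorm block: LN -> attention -> residual, then LN -> MLP -> residual."""

    def __init__(self, n_embd: int, n_head: int, block_size: int, dropout: float):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size, dropout)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # normalize before attention (pre-LN)
        x = x + self.mlp(self.ln2(x))   # normalize before the MLP (pre-LN)
        return x
```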
| Hyperparameter | Value |
|---|---|
| Embedding dimension | 128 |
| Number of heads | 4 |
| Number of layers (Transformer blocks) | 4 |
| Dropout | 0.2 |
| Vocabulary size | ~500 tokens (custom BPE tokenizer) |
| Context length (block size) | 64 tokens |
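These hyperparameters map naturally onto a small configuration object; the sketch below shows one plausible way to group them (field names are illustrative, not the repository's exact ones):

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Values taken from the table above; field names are illustrative.
    n_embd: int = 128       # embedding dimension
    n_head: int = 4         # attention heads
    n_layer: int = 4        # transformer blocks
    dropout: float = 0.2
    vocab_size: int = 500   # approximate size of the custom BPE vocabulary
    block_size: int = 64    # context length in tokens
```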
| Feature | GPT-2 (original) | High Valyrian MiniGPT (this project) |
|---|---|---|
| Model size | 117M+ parameters | ~1.5M parameters |
| Tokenizer | Trained on WebText (massive) | Trained on a small High Valyrian corpus |
| Layers/Heads | 12 layers, 12 heads | 4 layers, 4 heads |
| Dropout | Yes | Yes |
| Weight initialization | Gaussian normal | Same (manual) |
| LayerNorm | Post-attention | Pre-attention (better for small models) |
| Training tricks | AdamW, cosine decay | Adam, weight decay, early stopping |
| Scaling | Billion-token corpus | Small educational corpus |
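For a sense of what "Adam, weight decay, early stopping" looks like in practice, here is an illustrative PyTorch training loop. It assumes a `model(x, y)` that returns `(logits, loss)` and a `get_batch(split)` helper returning `(inputs, targets)`; neither is necessarily this repo's exact API:

```python
import torch


def train(model, get_batch, max_steps=5000, eval_every=250, patience=4, lr=3e-4):
    """Illustrative loop: Adam with weight decay plus early stopping on validation loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=0.01)
    best_val, stale = float("inf"), 0
    for step in range(max_steps):
        x, y = get_batch("train")
        _, loss = model(x, y)
        opt.zero_grad(set_to_none=True)
        loss.backward()
        opt.step()
        if step % eval_every == 0:
            model.eval()
            with torch.no_grad():
                xv, yv = get_batch("val")
                _, val_loss = model(xv, yv)
            model.train()
            if val_loss < best_val:
                best_val, stale = val_loss, 0
                torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
            else:
                stale += 1
                if stale >= patience:  # early stopping: no improvement for a while
                    break
```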
✅ Architecturally faithful to GPT-2, but scaled down for fast training and clarity.
- True causal masked attention (can't see future tokens)
- Custom top-p (nucleus) sampling during inference (see the sketch after this list)
- Full training, evaluation, checkpointing, and wandb logging
- Exportable via TorchScript for future deployment
- Lightweight and easy to train on a single GPU
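Nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability exceeds `p`, renormalizes, and samples from that set. A minimal sketch, with an illustrative function name and defaults rather than the repo's exact code:

```python
import torch
import torch.nn.functional as F


def sample_top_p(logits: torch.Tensor, p: float = 0.9, temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id from logits of shape (vocab_size,) with nucleus (top-p) sampling."""
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens once the probability mass *before* them already exceeds p
    # (this always keeps at least the most likely token).
    cutoff = cumulative - sorted_probs > p
    sorted_probs[cutoff] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[next_sorted]
```

Temperature scales the logits before the softmax, so lower values sharpen the distribution while higher values flatten it.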
You can see the full training progress here:
🔗 Project WandB Dashboard
(includes loss curves, model samples during training, and metrics)
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Prepare the dataset:

   - Add your corpus file to `data/High Valyrian.txt`.

3. Train the BPE tokenizer:

   ```bash
   python tokenizer/train_tokenizer.py
   ```

4. Train the model:

   ```bash
   python train.py
   ```

5. Generate text:

   ```bash
   python inference/generate.py --prompt "valar morghulis"
   ```

6. Export to TorchScript:

   ```bash
   python scripts/export_torchscript.py
   ```
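TorchScript export usually boils down to tracing or scripting the trained model and saving the result. The sketch below shows the general pattern only, using a tiny stand-in module so it runs on its own; it is not the contents of `scripts/export_torchscript.py`:

```python
import torch
import torch.nn as nn

# Stand-in module so this snippet is self-contained; in the project the
# traced object would be the trained MiniGPT loaded from a checkpoint.
model = nn.Embedding(500, 128)
model.eval()

example_tokens = torch.randint(0, 500, (1, 64))     # (batch, block_size) dummy input
scripted = torch.jit.trace(model, example_tokens)   # torch.jit.script(model) also works
scripted.save("minigpt_valyrian.ts")                 # reload later with torch.jit.load(...)
```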
- Building LLMs is about small careful engineering choices: masking, layer ordering, normalization, sampling.
- Training tricks (dropout, optimizer tweaks) make a huge difference.
- Inference matters as much as training — generation quality is the true test.
- Tooling (like wandb) is essential to debug and understand model behavior.
- Scaling up (to bigger GPTs) is mostly about data, compute, and model depth — the core principles stay the same.
This project demonstrates not just how to use LLMs, but how to build them — layer by layer, loss by loss, token by token.
If you truly understand something, you can build it yourself.
That is the spirit of this project. 👨‍🔧🔥