A hands-on collection of Jupyter notebooks exploring key concepts, architectures, and optimizations in Large Language Models (LLMs).
- Builds a basic LLM using PyTorch.
- Covers data preprocessing, model architecture, training loop, and inference.
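The core loop (preprocessed batch → forward → next-token loss → backward → optimizer step) can be sketched as follows. This is a minimal, illustrative example, not the notebook's actual code: the toy model, vocabulary size, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Toy next-token LM: embedding followed by a linear head standing in for
# the real Transformer blocks. All sizes here are illustrative.
vocab, dim, seq = 50, 32, 16
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (4, seq + 1))   # fake pre-tokenized batch
inp, tgt = tokens[:, :-1], tokens[:, 1:]         # shift by one for next-token targets

logits = model(inp)                              # (batch, seq, vocab)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), tgt.reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()
```

At inference time the same model is called autoregressively: sample or argmax the last position's logits, append the token, and repeat.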
- Enhances the previous notebook with modern techniques:
  - RMSNorm
  - Gated (SwiGLU) FFN
  - Rotary Positional Embeddings (RoPE)
- Includes gradient accumulation and mixed precision training.
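Two of the techniques above are compact enough to sketch inline. The following is an illustrative implementation of RMSNorm (normalize by the root mean square, no mean subtraction) and a SwiGLU feed-forward block; layer names and the hidden-size choice are assumptions, not necessarily what the notebook uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scale x by 1 / RMS(x) along the last dim, with a learned gain."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """Gated FFN: down( SiLU(gate(x)) * up(x) ), as in LLaMA-style models."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

With the default gain, RMSNorm's output has unit mean square per position, which is easy to verify numerically.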
- Deep dive into RoPE in PyTorch.
- Compares different RoPE implementations.
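As a reference point for what such implementations look like, here is one common RoPE variant that rotates interleaved even/odd channel pairs by position-dependent angles. This is a sketch of one possible implementation, not the specific variants the notebook compares; the function name and `base` default are assumptions.

```python
import torch

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (..., seq, dim).

    Channels are paired (even, odd); each pair at position p is rotated by
    an angle p * base**(-2i/dim), so relative offsets become relative rotations.
    """
    seq, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.outer(torch.arange(seq, dtype=torch.float32), inv_freq)
    cos, sin = angles.cos(), angles.sin()        # each (seq, dim/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]          # pair up even/odd channels
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated, not scaled, RoPE leaves vector norms unchanged and acts as the identity at position 0, two properties worth asserting when comparing implementations.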