A PyTorch implementation of Rotary Positional Embeddings (RoPE), the positional-encoding scheme used in modern transformers such as LLaMA, Qwen, and GPT-NeoX. Rather than adding a position vector to each token embedding, RoPE rotates pairs of query and key channels by position-dependent angles, so attention scores depend on the relative distance between tokens.
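The core operation, in a minimal self-contained sketch (function names here are illustrative and need not match those in `rope.py`): every consecutive pair of query/key channels is rotated by an angle proportional to the token position, and the rotated queries and keys are then used for the usual attention dot product.

```python
import torch

def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables: one frequency per (even, odd) channel pair."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate each (even, odd) channel pair of x by its position-dependent angle.

    x: (..., seq_len, head_dim) query or key tensor.
    """
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

# Example: rotate queries and keys before computing attention scores.
q = torch.randn(1, 4, 16, 64)   # (batch, heads, seq, head_dim)
k = torch.randn(1, 4, 16, 64)
cos, sin = rope_cache(seq_len=16, head_dim=64)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
```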
## Getting Started

```bash
# Clone the repo
git clone https://github.com/sakhileln/rope-pytorch.git
cd rope-pytorch
```
## Project Structure

```text
rope-pytorch/
├── README.md           # Intro + explanation
├── rope.py             # Core RoPE functions
├── transformer.py      # Tiny Transformer with RoPE
├── visualize_rope.py   # 2D/3D visualizations of rotation
├── train_demo.py       # Train on toy data (e.g., copy task)
└── requirements.txt
```
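For geometric intuition, the kind of picture `visualize_rope.py` aims at can be sketched in a few lines (illustrative only, not the repo's actual script): a single channel pair of a fixed embedding traces a circle as the token position grows.

```python
import numpy as np
import matplotlib.pyplot as plt

# One (even, odd) channel pair of a fixed embedding, rotated by position * theta.
vec = np.array([1.0, 0.0])
theta = 2 * np.pi / 16          # illustrative frequency for this channel pair
positions = np.arange(16)

angles = positions * theta
rotated = np.stack([vec[0] * np.cos(angles) - vec[1] * np.sin(angles),
                    vec[0] * np.sin(angles) + vec[1] * np.cos(angles)], axis=1)

plt.scatter(rotated[:, 0], rotated[:, 1], c=positions, cmap="viridis")
plt.colorbar(label="token position")
plt.gca().set_aspect("equal")
plt.title("One RoPE channel pair rotating with position")
plt.show()
```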
## Future Work

- Support for NTK-aware scaling of the rotary base, commonly used to extend LLaMA 2 to longer contexts (see the sketch after this list).
- A benchmark comparing perplexity with and without RoPE on a small language-modeling dataset such as TinyShakespeare.
- A notebook version for Google Colab.
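As a rough sketch of the NTK-aware variant mentioned above (following the common community formulation; the eventual implementation in this repo may differ), the rotary base is enlarged so the angle spectrum stretches to cover a context roughly `scale` times longer:

```python
import torch

def ntk_scaled_inv_freq(head_dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    """NTK-aware RoPE: enlarge the base so low frequencies stretch more than high ones."""
    scaled_base = base * scale ** (head_dim / (head_dim - 2))
    return 1.0 / (scaled_base ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Example: extend a model trained on 2k-token contexts to roughly 8k (scale = 4).
inv_freq = ntk_scaled_inv_freq(head_dim=64, scale=4.0)
```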
## References

- RoFormer: Enhanced Transformer with Rotary Position Embedding (the original RoPE paper)
- LLaMA: Open and Efficient Foundation Language Models
- Qwen Technical Report
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- The Illustrated Transformer
- The Annotated Transformer
- Rotary Positional Embeddings
- An In-depth Exploration of Rotary Position Embedding (RoPE)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
## Author

- Sakhile L. Ndlazi
- LinkedIn Profile