A compilation of resources for keeping up with the latest trends in NLP.
Note: This resource list is a work in progress. More papers and topics will be added regularly. Contributions and suggestions are welcome!
Foundational Language Model Papers:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - https://arxiv.org/abs/1810.04805
- GPT-1: Improving Language Understanding by Generative Pre-Training - https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- GPT-2: Language Models are Unsupervised Multitask Learners - https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - https://arxiv.org/abs/1910.10683
- XLNet: Generalized Autoregressive Pretraining for Language Understanding - https://arxiv.org/pdf/1906.08237
- RoBERTa: A Robustly Optimized BERT Pretraining Approach - https://arxiv.org/abs/1907.11692
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations - https://arxiv.org/abs/1909.11942
- Longformer: The Long-Document Transformer - https://arxiv.org/abs/2004.05150
- Attention Is All You Need - https://arxiv.org/pdf/1706.03762 (the scaled dot-product attention at its core is sketched after this list)
- Memory Is All You Need - https://arxiv.org/pdf/2406.08413
- GPT-3: Language Models are Few-Shot Learners - https://arxiv.org/abs/2005.14165
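A minimal NumPy sketch of the scaled dot-product attention from "Attention Is All You Need". The shapes, the causal-mask handling, and the toy input are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len) similarity scores
    if causal:                                    # optional decoder-style mask (assumption for illustration)
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores = np.where(mask, -1e9, scores)
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ V                            # weighted sum of value vectors

# Toy usage: self-attention over a 4-token sequence of 8-dim vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x, causal=True)
print(out.shape)  # (4, 8)
```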
RLHF:
- Basics of RL - OpenAI Spinning Up - https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism - https://arxiv.org/abs/2305.18438
- InstructGPT: Training language models to follow instructions with human feedback - https://arxiv.org/abs/2203.02155 (a reward-model loss sketch follows this list)
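A PyTorch-style sketch of the pairwise (Bradley-Terry) reward-model loss used in RLHF pipelines such as InstructGPT: minimize -log sigmoid(r(x, y_chosen) - r(x, y_rejected)). The tensor shapes and toy scores are assumptions; in practice the rewards come from a reward head on a language model:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: -log sigmoid(r_chosen - r_rejected), averaged over comparisons.

    chosen_rewards / rejected_rewards: one scalar reward per comparison, shape (batch,).
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up reward scores.
chosen = torch.tensor([1.2, 0.3, 0.9])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(reward_model_loss(chosen, rejected))  # smaller when chosen responses score higher
```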
DPO (Direct Preference Optimization):
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model - https://arxiv.org/pdf/2305.18290
- Blog - Math behind DPO - https://www.tylerromero.com/posts/2024-04-dpo/
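A minimal PyTorch sketch of the DPO objective, assuming per-sequence log-probabilities have already been summed under the policy and the frozen reference model; the variable names, beta value, and toy numbers are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """L_DPO = -log sigmoid( beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                                     - (log pi(y_l|x) - log pi_ref(y_l|x))] ).

    All inputs: summed log-probs of the full response, shape (batch,).
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up log-probabilities.
policy_w = torch.tensor([-12.0, -9.5]); policy_l = torch.tensor([-15.0, -11.0])
ref_w = torch.tensor([-12.5, -10.0]);   ref_l = torch.tensor([-14.0, -10.5])
print(dpo_loss(policy_w, policy_l, ref_w, ref_l))
```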
PPO (Proximal Policy Optimization):
- Proximal Policy Optimization Algorithms - https://arxiv.org/pdf/1707.06347
- PPO Docs - OpenAI Spinning Up - https://spinningup.openai.com/en/latest/algorithms/ppo.html
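A sketch of PPO's clipped surrogate objective, L^CLIP = E[min(r_t A_t, clip(r_t, 1-eps, 1+eps) A_t)]. The log-probability and advantage tensors stand in for rollout statistics and are assumptions for illustration:

```python
import torch

def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate from 'Proximal Policy Optimization Algorithms'.

    r_t = exp(new_logprob - old_logprob); maximize E[min(r_t*A_t, clip(r_t, 1-eps, 1+eps)*A_t)].
    Returned negated so it can be minimized with a standard optimizer.
    """
    ratio = torch.exp(new_logprobs - old_logprobs)                    # importance ratio r_t
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up rollout statistics.
new_lp = torch.tensor([-1.0, -0.5, -2.0])
old_lp = torch.tensor([-1.2, -0.6, -1.5])
adv = torch.tensor([0.8, -0.3, 1.1])
print(ppo_clipped_objective(new_lp, old_lp, adv))
```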
GRPO (Group Relative Policy Optimization):
- DeepSeekMath - https://arxiv.org/abs/2402.03300
- Blog - GRPO Explained - https://aipapersacademy.com/deepseekmath-grpo/
- DeepSeek-R1 - https://arxiv.org/pdf/2501.12948
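A sketch of the group-relative advantage at the heart of GRPO as described in DeepSeekMath: sample a group of completions per prompt, score them, and normalize each reward against the group mean and standard deviation instead of training a separate value model. The shapes, epsilon constant, and toy rewards are assumptions:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """GRPO advantage: A_i = (r_i - mean(group)) / (std(group) + eps).

    rewards: shape (num_prompts, group_size), one reward per sampled completion.
    The normalized advantages are then fed into a PPO-style clipped objective.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy usage: 2 prompts, 4 sampled completions each, scored by a hypothetical reward function.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```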
Mechanistic Interpretability:
- Basic Mech Interp Essay - https://www.transformer-circuits.pub/2022/mech-interp-essay
- Neural Networks, Manifolds, and Topology (toy nets with low-dimensional inputs) - https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
- Mechanistic Interpretability for AI Safety Review - https://arxiv.org/abs/2404.14082
- A Mathematical Framework for Transformer Circuits - https://transformer-circuits.pub/2021/framework/index.html
- Circuit Tracing: Revealing Computational Graphs in Language Models - https://transformer-circuits.pub/2025/attribution-graphs/methods.html#evaluating-model
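A toy NumPy sketch of the QK/OV circuit decomposition described in "A Mathematical Framework for Transformer Circuits", using random weights and one shape convention among several; it is not extracted from any released model:

```python
import numpy as np

# Toy dimensions (assumptions for illustration).
n_vocab, d_model, d_head = 10, 8, 4
rng = np.random.default_rng(0)

W_E = rng.normal(size=(d_model, n_vocab))   # embedding: vocab -> residual stream
W_U = rng.normal(size=(n_vocab, d_model))   # unembedding: residual stream -> logits
W_Q = rng.normal(size=(d_head, d_model))    # per-head query / key / value / output maps
W_K = rng.normal(size=(d_head, d_model))
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))

# Head-level low-rank circuits.
W_QK = W_Q.T @ W_K                          # (d_model, d_model): which directions attend to which
W_OV = W_O @ W_V                            # (d_model, d_model): how attended information is moved

# "Full" vocabulary-level circuits from the framework paper.
full_QK = W_E.T @ W_QK @ W_E                # (n_vocab, n_vocab): token-to-token attention preference
full_OV = W_U @ W_OV @ W_E                  # (n_vocab, n_vocab): effect on output logits of attending to each token

print(full_QK.shape, full_OV.shape)         # (10, 10) (10, 10)
```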
Scaling Laws:
- Scaling Laws for Neural Language Models - https://arxiv.org/pdf/2001.08361
- Scaling Laws for Autoregressive Generative Modeling - https://arxiv.org/pdf/2010.14701
- Scaling Laws of Synthetic Data for Language Models - https://arxiv.org/pdf/2503.19551
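These papers fit empirical loss curves with power laws of the form L(N) ~ (N_c / N)^alpha. Below is a sketch of recovering the exponent by linear regression in log-log space on synthetic data; the constants are illustrative, not the papers' fitted values:

```python
import numpy as np

# Hypothetical (parameter_count, validation_loss) pairs; not real measurements.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = 3.0 * (1e9 / N) ** 0.076 + np.array([0.01, -0.02, 0.005, 0.0, -0.01])  # noisy synthetic power law

# Fit log L = log c - alpha * log N, i.e. a straight line in log-log space.
slope, _ = np.polyfit(np.log(N), np.log(L), deg=1)
alpha = -slope
print(f"fitted exponent alpha ~ {alpha:.3f}")  # should roughly recover the 0.076 used above
```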
GPU Performance:
- Matrix Multiplication - NVIDIA Docs - https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
- Understanding GPU Performance - NVIDIA Docs - https://docs.nvidia.com/deeplearning/performance/dl-performance-gpu-background/index.html#gpu-arch__fig2
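The NVIDIA guides frame GEMM performance in terms of arithmetic intensity (FLOPs per byte moved) compared against the hardware's ops:byte ratio. Here is a back-of-the-envelope sketch for an MxK times KxN matmul in FP16; the hardware ratio is a placeholder, not a specific GPU's spec:

```python
def gemm_arithmetic_intensity(M, K, N, bytes_per_elem=2):
    """FLOPs / bytes for C = A @ B with A:(M,K), B:(K,N), C:(M,N).

    FLOPs = 2*M*N*K (one multiply + one add per inner-product term);
    bytes assume each matrix is read or written from memory exactly once.
    """
    flops = 2 * M * N * K
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)
    return flops / bytes_moved

# Illustrative hardware ops:byte ratio (placeholder; compute it from your GPU's FLOP/s and bandwidth).
hw_ops_per_byte = 200.0

for shape in [(4096, 4096, 4096), (4096, 4096, 8)]:
    ai = gemm_arithmetic_intensity(*shape)
    regime = "compute-bound" if ai > hw_ops_per_byte else "memory-bound"
    print(f"M,K,N={shape}: arithmetic intensity ~ {ai:.1f} FLOP/byte -> likely {regime}")
```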