A compilation of resources for keeping up with the latest trends in NLP.
Note: This resource list is a work in progress. More papers and topics will be added regularly. Contributions and suggestions are welcome!
Foundational Language Model Papers:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - https://arxiv.org/abs/1810.04805
- GPT-1: Improving Language Understanding by Generative Pre-Training - https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- GPT-2: Language Models are Unsupervised Multitask Learners - https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - https://arxiv.org/abs/1910.10683
- XLNet: Generalized Autoregressive Pretraining for Language Understanding - https://arxiv.org/pdf/1906.08237
- RoBERTa: A Robustly Optimized BERT Pretraining Approach - https://arxiv.org/abs/1907.11692
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations - https://arxiv.org/abs/1909.11942
- Longformer: The Long-Document Transformer - https://arxiv.org/abs/2004.05150
- Attention Is All You Need - https://arxiv.org/pdf/1706.03762 (the scaled dot-product attention at its core is sketched after this list)
- Memory Is All You Need - https://arxiv.org/pdf/2406.08413
- GPT-3: Language Models are Few-Shot Learners - https://arxiv.org/abs/2005.14165
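A minimal NumPy sketch of the scaled dot-product attention from "Attention Is All You Need". The shapes, the causal-mask handling, and the toy input are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len) similarity scores
    if causal:                                    # optional decoder-style mask (assumption for illustration)
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores = np.where(mask, -1e9, scores)
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ V                            # weighted sum of value vectors

# Toy usage: self-attention over a 4-token sequence of 8-dim vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x, causal=True)
print(out.shape)  # (4, 8)
```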
RLHF:
- Basics of RL - OpenAI Spinning Up - https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism - https://arxiv.org/abs/2305.18438
- InstructGPT: Training language models to follow instructions with human feedback - https://arxiv.org/abs/2203.02155 (a reward-model loss sketch follows this list)
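A PyTorch-style sketch of the pairwise (Bradley-Terry) reward-model loss used in RLHF pipelines such as InstructGPT: minimize -log sigmoid(r(x, y_chosen) - r(x, y_rejected)). The tensor shapes and toy scores are assumptions; in practice the rewards come from a reward head on a language model:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: -log sigmoid(r_chosen - r_rejected), averaged over comparisons.

    chosen_rewards / rejected_rewards: one scalar reward per comparison, shape (batch,).
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up reward scores.
chosen = torch.tensor([1.2, 0.3, 0.9])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(reward_model_loss(chosen, rejected))  # smaller when chosen responses score higher
```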
DPO (Direct Preference Optimization):
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model - https://arxiv.org/pdf/2305.18290
- Blog - Math behind DPO - https://www.tylerromero.com/posts/2024-04-dpo/
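A minimal PyTorch sketch of the DPO objective, assuming per-sequence log-probabilities have already been summed under the policy and the frozen reference model; the variable names, beta value, and toy numbers are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """L_DPO = -log sigmoid( beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                                     - (log pi(y_l|x) - log pi_ref(y_l|x))] ).

    All inputs: summed log-probs of the full response, shape (batch,).
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up log-probabilities.
policy_w = torch.tensor([-12.0, -9.5]); policy_l = torch.tensor([-15.0, -11.0])
ref_w = torch.tensor([-12.5, -10.0]);   ref_l = torch.tensor([-14.0, -10.5])
print(dpo_loss(policy_w, policy_l, ref_w, ref_l))
```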
PPO (Proximal Policy Optimization):
- Proximal Policy Optimization Algorithms - https://arxiv.org/pdf/1707.06347
- PPO Docs - OpenAI Spinning Up - https://spinningup.openai.com/en/latest/algorithms/ppo.html
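A sketch of PPO's clipped surrogate objective, L^CLIP = E[min(r_t A_t, clip(r_t, 1-eps, 1+eps) A_t)]. The log-probability and advantage tensors stand in for rollout statistics and are assumptions for illustration:

```python
import torch

def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate from 'Proximal Policy Optimization Algorithms'.

    r_t = exp(new_logprob - old_logprob); maximize E[min(r_t*A_t, clip(r_t, 1-eps, 1+eps)*A_t)].
    Returned negated so it can be minimized with a standard optimizer.
    """
    ratio = torch.exp(new_logprobs - old_logprobs)                    # importance ratio r_t
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up rollout statistics.
new_lp = torch.tensor([-1.0, -0.5, -2.0])
old_lp = torch.tensor([-1.2, -0.6, -1.5])
adv = torch.tensor([0.8, -0.3, 1.1])
print(ppo_clipped_objective(new_lp, old_lp, adv))
```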
GRPO (Group Relative Policy Optimization):
- DeepSeekMath - https://arxiv.org/abs/2402.03300
- Blog - GRPO Explained - https://aipapersacademy.com/deepseekmath-grpo/
- DeepSeek-R1 - https://arxiv.org/pdf/2501.12948
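A sketch of the group-relative advantage at the heart of GRPO as described in DeepSeekMath: sample a group of completions per prompt, score them, and normalize each reward against the group mean and standard deviation instead of training a separate value model. The shapes, epsilon constant, and toy rewards are assumptions:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """GRPO advantage: A_i = (r_i - mean(group)) / (std(group) + eps).

    rewards: shape (num_prompts, group_size), one reward per sampled completion.
    The normalized advantages are then fed into a PPO-style clipped objective.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy usage: 2 prompts, 4 sampled completions each, scored by a hypothetical reward function.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```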
Mechanistic Interpretability:
- Basic Mech Interp Essay - https://www.transformer-circuits.pub/2022/mech-interp-essay
- Neural Networks, Manifolds, and Topology (toy nets with low-dimensional inputs) - https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
- Mechanistic Interpretability for AI Safety Review - https://arxiv.org/abs/2404.14082
- A Mathematical Framework for Transformer Circuits - https://transformer-circuits.pub/2021/framework/index.html
- Circuit Tracing: Revealing Computational Graphs in Language Models - https://transformer-circuits.pub/2025/attribution-graphs/methods.html#evaluating-model
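A toy NumPy sketch of the QK/OV circuit decomposition described in "A Mathematical Framework for Transformer Circuits", using random weights and one shape convention among several; it is not extracted from any released model:

```python
import numpy as np

# Toy dimensions (assumptions for illustration).
n_vocab, d_model, d_head = 10, 8, 4
rng = np.random.default_rng(0)

W_E = rng.normal(size=(d_model, n_vocab))   # embedding: vocab -> residual stream
W_U = rng.normal(size=(n_vocab, d_model))   # unembedding: residual stream -> logits
W_Q = rng.normal(size=(d_head, d_model))    # per-head query / key / value / output maps
W_K = rng.normal(size=(d_head, d_model))
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))

# Head-level low-rank circuits.
W_QK = W_Q.T @ W_K                          # (d_model, d_model): which directions attend to which
W_OV = W_O @ W_V                            # (d_model, d_model): how attended information is moved

# "Full" vocabulary-level circuits from the framework paper.
full_QK = W_E.T @ W_QK @ W_E                # (n_vocab, n_vocab): token-to-token attention preference
full_OV = W_U @ W_OV @ W_E                  # (n_vocab, n_vocab): effect on output logits of attending to each token

print(full_QK.shape, full_OV.shape)         # (10, 10) (10, 10)
```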
Scaling Laws:
- Scaling Laws for Neural Language Models - https://arxiv.org/pdf/2001.08361
- Scaling Laws for Autoregressive Generative Modeling - https://arxiv.org/pdf/2010.14701
- Scaling Laws of Synthetic Data for Language Models - https://arxiv.org/pdf/2503.19551
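These papers fit empirical loss curves with power laws of the form L(N) ~ (N_c / N)^alpha. Below is a sketch of recovering the exponent by linear regression in log-log space on synthetic data; the constants are illustrative, not the papers' fitted values:

```python
import numpy as np

# Hypothetical (parameter_count, validation_loss) pairs; not real measurements.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = 3.0 * (1e9 / N) ** 0.076 + np.array([0.01, -0.02, 0.005, 0.0, -0.01])  # noisy synthetic power law

# Fit log L = log c - alpha * log N, i.e. a straight line in log-log space.
slope, _ = np.polyfit(np.log(N), np.log(L), deg=1)
alpha = -slope
print(f"fitted exponent alpha ~ {alpha:.3f}")  # should roughly recover the 0.076 used above
```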
GPU Performance:
- Matrix Multiplication - NVIDIA Docs - https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
- Understanding GPU Performance - NVIDIA Docs - https://docs.nvidia.com/deeplearning/performance/dl-performance-gpu-background/index.html#gpu-arch__fig2
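The NVIDIA guides frame GEMM performance in terms of arithmetic intensity (FLOPs per byte moved) compared against the hardware's ops:byte ratio. Here is a back-of-the-envelope sketch for an MxK times KxN matmul in FP16; the hardware ratio is a placeholder, not a specific GPU's spec:

```python
def gemm_arithmetic_intensity(M, K, N, bytes_per_elem=2):
    """FLOPs / bytes for C = A @ B with A:(M,K), B:(K,N), C:(M,N).

    FLOPs = 2*M*N*K (one multiply + one add per inner-product term);
    bytes assume each matrix is read or written from memory exactly once.
    """
    flops = 2 * M * N * K
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)
    return flops / bytes_moved

# Illustrative hardware ops:byte ratio (placeholder; compute it from your GPU's FLOP/s and bandwidth).
hw_ops_per_byte = 200.0

for shape in [(4096, 4096, 4096), (4096, 4096, 8)]:
    ai = gemm_arithmetic_intensity(*shape)
    regime = "compute-bound" if ai > hw_ops_per_byte else "memory-bound"
    print(f"M,K,N={shape}: arithmetic intensity ~ {ai:.1f} FLOP/byte -> likely {regime}")
```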