Authors: Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar • Apple Machine Learning Research

Why this matters: This paper systematically exposes fundamental limitations of current "reasoning" models through controllable puzzle environments. It shows that large reasoning models (LRMs) suffer complete accuracy collapse beyond certain complexity thresholds and, paradoxically, reduce their reasoning effort as problems grow harder.

Impact Score:
- Performance: █████████░ 92%
- Innovation: ██████████ 98%
- Practicality: ████████░░ 85%
"Hot research areas this month - Where the field is moving"
| Topic | Papers | Why It's Trending |
|---|---|---|
| Mixture of Experts (MoE) | 8 papers | Efficient scaling beyond dense models - Mixtral, DeepSeek success |
| 💡 Million Token Context | 6 papers | Breaking the context barrier - full books & codebases in one prompt |
| 🤖 Autonomous Agents | 12 papers | From chatbots to actual workers - AutoGPT evolution |
| Self-Improving Models | 5 papers | Models that enhance themselves without human intervention |
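The MoE trend above rests on a simple mechanism: a learned router scores all experts per token, activates only the top-k, and mixes their outputs. A minimal dependency-free sketch (toy scalar "experts" and hand-picked router logits, purely illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and combine their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy "experts": four functions that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward(10.0, experts, router_logits=[0.1, 2.0, 0.2, 2.0], k=2))  # → 30.0
```

With k=2 only half the experts run per token, which is the efficiency win the trending papers exploit: parameters scale with the expert count while per-token compute stays roughly constant.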
Click to expand this week's papers (January 13-19, 2025)
| Day | Paper | Impact | TL;DR |
|---|---|---|---|
| Mon | 🧮 🧠 Self-Questioning Language Models | reasoning, self-improvement | LLMs improve without external data by generating and solving their own questions through asymmetric self-play |
| Tue | ⚡ R-Zero: Self-Evolving Reasoning LLM from Zero Data | autonomous-learning, reasoning | Challenger-Solver framework boosts Qwen3-4B by +6.49 points on math benchmarks without any human data |
| Wed | ⚡ Mercury: Ultra-Fast Language Models Based on Diffusion | speed, inference | Diffusion-based generation predicts tokens in parallel, yielding much faster inference than autoregressive decoding |
| Thu | 🤖 Multi-Agent Collaboration Mechanisms: A Survey of LLMs | collaboration, agents | Comprehensive framework for LLM-based multi-agent systems covering cooperation, competition, and coordination protocols |
| Fri | Tracing the thoughts of a large language model | research, testing | Anthropic investigates the internal mechanisms of Claude 3.5 Haiku, its lightweight production model |
Papers that fundamentally changed the field
| Paper | Impact | Why Essential | Resources |
|---|---|---|---|
| Attention Is All You Need | Foundational | Created the Transformer architecture that powers all modern LLMs. Replaced RNNs with self-attention, enabling parallelization and scaling. | |
| GPT-3: Language Models are Few-Shot Learners | Scale | Proved that scale leads to emergent abilities. In-context learning without fine-tuning revolutionized how we use LLMs. | |
| Constitutional AI: Harmlessness from AI Feedback | 🛡️ Safety | Introduced RLAIF - training AI systems to be helpful and harmless using AI feedback instead of human feedback. | |
| Chain-of-Thought Prompting | 🧮 Reasoning | Simple prompting technique that dramatically improves reasoning by asking models to think step-by-step. | |
| RLHF: Training with Human Feedback | 🎯 Alignment | The technique behind ChatGPT's success. Aligns model outputs with human preferences through reinforcement learning. | |

View all foundational papers →
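Chain-of-thought, listed in the table above, is purely a prompting change: prepend worked examples whose answers spell out intermediate steps, then ask the model to reason the same way. A hypothetical sketch of the prompt assembly (the model call itself is omitted; the exemplar is the classic tennis-ball problem from the CoT paper):

```python
def build_cot_prompt(question, exemplars):
    """Build a few-shot prompt whose exemplar answers show step-by-step
    reasoning, then pose the new question with a reasoning trigger."""
    parts = []
    for q, steps, answer in exemplars:
        parts.append(f"Q: {q}\nA: {steps} The answer is {answer}.")
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

exemplar = (
    "Roger has 5 balls and buys 2 cans of 3 balls. How many balls now?",
    "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
    "11",
)
prompt = build_cot_prompt(
    "A baker has 23 apples and uses 20. How many are left?", [exemplar]
)
print(prompt)
```

The "Let's think step by step." suffix alone (zero-shot CoT) already helps; the worked exemplars push accuracy further by showing the expected shape of the reasoning.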
- 🎯 Training & Alignment
- 🎨 Multimodal Models
- RAG & Knowledge
- 🛡️ Safety & Security
- 🔬 Analysis & Theory
View papers (10 total) • 🔥 Hot area
- Mamba-2: Improved State Space Models - 5x faster than Transformers at 100K+ context
- Mixture of Depths: Dynamic Layer Selection - Skip layers adaptively based on input
- Retentive Networks: Retention Replaces Attention - O(1) memory complexity breakthrough
- Attention Is All You Need - The paper that started everything
- BERT: Bidirectional Transformers - Revolutionized NLP pre-training
- GPT-3: Few-Shot Learning - Proved scale leads to emergence
Tags: transformer • state-space • mixture-of-experts • attention-mechanisms
View papers (6 total) • 🔥🔥 Very hot area
- 🔥 AutoGPT-4: Fully Autonomous Agents - Complete tasks without human intervention
- 🔥 Reflexion: Self-Reflecting Agents - Learn from mistakes autonomously
- ReAct: Reasoning and Acting - Combines reasoning with action execution
- Chain-of-Thought Prompting - Simple technique for complex reasoning
- Tree of Thoughts - Explore multiple reasoning paths
- Graph of Thoughts - Non-linear reasoning structures
Tags: agents • chain-of-thought • planning • tool-use • reasoning
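ReAct, listed above, interleaves Thought/Action/Observation turns: the model reasons, calls a tool, sees the result, and repeats until it emits a final answer. A toy loop with a stubbed model and a single calculator tool (all names and the transcript format here are illustrative, not any library's API):

```python
def calculator(expr):
    """Deliberately tiny 'tool': evaluates arithmetic expressions only."""
    return str(eval(expr, {"__builtins__": {}}, {}))

def fake_llm(transcript):
    """Stand-in for a real model: issues one tool call, then finishes."""
    if "Observation:" not in transcript:
        return "Thought: I need the product.\nAction: calculator[17 * 23]"
    return "Thought: I have the result.\nFinal Answer: 391"

def react_loop(question, llm, tools, max_turns=5):
    transcript = f"Question: {question}"
    for _ in range(max_turns):
        step = llm(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            # Parse "tool[argument]" and append the tool result as an Observation.
            name, arg = step.split("Action:")[1].strip().rstrip("]").split("[", 1)
            transcript += f"\nObservation: {tools[name](arg)}"
    return None

print(react_loop("What is 17 * 23?", fake_llm, {"calculator": calculator}))  # → 391
```

The key design point is that the growing transcript is the agent's only state: every tool result is fed back as text, so the same loop works with any model that can read and extend it.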
View papers (7 total) • 🔥 Hot area
- Speculative Decoding 2.0 - 3x faster inference without quality loss
- Flash-Decoding: Faster Attention - Optimized attention for inference
- vLLM: PagedAttention - 24x throughput improvement
- QLoRA: 4-bit Quantization - Fine-tune 65B models on single GPU
- GPTQ: Accurate Quantization - 3-4 bit quantization with minimal loss
- SparseGPT: 50% Sparsity - Remove half the weights, keep performance
Tags: quantization • pruning • distillation • inference • deployment
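Speculative decoding, listed above, has a simple core: a cheap draft model proposes several tokens, the expensive target model checks them, and the longest agreeing prefix is kept, so multiple tokens can be accepted per expensive step. A greedy-verification toy (real systems verify drafts in one batched forward pass and use probabilistic accept/reject; the `draft`/`target` functions are made-up stand-ins):

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Draft k tokens greedily, keep the prefix the target agrees with,
    then let the target contribute one more token (so progress >= 1)."""
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        drafted.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)
    for t in drafted:
        if target_next(ctx) != t:
            break                      # first disagreement: stop accepting
        accepted.append(t)
        ctx.append(t)
    accepted.append(target_next(ctx))  # target always adds one token
    return prefix + accepted

# Toy "models": the draft matches the target's pattern until position 3.
target = lambda ctx: len(ctx) % 3
draft = lambda ctx: len(ctx) % 3 if len(ctx) < 3 else 9
print(speculative_step([0], draft, target, k=4))  # → [0, 1, 2, 0]
```

Here two drafted tokens are accepted and the target supplies a third, i.e. three tokens of progress for one target "pass"; when draft and target agree often, that ratio is where the speedup comes from.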
View papers (5 total)
- RLHF: Human Feedback Training - The technique behind ChatGPT
- DPO: Direct Preference Optimization - Simpler alternative to RLHF
- Constitutional AI - AI feedback for harmless assistants
- LoRA: Low-Rank Adaptation - Parameter-efficient fine-tuning
- Prefix Tuning - Tune only prefix parameters
- Instruction Tuning - Teaching models to follow instructions
Tags: rlhf • fine-tuning • peft • instruction-tuning • alignment
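LoRA's efficiency, noted in the list above, is just low-rank arithmetic: a frozen d×k weight matrix W gets a trainable additive update B·A of rank r, so the trainable count drops from d·k to r·(d+k). A quick check with illustrative sizes (a single 4096×4096 projection at rank 8; these numbers are for illustration, not any specific model):

```python
def lora_trainable_params(d, k, r):
    """Full fine-tuning trains all d*k weights of W; LoRA freezes W and
    trains only the factors B (d x r) and A (r x k): r * (d + k) weights."""
    full = d * k
    lora = r * (d + k)
    return full, lora, lora / full

full, lora, ratio = lora_trainable_params(4096, 4096, 8)
print(f"full={full:,}  lora={lora:,}  fraction={ratio:.4%}")
# → full=16,777,216  lora=65,536  fraction=0.3906%
```

Under half a percent of the original parameters per layer is why 65B-scale models become fine-tunable on a single GPU, especially when the frozen base is also quantized as in QLoRA.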
View papers (8 total) • 🔥 Hot area
- GPT-4V: Visual Understanding - State-of-the-art visual reasoning
- CLIP: Contrastive Vision-Language - Foundation for modern multimodal
- LLaVA: Visual Instruction Tuning - Open-source GPT-4V alternative
- DALL-E 3: Text-to-Image - Photorealistic generation
- Stable Diffusion 3 - Open-source image generation
- Sora: Text-to-Video - Minute-long video generation
Tags: vision-language • image-generation • video • audio • multimodal
View papers (4 total) • 🔥🔥🔥 Hottest area
- 🔥 RAG 2.0: Self-Reasoning Retrieval - RAG systems that think
- 🔥 RAPTOR: Recursive Abstractive Processing - Tree-based retrieval
- Self-RAG: Self-Reflection - Adaptive retrieval and generation
- RingAttention: Million Token Context - Process entire books
- LongLoRA: Efficient Long Context - Extend context to 100k+
- StreamingLLM: Infinite Context - Never-ending conversations
Tags: retrieval • rag • long-context • memory • knowledge-bases
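The RAG entries above all start from the same primitive: score stored documents against a query and stuff the best matches into the prompt. A dependency-free sketch using bag-of-words cosine similarity (a real system would use dense embeddings and a vector index; the two sample docs are just this page's own paper blurbs):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, top_k=1):
    """Rank documents by term overlap with the query; drop zero-score docs."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in docs]
    return [d for s, d in sorted(scored, reverse=True)[:top_k] if s > 0]

docs = [
    "RingAttention processes million token contexts across devices.",
    "QLoRA fine-tunes quantized models on a single GPU.",
]
context = retrieve("million token context", docs)
prompt = f"Context: {context[0]}\nQuestion: how long a context can we process?"
print(prompt)
```

Everything downstream (Self-RAG's adaptive retrieval, RAPTOR's tree of summaries) is a refinement of this retrieve-then-prompt loop: better scoring, better chunking, or a model that decides when to retrieve at all.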
View papers (4 total)
- 🚨 Universal Jailbreaks - Attacks that work on all models
- Circuit Breakers - Built-in safety mechanisms
- Representation Engineering - Control model behavior directly
- Sleeper Agents - Hidden model behaviors
- Scalable Oversight - Supervising superhuman AI
- Debate as Alignment - Using AI debate for safety
Tags: jailbreaks • alignment • safety • robustness • red-teaming
View papers (3 total)
- Language Models are Few-Shot Learners (Analysis) (2020) - Scaling laws and emergent abilities
- Emergent Abilities of Large Language Models (2022) - Investigating non-linear scaling behavior
- A Mathematical Framework for Transformer Circuits (2021) - Anthropic's reverse engineering of toy, attention-only models
Tags: interpretability • mechanistic • theory • analysis • evaluation
Most Used Tags
transformer (67) • efficient (45) • reasoning (38) • open-source (35) • multimodal (28)
production-ready (25) • breakthrough (22) • sota (20) • chain-of-thought (18) • rlhf (15)
🎯 Quick Filters
By Impact: breakthrough • ⭐ sota • production-ready • 🧪 experimental
By Org: openai • anthropic • google • meta • academic
```mermaid
%%{init: {'theme':'dark'}}%%
graph TD
    A[2024 Q4] -->|RAG Revolution| B[2025 Q1]
    B -->|Agents & Tools| C[2025 Q2]
    B -->|1M+ Context| D[2025 Q2]
    B -->|MoE Everything| E[2025 Q2]
    style A fill:#1f2937
    style B fill:#374151
    style C fill:#4b5563
    style D fill:#6b7280
    style E fill:#9ca3af
```
This Month's Momentum:
- Rising: RAG Systems (+450%), Autonomous Agents (+320%), MoE Models (+280%)
- Cooling: Basic Prompting (-60%), Small Models (-40%)
- 🔮 Next Wave: Self-improving models, Multimodal reasoning, Edge deployment
- Current Hot Topics: 🔥 Long Context (>1M tokens) | 🔥 Reasoning without CoT | 🔥 Efficient Fine-tuning
Just need: Paper link + One sentence on why it matters
✅ Reviewed within 24 hours | Contributors get credit | 💬 Join discussions
Made with ❤️ for the AI Research Community
Last updated: August 10, 2025, 9:00 AM IST