
Awesome Context Engineering


📄 Our comprehensive survey paper on Context Engineering is coming soon! Stay tuned for the latest academic insights and theoretical foundations.

A comprehensive survey and collection of resources on Context Engineering - the evolution from static prompting to dynamic, context-aware AI systems.

โš ๏ธ Disclaimer

This project is ongoing and continuously evolving. While we strive for accuracy and completeness, there may be errors, omissions, or outdated information. We welcome corrections, suggestions, and contributions from the community. Please stay tuned for regular updates and improvements.


📰 News

  • [2025.7] Repository initialized with comprehensive outline
  • [2025.7] Survey structure established following modern context engineering paradigms

🎯 Introduction

In the era of Large Language Models (LLMs), the limitations of static prompting have become increasingly apparent. Context Engineering is the natural next step for managing LLM uncertainty and achieving production-grade AI deployment. Unlike traditional prompt engineering, context engineering covers the complete information payload provided to an LLM at inference time: all structured informational components the model needs to plausibly accomplish the task.

This repository serves as a comprehensive survey of context engineering techniques, methodologies, and applications.


📚 Table of Contents


🔗 Related Surveys

General AI Survey Papers

  • A Survey of Large Language Models, Zhao et al., arXiv Badge
  • The Prompt Report: A Systematic Survey of Prompt Engineering Techniques, Schulhoff et al., arXiv Badge
  • A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, Sahoo et al., arXiv Badge
  • A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models, Gao et al., arXiv Badge

Context and Reasoning

  • A Survey on In-context Learning, Dong et al., EMNLP Badge
  • The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis, Zhou et al., arXiv Badge
  • A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions, Gupta et al., arXiv Badge
  • Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al., arXiv Badge
  • A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Cheng et al., arXiv Badge

Memory Systems and Context Persistence

  • A Survey on the Memory Mechanism of Large Language Model based Agents, Zhang et al., arXiv Badge
  • From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, Wu et al., arXiv Badge

Foundational Survey Papers from Major Venues

  • AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Shin et al., EMNLP Badge
  • The Power of Scale for Parameter-Efficient Prompt Tuning, Lester et al., EMNLP Badge
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation, Li et al., ACL Badge
  • An Explanation of In-context Learning as Implicit Bayesian Inference, Xie et al., ICLR Badge
  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, Min et al., EMNLP Badge

Additional RAG and Retrieval Surveys

  • Retrieval-Augmented Generation for AI-Generated Content: A Survey, Various, arXiv Badge
  • Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, Various, arXiv Badge
  • Large language models (LLMs): survey, technical frameworks, and future challenges, Various, AIR Badge

๐Ÿ—๏ธ Definition of Context Engineering

Context is not just the single prompt users send to an LLM. Context is the complete information payload provided to an LLM at inference time, encompassing all structured informational components that the model needs to plausibly accomplish a given task.

LLM Generation

To formally define Context Engineering, we must first mathematically characterize the LLM generation process. Let us model an LLM as a probabilistic function:

$$P(\text{output} | \text{context}) = \prod_{t=1}^T P(\text{token}_t | \text{previous tokens}, \text{context})$$

Where:

  • $\text{context}$ represents the complete input information provided to the LLM
  • $\text{output}$ represents the generated response sequence
  • $P(\text{token}_t | \text{previous tokens}, \text{context})$ is the probability of generating each token given the context
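
As a concrete illustration of this decomposition, the sketch below scores a candidate output under a causal LM by summing per-token log-probabilities conditioned on the context. It is a minimal sketch assuming a Hugging Face `transformers` causal model; the model name in the usage comment is illustrative, and production scoring would batch and cache these calls.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_log_prob(model, tokenizer, context: str, output: str) -> float:
    """log P(output | context) = sum_t log P(token_t | previous tokens, context)."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    out_ids = tokenizer(output, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, out_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits              # [1, seq_len, vocab_size]
    log_probs = torch.log_softmax(logits, dim=-1)
    ctx_len = ctx_ids.shape[1]
    total = 0.0
    for t in range(out_ids.shape[1]):
        # Logits at position ctx_len + t - 1 predict the token at position ctx_len + t.
        total += log_probs[0, ctx_len + t - 1, out_ids[0, t]].item()
    return total

# Usage (illustrative model):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# sequence_log_prob(lm, tok, context="Q: What is 2+2?\nA:", output=" 4")
```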

Definition of Context

In traditional prompt engineering, the context is treated as a simple string: $$\text{context} = \text{prompt}$$

However, in Context Engineering, we decompose the context into multiple structured components:

$$\text{context} = \text{Assemble}(\text{instructions}, \text{knowledge}, \text{tools}, \text{memory}, \text{state}, \text{query})$$

Where $\text{Assemble}$ is a context assembly function that orchestrates:

  • $\text{instructions}$: System prompts and rules
  • $\text{knowledge}$: Retrieved relevant information
  • $\text{tools}$: Available function definitions
  • $\text{memory}$: Conversation history and learned facts
  • $\text{state}$: Current world/user state
  • $\text{query}$: User's immediate request
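
A minimal sketch of such an $\text{Assemble}$ function is shown below, using plain string formatting. The dataclass fields mirror the six components above; the section headers and ordering are illustrative choices rather than a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class ContextComponents:
    instructions: str                                   # system prompts and rules
    knowledge: list[str] = field(default_factory=list)  # retrieved passages
    tools: list[str] = field(default_factory=list)      # serialized function definitions
    memory: list[str] = field(default_factory=list)     # selected history / learned facts
    state: str = ""                                     # current world/user state
    query: str = ""                                     # user's immediate request

def assemble(c: ContextComponents) -> str:
    """Assemble(): format each component and concatenate non-empty sections in a fixed order."""
    sections = [
        ("## Instructions", c.instructions),
        ("## Knowledge", "\n".join(c.knowledge)),
        ("## Tools", "\n".join(c.tools)),
        ("## Memory", "\n".join(c.memory)),
        ("## State", c.state),
        ("## Query", c.query),
    ]
    return "\n\n".join(f"{header}\n{body}" for header, body in sections if body)
```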

Definition of Context Engineering

Context Engineering is formally defined as the optimization problem:

$$\text{Assemble}^* = \arg\max_{\text{Assemble}} \mathbb{E} [\text{Reward}(\text{LLM}(\text{context}), \text{target})]$$

Subject to constraints:

  • $|\text{context}| \leq \text{MaxTokens}$ (context window limitation)
  • $\text{knowledge} = \text{Retrieve}(\text{query}, \text{database})$
  • $\text{memory} = \text{Select}(\text{history}, \text{query})$
  • $\text{state} = \text{Extract}(\text{world})$

Where:

  • $\text{Reward}$ measures the quality of generated responses
  • $\text{Retrieve}$, $\text{Select}$, $\text{Extract}$ are functions for information gathering
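
In its simplest discrete form, this optimization can be approximated by comparing a handful of candidate assembly strategies on a held-out evaluation set and keeping the one with the highest average reward, as sketched below. Here `llm` and `reward` are assumed callables (an inference endpoint and a task-specific scorer), not part of any particular library.

```python
from typing import Callable, Iterable

def best_assembly(strategies: Iterable[Callable[[dict], str]],
                  eval_set: list[dict],
                  llm: Callable[[str], str],
                  reward: Callable[[str, str], float]) -> Callable[[dict], str]:
    """Approximate Assemble* = argmax_Assemble E[Reward(LLM(context), target)]
    by averaging reward over an evaluation set for each candidate strategy."""
    def avg_reward(assemble: Callable[[dict], str]) -> float:
        scores = []
        for example in eval_set:
            context = assemble(example)   # each strategy is expected to respect MaxTokens itself
            scores.append(reward(llm(context), example["target"]))
        return sum(scores) / len(scores)
    return max(strategies, key=avg_reward)
```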

Dynamic Context Orchestration

The context assembly can be decomposed as:

$$\text{context} = \text{Concat}(\text{Format}(\text{instructions}), \text{Format}(\text{knowledge}), \text{Format}(\text{tools}), \text{Format}(\text{memory}), \text{Format}(\text{state}), \text{Format}(\text{query}))$$

Where $\text{Format}$ represents component-specific structuring, and $\text{Concat}$ assembles them respecting token limits and optimal positioning.

Context Engineering is therefore the discipline of designing and optimizing these assembly and formatting functions to maximize task performance.
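
The sketch below shows one budget-aware realization of $\text{Concat}$: formatted sections are assumed to arrive in priority order, and `count_tokens` stands in for whatever tokenizer the deployment uses. Dropping whole sections that do not fit, rather than truncating them, is just one of several reasonable policies.

```python
from typing import Callable

def concat_with_budget(formatted_sections: list[str],
                       count_tokens: Callable[[str], int],
                       max_tokens: int) -> str:
    """Concat(): include sections in priority order, skipping any that would exceed the token budget."""
    kept, used = [], 0
    for section in formatted_sections:
        cost = count_tokens(section)
        if used + cost <= max_tokens:
            kept.append(section)
            used += cost
    return "\n\n".join(kept)
```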

Mathematical Principles

From this formalization, we derive four fundamental principles:

  1. System-Level Optimization: Context generation is a multi-objective optimization problem over assembly functions, not simple string manipulation.

  2. Dynamic Adaptation: The context assembly function adapts to each $\text{query}$ and $\text{state}$ at inference time: $\text{Assemble}(\cdot | \text{query}, \text{state})$.

  3. Information-Theoretic Optimality: The retrieval function maximizes relevant information: $\text{Retrieve} = \arg\max \text{Relevance}(\text{knowledge}, \text{query})$ (see the retrieval sketch after this list).

  4. Structural Sensitivity: The formatting functions encode structure that aligns with LLM processing capabilities.
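
To ground principle 3, the retrieval sketch below approximates $\text{Relevance}$ with cosine similarity over precomputed embeddings and returns the top-k passages; the embedding model itself is assumed to be whatever encoder the surrounding system already uses.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 5) -> list[str]:
    """Retrieve = argmax Relevance(knowledge, query), with relevance approximated
    by cosine similarity between embedding vectors (top-k selection)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity of each document to the query
    top = np.argsort(-scores)[:k]
    return [docs[i] for i in top]
```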

Theoretical Framework: Bayesian Context Inference

Context Engineering can be formalized within a Bayesian framework where the optimal context is inferred:

$$P(\text{context} | \text{query}, \text{history}, \text{world}) \propto P(\text{query} | \text{context}) \cdot P(\text{context} | \text{history}, \text{world})$$

Where:

  • $P(\text{query} | \text{context})$ models query-context compatibility
  • $P(\text{context} | \text{history}, \text{world})$ represents prior context probability

The optimal context assembly becomes:

$$\text{context}^* = \arg\max_{\text{context}} P(\text{answer} | \text{query}, \text{context}) \cdot P(\text{context} | \text{query}, \text{history}, \text{world})$$

This Bayesian formulation enables:

  • Uncertainty Quantification: Modeling confidence in context relevance
  • Adaptive Retrieval: Updating context beliefs based on feedback
  • Multi-step Reasoning: Maintaining context distributions across interactions
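
A discrete approximation of this rule is to score each candidate context by the sum of a compatibility log-probability and a prior log-probability and take the argmax, as sketched below. `compat_logp` and `prior_logp` are assumed scoring functions (for example, a reranker and a recency/user-preference prior), not fixed APIs.

```python
from typing import Callable

def select_context(candidates: list[str], query: str, history: list[str], world: dict,
                   compat_logp: Callable[[str, str], float],
                   prior_logp: Callable[[str, list[str], dict], float]) -> str:
    """Pick argmax over candidates of log P(query | context) + log P(context | history, world)."""
    def score(ctx: str) -> float:
        return compat_logp(query, ctx) + prior_logp(ctx, history, world)
    return max(candidates, key=score)
```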

Mathematical Comparison

| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Mathematical Model | $\text{context} = \text{prompt}$ (static) | $\text{context} = \text{Assemble}(...)$ (dynamic) |
| Optimization Target | $\arg\max_{\text{prompt}} P(\text{answer} \mid \text{query}, \text{prompt})$ | $\arg\max_{\text{Assemble}} \mathbb{E}[\text{Reward}(\text{LLM}(\text{context}), \text{target})]$ |
| Complexity | $O(1)$ context assembly | $O(n)$ multi-component optimization |
| Information Theory | Fixed information content | Adaptive information maximization |
| State Management | Stateless function | Stateful with $\text{memory}(\text{history}, \text{query})$ |
| Scalability | Linear in prompt length | Sublinear through compression/filtering |
| Error Analysis | Manual prompt inspection | Systematic evaluation of assembly components |

🤔 Why Context Engineering?

The Paradigm Shift: From Tactical to Strategic

The evolution from prompt engineering to context engineering represents a fundamental maturation in AI system design. As influential figures like Andrej Karpathy, Tobi Lutke, and Simon Willison have argued, the term "prompt engineering" has been diluted to mean simply "typing things into a chatbot," failing to capture the complexity required for industrial-strength LLM applications.

1. Enterprise and Production Necessities

Context Failures Are the New Bottleneck

Most failures in modern agentic systems are no longer attributable to core model reasoning capabilities but are instead "context failures". The true engineering challenge lies not in what question to ask, but in ensuring the model has all necessary background, data, tools, and memory to answer meaningfully and reliably.

Scalability Beyond Simple Tasks

While prompt engineering suffices for simple, self-contained tasks, it breaks down when scaled to:

  • Complex, multi-step applications
  • Data-rich enterprise environments
  • Stateful, long-running workflows
  • Multi-user, multi-tenant systems

Context Engineering provides the architectural foundation for managing state, integrating diverse data sources, and maintaining coherence across these demanding scenarios.

2. The Limitations of Static Prompting

From Strings to Systems

Traditional prompting treats context as a static string, but enterprise applications require:

  • Dynamic Information Assembly: Context created on-the-fly, tailored to specific users and queries
  • Multi-Source Integration: Combining databases, APIs, documents, and real-time data
  • State Management: Maintaining conversation history, user preferences, and workflow status
  • Tool Orchestration: Coordinating external function calls and API interactions

The "Movie Production" Analogy

If prompt engineering is writing a single line of dialogue for an actor, context engineering is the entire process of building the set, designing lighting, providing detailed backstory, and directing the scene. The dialogue only achieves its intended impact because of the rich, carefully constructed environment surrounding it.

3. Cognitive and Information Science Foundations

Artificial Embodiment

LLMs are essentially "brains in a vat" - powerful reasoning engines lacking connection to specific environments. Context Engineering provides:

  • Synthetic Sensory Systems: Retrieval mechanisms as artificial perception
  • Proxy Embodiment: Tool use as artificial action capabilities
  • Artificial Memory: Structured information storage and retrieval

Information Retrieval at Scale

Context Engineering addresses the fundamental challenge of information retrieval where the "user" is not human but an AI agent. This requires:

  • Semantic Understanding: Bridging the gap between intent and expression
  • Relevance Optimization: Ranking and filtering vast knowledge bases
  • Query Transformation: Converting ambiguous requests into precise retrieval operations

4. Production-Grade Requirements

Reliability and Consistency

Enterprise applications demand:

  • Deterministic Behavior: Predictable outputs across different contexts and users
  • Error Handling: Graceful degradation when information is incomplete or contradictory
  • Audit Trails: Transparency in how context influences model decisions
  • Compliance: Meeting regulatory requirements for data handling and decision making

Economic and Operational Efficiency

Context Engineering enables:

  • Cost Optimization: Strategic choice between RAG and long-context approaches
  • Latency Management: Efficient information retrieval and context assembly
  • Resource Utilization: Optimal use of finite context windows and computational resources
  • Maintenance Scalability: Systematic approaches to updating and managing knowledge bases

5. The Future of AI System Architecture

Context Engineering elevates AI development from a collection of "prompting tricks" to a rigorous discipline of systems architecture. It applies decades of knowledge in operating system design, memory management, and distributed systems to the unique challenges of LLM-based applications.

This discipline is foundational for unlocking the full potential of LLMs in production systems, enabling the transition from one-off text generation to autonomous agents and sophisticated AI copilots that can reliably operate in complex, dynamic environments.


🔧 Contextual Components, Techniques and Architectures

Context Scaling

Position Interpolation and Extension Techniques

  • Extending Context Window of Large Language Models via Position Interpolation, Chen et al., arXiv Badge
  • YaRN: Efficient Context Window Extension of Large Language Models, Peng et al., ICLR Badge
  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Ding et al., ICML Badge
  • LongRoPE2: Near-Lossless LLM Context Window Scaling, Shang et al., ICML Badge

Memory-Efficient Attention Mechanisms

  • Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences, Kang et al., ICLR Badge
  • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al., arXiv Badge
  • DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads, Xiao et al., ICLR Badge
  • Star Attention: Efficient LLM Inference over Long Sequences, Acharya et al., arXiv Badge

Ultra-Long Sequence Processing (100K+ Tokens)

  • TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation, Wu et al., ICML Badge
  • LongHeads: Multi-Head Attention is Secretly a Long Context Processor, Lu et al., EMNLP Badge
  • ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens, Zhang et al., ACL Badge

Comprehensive Extension Surveys and Methods

  • Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models, Various, arXiv Badge
  • A Controlled Study on Long Context Extension and Generalization in LLMs, Various, arXiv Badge
  • Selective Attention: Enhancing Transformer through Principled Context Control, Various, NeurIPS Badge

Vision-Language Models with Sophisticated Context Understanding

  • Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques, An et al., arXiv Badge
  • Comprehending Multimodal Content via Prior-LLM Context Fusion, Wang et al., ACL Badge
  • V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding, Dai et al., arXiv Badge
  • Flamingo: a Visual Language Model for Few-Shot Learning, Alayrac et al., NeurIPS Badge

Audio-Visual Context Integration and Processing

  • Aligned Better, Listen Better for Audio-Visual Large Language Models, Guo et al., ICLR Badge
  • AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue, Chen et al., arXiv Badge
  • SonicVisionLM: Playing Sound with Vision Language Models, Xie et al., CVPR Badge
  • SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context, Li et al., arXiv Badge

Multi-Modal Prompt Engineering and Context Design

  • CaMML: Context-Aware Multimodal Learner for Large Models, Chen et al., ACL Badge
  • Visual In-Context Learning for Large Vision-Language Models, Zhou et al., ACL Badge
  • CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention, Li et al., arXiv Badge

CVPR 2024 Vision-Language Advances

  • CogAgent: A Visual Language Model for GUI Agents, Various, CVPR Badge
  • LISA: Reasoning Segmentation via Large Language Model, Various, CVPR Badge
  • Reproducible scaling laws for contrastive language-image learning, Various, CVPR Badge

Video and Temporal Understanding

  • Video Understanding with Large Language Models: A Survey, Various, arXiv Badge

Structured Data Integration

Knowledge Graph-Enhanced Language Models

  • Learn Together: Joint Multitask Finetuning of Pretrained KG-enhanced LLM for Downstream Tasks, Martynova et al., ICCL Badge
  • Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback, Sun et al., ICLR Badge
  • Knowledge Graph-Guided Retrieval Augmented Generation, Zhu et al., arXiv Badge
  • KGLA: Knowledge Graph Enhanced Language Agents for Customer Service, Anonymous et al., arXiv Badge

Graph Neural Networks Combined with Language Models

  • Are Large Language Models In-Context Graph Learners?, Li et al., arXiv Badge
  • Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning, Hu et al., EMNLP Badge
  • GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language model, Yang et al., ICLR Badge
  • NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models, Ji et al., arXiv Badge

Structured Data Integration

  • CoddLLM: Empowering Large Language Models for Data Analytics, Authors et al., arXiv Badge
  • Structure-Guided Large Language Models for Text-to-SQL Generation, Authors et al., arXiv Badge
  • StructuredRAG: JSON Response Formatting with Large Language Models, Authors et al., arXiv Badge

Foundational KG-LLM Integration Methods

  • Unifying Large Language Models and Knowledge Graphs: A Roadmap, Various, arXiv Badge
  • Combining Knowledge Graphs and Large Language Models, Various, arXiv Badge
  • All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks, Various, arXiv Badge
  • Large Language Models for Graph Learning, Various, WWW Badge

Self-Generated Context

Self-Supervised Context Generation and Augmentation

  • SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models, Chuang et al., arXiv Badge
  • Self-Supervised Prompt Optimization, Xiang et al., CoRR Badge
  • SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation, Duong et al., ICLR Badge

Reasoning Models That Generate Their Own Context

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR Badge
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., arXiv Badge
  • Rethinking Chain-of-Thought from the Perspective of Self-Training, Wu et al., arXiv Badge
  • Autonomous Tree-search Ability of Large Language Models, Authors et al., arXiv Badge

Iterative Context Refinement and Self-Improvement

  • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arXiv Badge
  • Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning, Authors et al., arXiv Badge
  • Large Language Models Can Self-Improve in Long-context Reasoning, Li et al., arXiv Badge

Meta-Learning and Autonomous Context Evolution

  • Meta-in-context learning in large language models, Coda-Forno et al., NeurIPS Badge
  • EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers, Guo et al., ICLR Badge
  • AutoPDL: Automatic Prompt Optimization for LLM Agents, Spiess et al., AutoML Badge

Foundational Chain-of-Thought Research

  • Chain-of-thought prompting elicits reasoning in large language models, Wei et al., NeurIPS Badge

๐Ÿ› ๏ธ Implementation, Challenges, and Mitigation Strategies

1. Retrieval-Augmented Generation (RAG)

Foundational RAG Systems

  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., arXiv Badge
  • A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Cheng et al., arXiv Badge
  • A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, Ding et al., arXiv Badge

Graph-Based RAG Systems

  • GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation, Luo et al., arXiv Badge
  • GRAG: Graph Retrieval-Augmented Generation, Hu et al., NAACL Badge
  • HybridRAG: A Hybrid Retrieval System for RAG Combining Vector and Graph Search, Sarabesh, GitHub Badge

Multi-Agent and Hierarchical RAG

  • HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation, Liu et al., arXiv Badge
  • MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries, Tang & Yang, arXiv Badge
  • MMOA-RAG: Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning, Chen et al., arXiv Badge

Real-Time and Streaming RAG

  • StreamingRAG: Real-time Contextual Retrieval and Generation Framework, Sankaradas et al., arXiv Badge
  • Multi-task Retriever Fine-tuning for Domain-Specific and Efficient RAG, Authors, arXiv Badge

2. Memory Systems

Persistent Memory Architecture

  • MemGPT: Towards LLMs as Operating Systems, Packer et al., arXiv Badge
  • Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory, Taranjeet et al., arXiv Badge
  • MemoryLLM: Towards Self-Updatable Large Language Models, Wang et al., arXiv Badge

Memory-Augmented Neural Networks

  • Survey on Memory-Augmented Neural Networks: Cognitive Insights to AI Applications, Khosla et al., arXiv Badge
  • A Machine with Short-Term, Episodic, and Semantic Memory Systems, Kim et al., arXiv Badge
  • From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, Wu et al., arXiv Badge

Episodic Memory and Context Persistence

  • The Role of Memory in LLMs: Persistent Context for Smarter Conversations, Porcu, IJSRM Badge
  • Episodic Memory in AI Agents Poses Risks that Should Be Studied and Mitigated, Christiano et al., arXiv Badge

3. Agent Communication

Agent Interoperability Protocols

  • A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), and Agent-to-Agent Protocol (A2A), Zhang et al., arXiv Badge
  • Expressive Multi-Agent Communication via Identity-Aware Learning, Du et al., AAAI Badge
  • Context-aware Communication for Multi-agent Reinforcement Learning (CACOM), Li et al., arXiv Badge

Structured Communication Frameworks

  • Learning Structured Communication for Multi-Agent Reinforcement Learning, Wang et al., AAMAS Badge
  • AC2C: Adaptively Controlled Two-Hop Communication for Multi-Agent Reinforcement Learning, Wang et al., AAMAS Badge
  • Task-Agnostic Contrastive Pre-Training for Inter-Agent Communication, Sun et al., AAMAS Badge

LLM-Enhanced Agent Communication

  • ProAgent: Building Proactive Cooperative Agents with Large Language Models, Zhang et al., AAAI Badge
  • Model Context Protocol (MCP), Anthropic, GitHub Badge

4. Tool Use and Function Calling

Foundational Tool Learning

  • Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., NeurIPS Badge
  • ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al., arXiv Badge
  • Augmented Language Models: a Survey, Qin et al., arXiv Badge
  • Tool Learning with Large Language Models: A Survey, Qu et al., arXiv Badge

Advanced Function Calling Systems

  • Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks, Smith et al., arXiv Badge
  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS Badge
  • Enhancing Function-Calling Capabilities in LLMs: Strategies for Prompt Formats, Data Integration, and Multilingual Translation, Chen et al., NAACL Badge

Multi-Agent Function Calling

  • ToolACE: Winning the Points of LLM Function Calling, Zhang et al., OpenReview Badge
  • Berkeley Function Leaderboard (BFCL): Evaluating Function-Calling Abilities, Various, Benchmark Badge

📊 Evaluation Paradigms for Context-Driven Systems

Context Quality Assessment

Foundational Long-Context Benchmarks

  • RULER: What's the Real Context Size of Your Long-Context Language Models?, Cheng-Ping Hsieh et al., COLM Badge
  • LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding, Bai et al., ACL Badge
  • ∞BENCH: Extending Long Context Evaluation Beyond 100K Tokens, Zhang et al., ACL Badge
  • VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning, Zong et al., ICLR Badge

Multimodal and Specialized Evaluation

  • MultiModal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models, Wang et al., NAACL Badge
  • Contextualized Topic Coherence (CTC) Metrics, Rahimi et al., ACL Badge
  • BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence, Sheng et al., AAAI Badge

RAG and Generation Evaluation

  • Evaluation of Retrieval-Augmented Generation: A Survey, Li et al., arXiv Badge
  • Ragas: Automated Evaluation of Retrieval Augmented Generation, Espinosa-Anke et al., arXiv Badge
  • Human Evaluation Protocol for Generative AI Chatbots in Clinical Microbiology, Griego-Herrera et al., PLOS Badge

Benchmarking Context Engineering

Synthetic vs. Realistic Evaluation

  • Needle-in-a-Haystack (NIAH) and Synthetic Benchmarks, Research Area 2023-2024, Benchmark Badge
  • ZeroSCROLLS: Realistic Natural Language Tasks, Benchmark 2023-2024, Benchmark Badge
  • InfiniteBench: 100K+ Token Evaluation, Benchmark 2024, Benchmark Badge

🚀 Applications and Systems

Complex Research Systems

Hypothesis Generation and Data-Driven Discovery

  • Hypothesis Generation with Large Language Models, Liu et al., arXiv Badge
  • GFlowNets for AI-Driven Scientific Discovery, Jain et al., Digital Discovery Badge
  • Literature Meets Data: A Synergistic Approach to Hypothesis Generation, Liu et al., arXiv Badge
  • Machine Learning for Hypothesis Generation in Biology and Medicine, FieldSHIFT Team, Digital Discovery Badge

Automated Scientific Discovery

  • The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, Lu et al., arXiv Badge
  • Automating Psychological Hypothesis Generation with AI, Johnson et al., Nature Badge
  • Can Large Language Models Replace Humans in Systematic Reviews?, Khraisha et al., Research Synthesis Badge

AI for Science Integration and Future Directions

  • AI for Science 2025: Convergence of AI Innovation and Scientific Discovery, Fink et al., Nature Badge
  • Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges, Anonymous et al., arXiv Badge

Deep Research Applications

  • Accelerating scientific discovery with AI, MIT News, MIT Badge
  • Accelerating scientific breakthroughs with an AI co-scientist, Google Research, Google Badge
  • Bridging AI and Science: Implications from a Large-Scale Literature Analysis of AI4Science, Various, arXiv Badge
  • AI for scientific discovery, World Economic Forum, WEF Badge

Production Systems

Context Engineering as a Core Discipline

  • From Prompt Craft to System Design: Context Engineering as a Core Discipline for AI-Driven Delivery, Forte Group Team, Forte Badge
  • Context Engineering: A Framework for Enterprise AI Operations, Shelly Palmer, ShellyPalmer Badge
  • How MCP Handles Context Management in High-Throughput Scenarios, Portkey.ai Team, Portkey Badge

Enterprise AI Case Studies

  • Case Study: JPMorgan's COiN Platform – Agentic AI for Financial Analysis, AI Mindset Research, Banking Badge
  • Case Study: EY's Agentic AI Integration in Microsoft 365 Copilot, AI Mindset Research, Professional Services Badge
  • Context Is Everything: The Massive Shift Making AI Actually Work in the Real World, Phil Mora, Cross Industry Badge

Enterprise Applications and Infrastructure

  • The Context Layer for Enterprise RAG Applications, Contextual AI Team, Contextual AI Badge
  • Navigating AI Model Deployment: Challenges and Solutions, Dean Lancaster, LinkedIn Badge
  • 2024: The State of Generative AI in the Enterprise, Menlo Ventures, Report Badge
  • How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025, Andreessen Horowitz, a16z Badge

🔮 Limitations and Future Directions

Current Limitations

  1. Context Window Constraints: Despite improvements, context length remains a bottleneck
  2. Computational Overhead: Processing large contexts requires significant resources
  3. Context Coherence: Maintaining coherence across extended contexts
  4. Dynamic Adaptation: Real-time context updating challenges

Future Research Directions

  1. Infinite Context: Developing truly unlimited context capabilities
  2. Context Compression: Efficient representation of large contexts
  3. Multimodal Integration: Seamless integration of diverse data types
  4. Adaptive Context: Self-optimizing context management
  5. Context Privacy: Securing sensitive information in context pipelines

๐Ÿค Contributing

We welcome contributions to this survey! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Add relevant papers with proper formatting
  4. Submit a pull request with a clear description

Paper Formatting Guidelines

<li><i><b>Paper Title</b></i>, Author et al., <a href="URL" target="_blank"><img src="https://img.shields.io/badge/SOURCE-YEAR.MM-COLOR" alt="SOURCE Badge"></a></li>

Badge Colors

  • arXiv Badge red for arXiv papers
  • PDF Badge blue for conference/journal papers
  • GitHub Badge white for GitHub repositories
  • HuggingFace Badge yellow for HuggingFace resources

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


📧 Contact

For questions, suggestions, or collaboration opportunities, please feel free to reach out:

Lingrui Mei
📧 Email: meilingrui22@mails.ucas.ac.cn

You can also open an issue in this repository for general discussions and suggestions.


๐Ÿ™ Acknowledgments

This survey builds upon the foundational work of the AI research community. We thank all researchers contributing to the advancement of context engineering and large language models.


Star โญ this repository if you find it helpful!
