Meta Prompting for AI Systems

arXiv:2311.11482 · License: MIT · Python 3.9+

This is the official repository for the paper "Meta Prompting for AI Systems" (https://arxiv.org/abs/2311.11482).

Abstract: We introduce Meta Prompting (MP), a framework that elevates the reasoning capabilities of large language models (LLMs) by focusing on the formal structure of a task rather than content-specific examples. We establish a theoretical foundation for this paradigm, formalizing MP as a functor that maps a category of tasks to a category of structured prompts, thereby guaranteeing that compositional problem-solving strategies can be systematically decomposed into modular prompt structures. We extend this concept to Recursive Meta Prompting (RMP), an automated process where an LLM can generate and refine its own prompts. We model this self-improvement loop formally as a monad, providing a principled framework for automated prompt engineering. Our claims are validated through extensive experiments demonstrating that a Qwen-72B base model, guided by a single, example-agnostic meta-prompt, achieves state-of-the-art results on MATH, GSM8K, and Game of 24. These results are achieved with substantial token efficiency gains over traditional few-shot methods.


Table of Contents

  • What is Meta Prompting?
  • Key Contributions
  • A Formal Framework
  • Experimental Results
  • Getting Started: Try Meta Prompting
  • Citation

What is Meta Prompting?

Meta Prompting is a technique that provides a high-level, structural template for how to think rather than specific examples of what to think. Unlike few-shot prompting, which relies on content-rich examples, a meta-prompt is a single, example-agnostic scaffold that guides the LLM's reasoning process. It focuses on the formal procedure, syntax, and compositionality of problem-solving.

This approach teaches the model a reusable, structured method for tackling an entire category of tasks, leading to greater efficiency and robustness.

Visual Comparison

Here’s the difference in action. A Meta Prompt provides a general structure for solving any problem in a category, while a Few-Shot Prompt provides specific, solved examples.

Meta Prompt: A Structural Template

This prompt gives the LLM a reusable blueprint for solving math problems. It's completely agnostic to the actual numbers or problem details.

Integrate step-by-step reasoning to solve mathematical problems under the following structure:

{
  "Problem": "[question to be answered]",
  "Solution": {
    "Step 1": "Begin the response with 'Let's think step by step.'",
    "Step 2": "Follow with the reasoning steps, ensuring the solution process is broken down clearly and logically.",
    "Step 3": "End the solution with the final answer encapsulated in a LaTeX-formatted box, $\boxed{...}$"
  },
  "Final Answer": "[final answer]"
}
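
Because the scaffold is example-agnostic, the only per-problem substitution is the question itself. A minimal illustration of instantiating the template (our own snippet, not code from this repository):

```python
# Fill the example-agnostic scaffold with a concrete question.
# This snippet only illustrates how the template is instantiated.
import json

SCAFFOLD = {
    "Problem": "[question to be answered]",
    "Solution": {
        "Step 1": "Begin the response with 'Let's think step by step.'",
        "Step 2": "Follow with the reasoning steps, ensuring the solution process is broken down clearly and logically.",
        "Step 3": "End the solution with the final answer encapsulated in a LaTeX-formatted box, $\\boxed{...}$",
    },
    "Final Answer": "[final answer]",
}

def render_meta_prompt(question: str) -> str:
    """Substitute the question into the 'Problem' slot and serialize the prompt."""
    filled = {**SCAFFOLD, "Problem": question}
    header = ("Integrate step-by-step reasoning to solve mathematical problems "
              "under the following structure:\n\n")
    return header + json.dumps(filled, indent=2)

print(render_meta_prompt("Find the domain of $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$."))
```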

Few-Shot Prompt: Concrete Examples

For comparison, a traditional few-shot prompt provides several concrete (problem, solution) pairs. The model is expected to learn the pattern from these specific examples.

Problem: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.

Solution: The expression inside each square root must be non-negative, so $x-2 \ge 0$, which gives $x \ge 2$. The denominator also cannot be zero, so $5-x > 0$, which gives $x < 5$. Therefore, the domain is $\boxed{[2,5)}$.
Final Answer: The final answer is $[2,5)$.

---------

Problem: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$

Solution: We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = \boxed{24}.$
Final Answer: The final answer is $24$.

---------
...

Key Contributions

  • Introduces Meta Prompting (MP): A new, example-agnostic prompting paradigm that focuses on the formal structure of reasoning.
  • Establishes a Theoretical Foundation: We formalize MP using category theory, modeling it as a functor that guarantees compositional problem-solving.
  • Defines Recursive Meta Prompting (RMP): We introduce a process for automated prompt engineering and self-improvement, which we formalize as a monad.
  • Achieves State-of-the-Art Performance: Using a single, zero-shot meta-prompt, our method sets new SOTA results on benchmarks like MATH (46.3%) and GSM8K (83.5%) with a base Qwen-72B model, outperforming fine-tuned models and even GPT-4.

A Formal Framework

A core novelty of our work is providing a rigorous mathematical foundation for prompt engineering.

Meta Prompting as a Functor

We model Meta Prompting as a functor $\mathcal{M}: \mathcal{T} \to \mathcal{P}$ that maps a category of tasks ($\mathcal{T}$) to a category of prompts ($\mathcal{P}$). This ensures that if a complex task can be broken down into simpler sub-tasks, the corresponding prompt can also be constructed by composing simpler prompts. This provides a theoretical guarantee of modularity and compositionality.
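
Concretely, functoriality is the requirement that identities and composition of tasks are preserved (the two standard functor laws, restated here in the notation above):

$$\mathcal{M}(\mathrm{id}_{T}) = \mathrm{id}_{\mathcal{M}(T)}, \qquad \mathcal{M}(g \circ f) = \mathcal{M}(g) \circ \mathcal{M}(f),$$

so a decomposition $T \xrightarrow{f} T' \xrightarrow{g} T''$ of a complex task into sub-tasks induces the composite prompt $\mathcal{M}(g) \circ \mathcal{M}(f)$ for the whole task.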

Recursive Meta Prompting as a Monad

We model the process of an LLM refining its own prompts (RMP) as a monad. This framework provides a principled way to manage self-improvement, ensuring the process is stable and consistent. The workflow can be visualized as a Meta-Meta-Prompt guiding a "Proposer" LLM to generate a task-specific meta-prompt, which is then used by an "Executor" LLM.
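
The workflow can be sketched in a few lines of Python. This is an illustrative sketch of the Proposer/Executor loop described above, not the paper's implementation; the function names and the generic `llm` callable are placeholders we introduce here.

```python
# Illustrative sketch of the Recursive Meta Prompting (RMP) workflow:
# a Meta-Meta-Prompt guides a "Proposer" LLM to generate a task-specific
# meta-prompt, which an "Executor" LLM then applies to a concrete problem.
from typing import Callable

META_META_PROMPT = (
    "You are a prompt designer. Given a task description, write an "
    "example-agnostic meta-prompt that specifies the structure a solver "
    "should follow (steps, output format), without concrete examples."
)

def propose_meta_prompt(llm: Callable[[str], str], task_description: str) -> str:
    """Proposer: the meta-meta-prompt guides the LLM to emit a task-specific meta-prompt."""
    return llm(f"{META_META_PROMPT}\n\nTask: {task_description}")

def execute(llm: Callable[[str], str], meta_prompt: str, problem: str) -> str:
    """Executor: the generated meta-prompt structures how the problem is solved."""
    return llm(f"{meta_prompt}\n\nProblem: {problem}")

def recursive_meta_prompting(llm: Callable[[str], str], task: str, problem: str, rounds: int = 1) -> str:
    """One or more propose-then-refine rounds, ending with execution on a problem."""
    meta_prompt = propose_meta_prompt(llm, task)
    for _ in range(rounds - 1):
        # Refinement step: ask the model to improve its own meta-prompt.
        meta_prompt = llm(f"Improve this meta-prompt for the task '{task}':\n{meta_prompt}")
    return execute(llm, meta_prompt, problem)
```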


Experimental Results

Our experiments show that Meta Prompting unlocks powerful reasoning in base language models without any fine-tuning, using only a single, example-agnostic prompt.

MATH Benchmark

With a single meta-prompt, Qwen-72B (base) surpasses all other open-source models and even the initial release of GPT-4.

| Model | FT-Dataset | Tool Usage | Eval Method | MATH (%) |
|---|---|---|---|---|
| **Proprietary Models** | | | | |
| Claude-2 | - | No | CoT | 32.5 |
| Minerva-540B | Arxiv+Web | No | CoT | 33.6 |
| PaLM-2 | - | No | CoT | 34.3 |
| GPT-4 (2023-0314) | - | No | CoT | 42.5 |
| **Open-source Models** | | | | |
| Llama-2-70B (base) | - | No | CoT | 13.5 |
| Qwen-14B (base) | - | No | MP | 28.9 |
| Qwen-72B-MetaMathQA | MetaMathQA | No | CoT | 41.7 |
| Qwen-72B (base) | - | No | MP | 46.3 |

GSM8K Benchmark

Our zero-shot meta-prompting approach again shows substantial improvements over both base models with CoT and powerful fine-tuned models.

| Model | FT-Dataset | Tool Usage | Eval Method | GSM8K (%) |
|---|---|---|---|---|
| Llama-2-70B (base) | - | No | CoT | 56.8 |
| Qwen-14B (base) | - | No | MP | 64.8 |
| WizardMath-70B | WizardMath | No | CoT | 81.6 |
| MetaMath-70B | MetaMathQA | No | CoT | 82.3 |
| Qwen-72B (base) | - | No | MP | 83.5 |

Game of 24

Meta Prompting achieves a 100% success rate by generating a single Python program that solves all puzzles, making it far more token-efficient than iterative methods like Tree-of-Thought (ToT).

| Method | LLM Sessions (per sample) | Generate/Prompt tokens | Cost (per sample) | Success Rate |
|---|---|---|---|---|
| IO (best of 100) | 100 | 1.8k / 1.0k | $0.13 | 33% |
| CoT (best of 100) | 100 | 6.7k / 2.2k | $0.47 | 49% |
| ToT (breadth=5) | 61.72 | 5.5k / 1.4k | $0.74 | 74% |
| MP | 1/N | ≈ 1/N × (8k / 1k) | ≈ $0.0003 | 100% |
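
For reference, below is a brute-force solver of the kind MP asks the model to write once and reuse for every puzzle. This sketch is ours, written to illustrate the idea; it is not the program generated in the paper's experiments.

```python
# Brute-force Game of 24 solver: repeatedly combine any two numbers with an
# arithmetic operation and recurse, which covers every parenthesization.
from fractions import Fraction
from itertools import combinations

def solve_24(numbers, target=24):
    """Return an expression over `numbers` that evaluates to `target`, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]  # Fractions avoid float round-off
    return _search(items, Fraction(target))

def _search(items, target):
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea}+{eb})"),
            (a * b, f"({ea}*{eb})"),
            (a - b, f"({ea}-{eb})"),
            (b - a, f"({eb}-{ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None

if __name__ == "__main__":
    print(solve_24([4, 7, 8, 8]))  # prints one valid expression, e.g. ((7-(8/8))*4)
```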

Getting Started: Try Meta Prompting

Using our Prompts

The prompts used in our paper are available in the /prompts directory. You can use them as system prompts for LLMs to replicate our results or adapt them for your own tasks.

Example: MATH Meta-Prompt

Here is the meta-prompt used for the MATH benchmark, which you can use directly.

**Problem Statement**:
- **Problem**: [question to be answered]

**Solution Structure**:

1. Begin the response with "Let's think step by step."
2. Follow with the reasoning steps, ensuring the solution process is broken down clearly and logically.
3. End the solution with the final answer encapsulated in a LaTeX-formatted box, $\boxed{...}$, for clarity and emphasis.
4. Finally, state "The answer is [final answer to the problem].", with the final answer presented in LaTeX notation.
---------
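
As a quick start, here is a minimal sketch of sending this meta-prompt as a system prompt through an OpenAI-compatible client. The prompt file name and model name are placeholders, so adjust them to your setup.

```python
# Minimal sketch: use a meta-prompt as the system prompt via an
# OpenAI-compatible API. Requires the `openai` package and an API key
# (or a base_url pointing at a local, OpenAI-compatible server).
from openai import OpenAI

client = OpenAI()

with open("prompts/math-meta-prompt.txt") as f:  # placeholder path into /prompts
    meta_prompt = f.read()

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system", "content": meta_prompt},
        {"role": "user", "content": "Problem: What is the remainder when 2^10 is divided by 7?"},
    ],
)
print(response.choices[0].message.content)
```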

Live Demos (Custom GPTs)

You can try out various versions of Meta Prompting agents via these custom GPTs:


Citation

If you find our work useful for your research, please consider citing our paper and starring this repository. Thank you!

@article{zhang2023meta,
  title={Meta Prompting for AI Systems},
  author={Zhang, Yifan and Yuan, Yang and Yao, Andrew Chi-Chih},
  journal={arXiv preprint arXiv:2311.11482},
  year={2023}
}
