MPO: Boosting LLM Agents with Meta Plan Optimization

[📜 arXiv][🤗 Dataset][🤗 Models][🐱 GitHub]

This repository contains the code for the paper "MPO: Boosting LLM Agents with Meta Plan Optimization"

In this work, we introduce the Meta Plan Optimization (MPO) framework, designed to enhance agent planning capabilities by directly integrating explicit guidance. Unlike previous methods that depend on complex knowledge—often requiring extensive human effort or lacking quality assurance—MPO leverages high-level general guidance through meta plans. This approach not only assists agents in planning but also enables continuous optimization of meta plans based on feedback from the agent's task execution.
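As a rough illustration (not the repository's actual API), the core idea can be sketched as follows: a meta planner produces a short, high-level plan for a task, the agent executes the task with that plan as guidance, and the resulting task reward serves as the feedback signal for optimizing the meta planner. All names below (generate_meta_plan, run_agent) are hypothetical stand-ins.

```python
# Illustrative sketch only: the names below are hypothetical and do not
# correspond to modules in this repository.

def generate_meta_plan(task: str) -> str:
    """The meta planner produces high-level, general guidance for a task."""
    return (
        "1. Identify the objects and locations relevant to the task.\n"
        "2. Work through the subgoals one step at a time.\n"
        "3. Verify the goal condition before finishing."
    )

def run_agent(task: str, meta_plan: str) -> float:
    """The agent acts with the meta plan prepended as guidance and returns
    a task reward (e.g., 1.0 for success)."""
    prompt = f"Guidance:\n{meta_plan}\n\nTask:\n{task}"
    # ... roll out the agent in the environment (e.g., ALFWorld) here ...
    return 1.0  # placeholder reward

task = "put a clean mug on the desk"
reward = run_agent(task, generate_meta_plan(task))
# Rewards gathered this way provide the feedback used to optimize the
# meta planner (see Dataset Construction below).
```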

🔥 News

  • [2025/08/20] 🔥🔥🔥 Our work has been accepted to EMNLP 2025 Findings!
  • [2025/03/05] 🔥🔥🔥 MPO-optimized meta planner released at 🤗 HuggingFace!
    • Llama-3.1-70B-Instruct, enhanced with the MPO-optimized meta planner (ALFWorld-MPO and SciWorld-MPO), achieved an average accuracy of 83.1 across ALFWorld and SciWorld, setting a new state-of-the-art (SOTA) result.
    • Llama-3.1-8B-Instruct + MPO achieved an average score of 53.6, outperforming GPT-4o-mini by a significant margin (a 30.1% improvement).
  • [2025/03/05] 🔥🔥🔥 The dataset for MPO released at 🤗 HuggingFace!
  • [2025/03/04] MPO paper and repo released.

🛠️ Setup

git clone https://github.com/WeiminXiong/MPO.git
cd MPO
conda create -n mpo python=3.10
conda activate mpo
pip install -r requirements.txt
bash download_data.sh

🚀 Quick Start

To evaluate the effectiveness of MPO-optimized meta plans on baseline models, directly run the following bash script:

bash run_experiment.sh

The script performs the following steps:

  1. Configure the experiment parameters in run_experiment.sh.
  2. Launch the model server.
  3. Run the experiment.

🎮 Dataset Construction

To generate training data for the DPO optimization phase of the meta planner, run the following bash script.

bash scripts/mc_sample.sh

The script performs the following steps:

  1. Configure the experiment parameters in scripts/mc_sample.sh.
  2. Sample meta plans from the SFT-initialized meta plan generator.
  3. Let the explorer agent evaluate the quality of the sampled meta plans.
  4. Generate training data for the DPO optimization phase of the meta planner (a minimal sketch of this pairing step appears below).

For more details about the dataset construction, please refer to the scripts directory.
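To make step 4 concrete, the sketch below shows one plausible way to turn Monte Carlo scored meta plan samples into DPO preference pairs: for each task, a higher-scoring plan is treated as chosen and a lower-scoring one as rejected. The field names, scores, and output format are illustrative assumptions, not the repository's exact data schema.

```python
import json
from itertools import combinations

# Hypothetical input: for each task, several sampled meta plans with the
# explorer agent's average success rate (a Monte Carlo estimate).
samples = {
    "put a clean mug on the desk": [
        {"meta_plan": "Find the mug, rinse it at the sink, place it on the desk.", "score": 0.8},
        {"meta_plan": "Go to the desk first, then search rooms at random.", "score": 0.2},
    ],
}

pairs = []
for task, plans in samples.items():
    for a, b in combinations(plans, 2):
        if a["score"] == b["score"]:
            continue  # tied plans give no preference signal
        chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
        pairs.append({
            "prompt": task,
            "chosen": chosen["meta_plan"],
            "rejected": rejected["meta_plan"],
        })

with open("dpo_pairs.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```

A (prompt, chosen, rejected) layout of this kind is the typical input for common DPO trainers, though the repository's actual schema may differ.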

🧩 Structure of This Project

There are eight main folders in this project: agents, configs, data, envs, prompt, scripts, tasks, and utils.

  • agents: code for the agents
  • configs: configuration files for the experiments
  • data: data for the experiments
  • envs: code for the environments
  • prompt: prompt templates
  • scripts: scripts for dataset construction and meta plan generation
  • tasks: code for the tasks
  • utils: utility functions

📖 Citation

If you find this repo helpful, please cite our paper:

@misc{xiong2025mpoboostingllmagents,
      title={MPO: Boosting LLM Agents with Meta Plan Optimization}, 
      author={Weimin Xiong and Yifan Song and Qingxiu Dong and Bingchan Zhao and Feifan Song and Xun Wang and Sujian Li},
      year={2025},
      eprint={2503.02682},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.02682}, 
}
