- [2025/05/06] We present MAS-Zero [Project Page | Paper | Code]
We propose MAS-Zero, a meta-agent that serves several roles (design, evaluate, and verify) and involves two steps:

- Meta-Iterations:
  - MAS-Design: decompose the task and propose a sub-MAS for each sub-task. We frame MAS design as code generation.
  - MAS-Feedback: evaluate the generated MAS design on solvability and completeness. We evaluate these metrics using intermediate outputs obtained by executing the MAS code.
- Self-Verification: select the most suitable outcome from the set of all candidate solutions generated throughout the meta-iteration process.

The whole process needs no validation set, uses meta-level self-supervision on the MAS design, and runs at inference time only.
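The meta-iteration loop described above can be sketched roughly as follows. This is an illustrative toy, not the repo's actual API: `propose_mas`, `execute_and_evaluate`, and `self_verify` are hypothetical stand-ins for the design, feedback, and verification roles.

```python
# Toy sketch of the MAS-Zero meta-iteration loop (design -> feedback -> verify).
# All helper functions below are hypothetical placeholders, not the repo's API.
from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str
    solvability: float   # can each sub-task be solved by its sub-MAS?
    completeness: float  # do the sub-tasks cover the original task?

def propose_mas(task, feedback):
    # MAS-Design: decompose the task and emit a sub-MAS, framed as code generation.
    return f"mas_code_for({task}, hint={feedback})"

def execute_and_evaluate(mas_code):
    # MAS-Feedback: execute the generated MAS code and score its intermediate
    # outputs for solvability and completeness (toy constant scores here).
    return Candidate(answer=f"answer_from({mas_code})",
                     solvability=0.9, completeness=0.8)

def self_verify(candidates):
    # Self-Verification: pick the most suitable candidate accumulated across
    # all meta-iterations -- no validation set is consulted.
    return max(candidates, key=lambda c: c.solvability + c.completeness)

def meta_iterate(task, n_iterations=3):
    candidates, feedback = [], None
    for _ in range(n_iterations):
        mas_code = propose_mas(task, feedback)
        cand = execute_and_evaluate(mas_code)
        candidates.append(cand)
        feedback = (cand.solvability, cand.completeness)  # fed into the next design
    return self_verify(candidates)
```

In the actual system each step involves LLM calls and real MAS execution; the sketch only shows how the feedback signal loops back into the next design round.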
MAS-Zero sets a new frontier in the performance-cost trade-off across diverse domains and LLMs.
Our approach achieves strong performance across mathematical reasoning, graduate-level QA, and code benchmarks, using GPT-4o, LLaMA3.3-70B, and Qwen2.5-32B, without relying on any external supervision.
```bash
conda create -n mas_zero python=3.12 && conda activate mas_zero
pip install anthropic
pip install openai
pip install backoff
pip install together
cd ./
pip install -r requirements.txt
pip install datasets
pip install jinja2
pip install -e human-eval
```
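After installing, a quick way to confirm the key dependencies resolved is to probe them with `importlib`. This is an optional convenience check, not part of the repo; the package list simply mirrors the pip installs above.

```python
# Sanity-check that the installed dependencies are importable.
import importlib.util

def missing_packages(names):
    """Return the subset of module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

deps = ["anthropic", "openai", "backoff", "together", "datasets", "jinja2"]
print(missing_packages(deps))  # an empty list means everything is installed
```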
⚠️ WARNING ⚠️: The implementation in this repository is very raw and intended for research purposes only. It is not secure for production environments. We plan to update our code to more secure implementations in the future. Your use of our code is at your own discretion and risk.
You can change AIME (`aime24`) to GPQA (`gpqa_diamond`) or SWE-Bench (`swe_bench`). For SWE-Bench, you need to follow the SWE-Bench instructions to install the Docker environment first. You can also modify `meta_model` and `node_model` to other LLMs. Please refer to the `sampler/` folder (we support GPT, Claude, VLLM, and TogetherAI).
```bash
export OPENAI_API_KEY={YourKey}
export TOGETHER_API_KEY={YourKey}
python main_question.py --dataset workflow_search/aime24 --option plan --meta_model gpt-4o_chatgpt --node_model gpt-4o_chatgpt --verifier_model gpt-4o_chatgpt --blocks COT COT_SC Reflexion LLM_debate --use_oracle_verifier --defer_verifier --n_generation 5
```
Similarly, you can change AIME (`aime24`) to GPQA (`gpqa_diamond`) or SWE-Bench (`swe_bench`). You can also modify `model` to other LLMs. Please refer to the `sampler/` folder (we support GPT, Claude, VLLM, and TogetherAI).
```bash
export OPENAI_API_KEY={YourKey}
export TOGETHER_API_KEY={YourKey}
python main_judge.py --dataset aime24 --judge_method self --baseline workflow_search --model gpt-4o_chatgpt --min_sample 0 --max_sample 30 --max_response_per_sample 5
```
If you find MAS-Zero helpful, please cite us.

```bibtex
@misc{ke2025maszero,
      title={MAS-Zero: Designing Multi-Agent Systems with Zero Supervision},
      author={Zixuan Ke and Austin Xu and Yifei Ming and Xuan-Phi Nguyen and Caiming Xiong and Shafiq Joty},
      year={2025},
      eprint={2505.14996},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14996},
}
```
This project received help from many researchers at Salesforce AI Research. The code is adapted from ADAS. During development, we also referred to simple-evals, MaAS, and AFlow.
Many thanks to the authors of these projects for their excellent contributions!
Feel free to contact Zixuan Ke via email: zixuan.ke@salesforce.com