We introduce an adaptive multi-agent framework designed to enhance collaborative reasoning through both model-level training and system-level coordination.
📄 Paper
🌐 Project Page
🧠 Model: M1-32B
📂 Dataset: M500
- Overview
- Configure Environment
- Install Requirements
- Configure Model and Dataset
- Train and Inference
- Contributors
- Citation
Multi-agent systems (MAS) built on large language models (LLMs) offer a promising path toward solving complex, real-world tasks that single-agent systems often struggle to manage. While recent advancements in test-time scaling (TTS) have significantly improved single-agent performance on challenging reasoning tasks, how to effectively scale collaboration and reasoning in MAS remains an open question. In this work, we introduce an adaptive multi-agent framework designed to enhance collaborative reasoning through both model-level training and system-level coordination. We construct M500, a high-quality dataset containing 500 multi-agent collaborative reasoning traces, and fine-tune Qwen2.5-32B-Instruct on this dataset to produce M1-32B, a model optimized for multi-agent collaboration. To further enable adaptive reasoning, we propose a novel CEO agent that dynamically manages the discussion process, guiding agent collaboration and adjusting reasoning depth for more effective problem-solving. Evaluated in an open-source MAS across a range of tasks, including general understanding, mathematical reasoning, and coding, our system significantly outperforms strong baselines. For instance, M1-32B achieves a 12% improvement on GPQA-Diamond, 41% on AIME2024, and 10% on MBPP-Sanitized, matching the performance of state-of-the-art models like DeepSeek-R1 on some tasks. These results highlight the importance of both learned collaboration and adaptive coordination in scaling multi-agent reasoning.
Authors: Can Jin, Hongwu Peng, Qixin Zhang, Yujin Tang, Dimitris N. Metaxas, Tong Che
Note that the results of s1.1-32B are obtained without using budget forcing.
Adding environment variables to ~/.bashrc:
export MY_HOME=
export MY_PROJECT=MAS-TTS
export MY_OUTPUT=
export OPENAI_API_KEY=
export TOGETHER_API_KEY=
export TOGETHER_BASE_URL=
export DEEPSEEK_BASE_URL=
export DEEPSEEK_API_KEY=
export WANDB_USER_NAME=
export WANDB_API_KEY=
export HUGGING_FACE_HUB_TOKEN=
export HUGGING_FACE_USER_NAME=
source ~/.bashrc
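After reloading the shell, a quick check that the variables are actually set can save debugging later (a minimal sanity check; secrets are printed as lengths only):

```bash
# Confirm the project variables are set
echo "MY_PROJECT=$MY_PROJECT"
echo "MY_OUTPUT=$MY_OUTPUT"
# Keys should be non-empty; print lengths only to avoid leaking secrets
echo "OPENAI_API_KEY length: ${#OPENAI_API_KEY}"
echo "HUGGING_FACE_HUB_TOKEN length: ${#HUGGING_FACE_HUB_TOKEN}"
```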
Installing requirements for the MAS (Agentverse):
# Create and activate conda environment
conda create -n agent python=3.11
conda activate agent
cd $MY_PROJECT
pip install -e .
python -m spacy download en_core_web_sm
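A quick import test helps confirm the environment is usable (this assumes the package installed by `pip install -e .` is importable as `agentverse`):

```bash
# Sanity check: the spaCy model and the MAS package should both load
python -c "import spacy; spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "import agentverse; print('agentverse OK')"
```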
Installing requirements for Model Training (LLaMA-Factory):
conda create -n llamafactory python=3.11
conda activate llamafactory
cd $MY_PROJECT/LLaMA-Factory
pip install -e ".[torch,metrics]"
pip install vllm deepspeed flash-attn wandb
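Likewise, a short check of the training environment (the import names below are the standard ones for the pip packages installed above):

```bash
# Confirm the core training dependencies import and a GPU is visible
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import vllm, deepspeed, flash_attn; print('vllm / deepspeed / flash-attn OK')"
```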
Download the M1-32B model and M500 dataset:
- M1-32B: https://huggingface.co/Can111/m1-32b
- M500: https://huggingface.co/datasets/Can111/m500
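One way to fetch both is the Hugging Face CLI (a sketch: it assumes `huggingface_hub` is installed and `HUGGING_FACE_HUB_TOKEN` is set as above; the target directories are just examples):

```bash
# Download the fine-tuned model and the training dataset from the Hugging Face Hub
huggingface-cli download Can111/m1-32b --local-dir $MY_HOME/models/m1-32b
huggingface-cli download Can111/m500 --repo-type dataset --local-dir $MY_HOME/data/m500
```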
If you want to configure new models and datasets, please refer to the following instructions:
Configure new models (e.g., gpt-4o-mini, o3-mini, deepseek-chat, deepseek-reasoner, Qwen):
# local models
Add the model config in `agentverse/llms/__init__.py`.
# remote models
Add the model config in `agentverse/llms/openai.py`.
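For a local model such as M1-32B, the MAS needs an OpenAI-compatible endpoint to call. One common setup (a sketch, not the repository's exact script; the port and parallelism settings are examples) is to serve the checkpoint with vLLM and point the model config at that endpoint:

```bash
# Serve M1-32B behind vLLM's OpenAI-compatible API server (example settings)
python -m vllm.entrypoints.openai.api_server \
    --model Can111/m1-32b \
    --served-model-name m1-32b \
    --tensor-parallel-size 4 \
    --port 8000
```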
Configure new datasets or tasks (e.g., AIME 2024, MATH-500, GPQA-Diamond):
# data
Add the data files in the `data` folder.
# tasks
Register the tasks in `dataloader/__init__.py`.
# configs
Add the task configs in `agentverse/tasks/tasksolving`.
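Putting the three steps together, adding a new benchmark could look like the following (file names and folder layout are illustrative; mirror an existing task in the repo):

```bash
# 1. Place the raw data under the `data` folder (paths are illustrative)
mkdir -p data/aime2024
cp /path/to/aime2024.jsonl data/aime2024/

# 2. Register the new task in `dataloader/__init__.py`, following the existing entries

# 3. Start the task config from an existing one under `agentverse/tasks/tasksolving`
cp -r agentverse/tasks/tasksolving/<existing_task> agentverse/tasks/tasksolving/aime2024
```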
For more detailed instructions, please refer to the Agentverse repository.
Train the model on the M500 dataset:
bash run/train.sh
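Training presumably runs in the `llamafactory` environment created above, so activate it first (a minimal sketch of the full sequence):

```bash
# Activate the training environment and launch fine-tuning on M500
conda activate llamafactory
cd $MY_PROJECT
bash run/train.sh
```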
Run inference with the model in the MAS on a task:
bash run/inference.sh
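Inference goes through the MAS, so it presumably uses the `agent` environment created for Agentverse (again a minimal sketch):

```bash
# Activate the MAS environment and run multi-agent inference
conda activate agent
cd $MY_PROJECT
bash run/inference.sh
```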
The MAS framework is built on the Agentverse repository.
Consider citing our paper if you find our work useful.
@article{jin2025two,
title={Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning},
author={Jin, Can and Peng, Hongwu and Zhang, Qixin and Tang, Yujin and Metaxas, Dimitris N and Che, Tong},
journal={arXiv preprint arXiv:2504.09772},
year={2025}
}