This is the source code of our ICML 2025 paper, "Accelerating Large Language Model Reasoning via Speculative Search".

Accelerating Large Language Model Reasoning via Speculative Search

Paper · Code

[ English ][ 中文 ]

Table of Contents 📖
  1. Getting Started
  2. Usage
  3. Acknowledgements
  4. Citation

Getting Started

Installation

# Create and activate the conda environment
conda create -n open_reasoner python=3.10
conda activate open_reasoner

# Install project dependencies
pip install -r requirements.txt
pip install "fschat[model_worker,webui]"
pip install -U pydantic

# Install the local latex2sympy package used by the MATH environment
cd envs/MATH/latex2sympy
pip install -e .
cd -
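
As a quick, optional sanity check (a sketch; the fschat package installs under the import name fastchat), you can confirm the key dependencies import cleanly:

python -c "import fastchat, pydantic; print('fastchat', fastchat.__version__)"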

Download Base Models

Before running the project, please ensure that all required base models are downloaded. The models used in this project include:

  • Qwen2.5-72B-Instruct-GPTQ-Int4, Qwen2.5-7B-Instruct-GPTQ-Int4
  • Meta-Llama-3-70B-Instruct-GPTQ-Int4, Llama-3-8B-Instruct-GPTQ-4-Bit
  • peiyi9979/math-shepherd-mistral-7b-prm
  • openreasoner/Math-psa

To download these models, follow the Hugging Face model downloading tutorial, which provides step-by-step guidance on fetching models from the Hugging Face Hub.

Before proceeding, make sure each model is saved in the directory expected by the project setup.
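
For example, one common route is the huggingface-cli tool from the huggingface_hub package (a sketch; the --local-dir paths below are placeholders and should match the $MODEL_BASE directory configured in the Quickstart):

pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 --local-dir /path/to/models/Qwen2.5-7B-Instruct-GPTQ-Int4
huggingface-cli download peiyi9979/math-shepherd-mistral-7b-prm --local-dir /path/to/models/math-shepherd-mistral-7b-prm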

Quickstart

Before running inference, modify the following variables in the scripts under reason/llm_service/ to set the appropriate base models for your usage (an example configuration follows the list):

  • $MODEL_BASE: Set this to the directory where your models are stored.
  • $POLICY_MODEL_NAME: Set this to the name of the policy (target) model you wish to use.
  • $VALUE_MODEL_NAME: Set this to the name of the value model you wish to use.
  • $APPROXIMATION_MODEL_NAME: Set this to the name of the approximation (draft) model you wish to use.
  • $NUM_LM_WORKER: Set this to the number of language model (LM) workers to start.
  • $NUM_RM_WORKER: Set this to the number of reward model (RM) workers to start.
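
For instance, a Qwen-based configuration might look like the following (all paths and model names here are illustrative placeholders, not the scripts' actual defaults):

MODEL_BASE=/path/to/models
POLICY_MODEL_NAME=Qwen2.5-72B-Instruct-GPTQ-Int4
APPROXIMATION_MODEL_NAME=Qwen2.5-7B-Instruct-GPTQ-Int4
VALUE_MODEL_NAME=math-shepherd-mistral-7b-prm
NUM_LM_WORKER=1
NUM_RM_WORKER=1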

With these variables set, you can start the services and run inference using the different techniques below.

Start LM & RM Services

For example, to start the LM, sLM (the small approximation model), and RM services for the Math Shepherd model, run the following command:

sh reason/llm_service/create_service_math_shepherd.sh

To kill the server processes, we recommend using the following command:

tmux kill-session -t {Your Session Name} # default is `FastChat`
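
If you are unsure of the session name, list the active tmux sessions first:

tmux ls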

Usage

The main files are organized as follows:

  • ./reason/evaluation/evaluate.py: Main entry point for the overall inference pipeline.
  • ./reason/evaluation/methods.py: Implementation of different search methods.
  • ./envs/base_env.py: Implementation details of SpecSearch.

All hyperparameters used for inference are set in the bash file ./scripts/eval/xxx.sh (an illustrative snippet follows the list below), e.g.,

  • c : Initial threshold.
  • baseline_mode : 1-5 for baselines, 6 for initial method using a fixed threshold, 7 for our SpecSearch, 8-10 for ablation study.
  • model_mode : 1 for Qwen models, 2 for Llama models.
  • N : Number of thoughts per step.
  • M : Beam size.
  • is_tensorboard : 1 for tensorboard, 0 for no tensorboard.
  • full_reward : 1 for full reward, 0 for partial reward.
  • x : Value of theta.
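
As an illustration, the relevant portion of such a script might look like the following (variable names follow the list above; the values are arbitrary placeholders, not recommended settings):

c=0.5             # initial threshold
baseline_mode=7   # 7 = our SpecSearch
model_mode=1      # 1 = Qwen models
N=8               # number of thoughts per step
M=4               # beam size
is_tensorboard=0  # disable tensorboard logging
full_reward=1     # use full reward
x=0.9             # value of theta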

Run Inference

⚠️ Make sure the script inputs (--LM, --sLM, --RM) match the variables ($POLICY_MODEL_NAME, $APPROXIMATION_MODEL_NAME, $VALUE_MODEL_NAME) configured for the running workers!

export PYTHONPATH=$(pwd)

sh scripts/eval/beam_search_qwen.sh

sh scripts/eval/beam_search_llama.sh

sh scripts/eval/vanila_mcts_qwen.sh

sh scripts/eval/vanila_mcts_llama.sh

Acknowledgements

This project builds upon the contributions of several open-source efforts:

  • OpenR: for providing core functionalities for evaluation and baseline methods.
  • LLaMA and Qwen: for the pretrained language models used in the inference process.

We thank the developers and maintainers of these projects for their excellent work.

Citation

If you find this project helpful for your research, please consider citing our work:

@misc{wang2025acceleratinglargelanguagemodel,
      title={Accelerating Large Language Model Reasoning via Speculative Search}, 
      author={Zhihai Wang and Jie Wang and Jilai Pan and Xilin Xia and Huiling Zhen and Mingxuan Yuan and Jianye Hao and Feng Wu},
      year={2025},
      eprint={2505.02865},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.02865}, 
}
