chang-github-00/LLM-Predictive-Decoding

Non-Myopic Generation of Language Models for Reasoning and Planning (ICLR 2025)

Figure: Non-Myopic Generation Overview

Callback on Motivation

As strong reasoning and planning abilities emerge in Large Language Models (LLMs), they are becoming the de facto solvers for complex problems. LLMs are distinctive in their step-by-step problem-solving ability: they perform sequential planning through autoregressive generation. However, errors tend to arise during planning because autoregressive generation is often imperfect, and it is fundamentally challenging to generate globally aware solutions at early steps.

As shown in the figure below, there are two ways to address this. The first is to fix errors after they occur, i.e., self-reflection. Most past work has focused on demonstrating and improving the effectiveness of this paradigm. However, the self-reflection abilities of many LLMs are limited, and training generalizable reflection models is hard. We instead emphasize the second paradigm: preventing mistakes from happening in the first place. In our series of work (Predictive-Decoding, Phi-Decoding, and Genius), we show how this can be implemented with a very simple decoding method that works for math, coding, and agent tasks.

Figure 1 Figure 2

Predictive Decoding simulates an energy-based model: it still performs autoregressive generation, but aims to optimize the global plan. At each step t, the LLM generates T steps ahead and samples the action for the current step t based on the LLM's self-evaluation of the rollout at step t+T. We find that, without any additional reward model, directly optimizing the LLM's future log-probability yields impressive improvements.
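The step described above can be illustrated with a minimal sketch. It is not the repository's implementation: `rollout_fn` is a hypothetical stand-in for an LLM call that continues the current prefix for `horizon` steps and returns those steps with their total log-probability, and the softmax weighting, `num_rollouts`, and `temperature` are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: (prefix, horizon) -> (future steps, total log-prob).
RolloutFn = Callable[[List[str], int], Tuple[List[str], float]]


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over log-probability scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_step(
    prefix: List[str],
    rollout_fn: RolloutFn,
    num_rollouts: int = 4,
    horizon: int = 3,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the action for the current step by looking `horizon` steps ahead."""
    rng = rng or random.Random()
    # 1. Sample several foresight rollouts from the current prefix.
    rollouts = [rollout_fn(prefix, horizon) for _ in range(num_rollouts)]
    # 2. Weight each rollout by the model's own log-probability of it.
    weights = softmax([logprob for _, logprob in rollouts], temperature)
    # 3. Sample one rollout and keep only its first step.
    chosen = rng.choices(range(num_rollouts), weights=weights, k=1)[0]
    steps, _ = rollouts[chosen]
    return steps[0]


def predictive_decode(
    rollout_fn: RolloutFn,
    max_steps: int,
    stop_token: str = "<eos>",
    **step_kwargs,
) -> List[str]:
    """Repeat the foresight step, committing one action at a time."""
    prefix: List[str] = []
    for _ in range(max_steps):
        action = predictive_step(prefix, rollout_fn, **step_kwargs)
        prefix.append(action)
        if action == stop_token:
            break
    return prefix
```

Only the first step of the sampled rollout is committed, and the remaining foresight steps are discarded and regenerated at the next step. In this sketch, lowering `temperature` concentrates the choice on the rollout with the highest future log-probability.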

See our latest follow-up work:

Also see the phi-decoding repo for an implementation of Predictive Decoding on AIME and with the latest models (Qwen, DeepSeek).

Quick Start

Prepare environment

The agent environment is mainly built upon AgentBoard, which supports agent experiments on AlfWorld and PDDL. See here for a detailed setup guide. For math tasks and the vLLM dependency, run pip install -r requirements.txt. There may be version conflicts between the agent environment and the vLLM dependencies, so we recommend installing the two environments separately and launching vLLM as a service.

Dataset

Download the agent tasks dataset:

cd dataset
wget https://huggingface.co/datasets/hkust-nlp/agentboard/resolve/main/data.tar.gz
tar -zxvf data.tar.gz

Download GSM8K from Hugging Face:

cd dataset
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir gsm8k

Run Predictive Decoding

Requires one 80 GB GPU to run an 8B LLM.

CUDA_VISIBLE_DEVICES=0 python agentboard/eval_reasoning_parallel.py \
  --cfg-path eval_configs/gsm8k/mpc_sample_gsm8k_llama3.yaml \
  --tasks gsm8k --algorithm MPC_Sample \
  --model llama-3 \
  --data_path $PROJECT_DIR/data/gsm8k \
  --batch_size 500

More scripts for running Predictive Decoding on other tasks, as well as scripts for the baselines, are available here.

Run Reward-guided Predictive Decoding

Requires two 80 GB GPUs to run an 8B LLM and a 7B reward model.

First, launch the Math-Shepherd reward model:

vllm serve peiyi9979/math-shepherd-mistral-7b-prm

Set OPENAI_API_BASE="http://localhost:8000/v1" in the .env file.
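Before starting a long run, it can help to confirm that the reward model server is reachable. The sketch below is not part of this repository: it assumes the vLLM server exposes the OpenAI-compatible `/v1/models` endpoint at the base URL above, and the helper names `models_url` and `list_served_models` are hypothetical.

```python
import json
import urllib.request
from typing import Callable, List


def models_url(api_base: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return api_base.rstrip("/") + "/models"


def _http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def list_served_models(
    api_base: str, fetch: Callable[[str], str] = _http_get
) -> List[str]:
    """Return the ids of the models the server reports as available."""
    payload = json.loads(fetch(models_url(api_base)))
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling `list_served_models("http://localhost:8000/v1")` while the server is running should return a list that includes the reward model's name. A connection error suggests the server is not up yet or is listening on a different port.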

Run reward-guided predictive decoding:

python agentboard/eval_reasoning_reward_parallel.py \
  --cfg-path eval_configs/gsm8k/mpc_reward_gsm8k_llama3.yaml \
  --tasks gsm8k --algorithm MPC_Sample_Reward \
  --model llama-3 --data_path $PROJECT_DIR/data/gsm8k \
  --batch_size 2000 \
  --reward_model math-shepherd

Change Parameters for Analyzing Test-time Scaling Law

See commands in scripts/run_scaling_law.sh.

Citation

If you find this repository useful, please consider giving it a star and citing our paper:

@article{ma2024non,
  title={Non-myopic Generation of Language Models for Reasoning and Planning},
  author={Ma, Chang and Zhao, Haiteng and Zhang, Junlei and He, Junxian and Kong, Lingpeng},
  journal={arXiv preprint arXiv:2410.17195},
  year={2024}
}
