ICML 2025 Poster · Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
CoSD is a plug-in algorithm for speculative decoding that fuses knowledge from a draft model and an assistant model during decoding. It is model-agnostic and can be integrated into existing speculative decoding implementations with minimal changes.
This repository builds on MCSD and evaluates with tinyBenchmarks.
Base code: https://github.com/NJUNLP/MCSD
Evaluation: https://github.com/felipemaiapolo/tinyBenchmarks
- Drop-in plugin for any speculative decoding pipeline
- Knowledge fusion that uses draft and target signals
- Reproducible evaluation with tinyBenchmarks
# Python ≥ 3.9 with CUDA is recommended
git clone <this-repo-url>
cd <this-repo-dir>
# (Optional) create environment
# conda create -n cosd python=3.10 -y
# conda activate cosd
# Install common dependencies
pip install torch transformers accelerate datasets ...
We follow tinyBenchmarks. You can:
- Clone tinyBenchmarks and point --datapath to its prepared data files, or
- Provide your own JSON/JSONL in the format expected by evaluation.py.
git clone https://github.com/felipemaiapolo/tinyBenchmarks
# Prepare paths and pass to --datapath (see below)
Minimal command to run CoSD on tinyBenchmarks:
# --datapath may point to an empty file, since the evaluation is done by tinyBenchmarks
python evaluation.py \
--draft-model PATH_TO_DRAFT_MODEL \
--target-model PATH_TO_TARGET_MODEL \
--fp16 \
--k-config 4,2,2 \
--datapath PATH_TO_DATA \
--sampling-type sampling
Example with concrete Hugging Face model ids:
python evaluation.py \
--draft-model mistralai/Mistral-7B-v0.1 \
--target-model meta-math/MetaMath-Mistral-7B \
--fp16 \
--k-config 4,2,2 \
--datapath ./data/empty.jsonl \
--sampling-type sampling
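The example passes ./data/empty.jsonl as --datapath. If evaluation.py opens that path, the file has to exist first. A minimal sketch for creating the placeholder, assuming an empty file is acceptable (as the note on --datapath suggests); the helper name make_empty_datafile is hypothetical and not part of this repository:

```python
from pathlib import Path


def make_empty_datafile(path: str = "./data/empty.jsonl") -> Path:
    """Create an empty placeholder file, plus parent directories, if missing.

    Hypothetical helper: it only guarantees that the path exists.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch(exist_ok=True)  # an existing file keeps its content
    return target
```

Calling make_empty_datafile() once before the evaluation command is enough; running it again is harmless.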
- --draft-model (str): Draft model path or Hugging Face id
- --target-model (str): Assistant model path or Hugging Face id
- --fp16 (flag): Enable FP16 inference
- --k-config (str): Comma-separated speculation schedule, e.g., 4,2,2 (an MCSD-specific argument)
- --datapath (str): Evaluation data path (can be empty in our code and will not be used)
- --sampling-type (str): Decoding mode, e.g., sampling or greedy
Tip: run python evaluation.py -h for full options.
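To make the argument types concrete, here is a sketch of how a command line with these flags could be declared with argparse. It is an illustration only: the real evaluation.py may declare its options differently, and the function names parse_k_config and build_parser are hypothetical. It shows in particular how a value such as 4,2,2 can be turned into a tuple of integers.

```python
import argparse
from typing import Tuple


def parse_k_config(value: str) -> Tuple[int, ...]:
    """Turn a string such as "4,2,2" into (4, 2, 2)."""
    try:
        ks = tuple(int(part) for part in value.split(","))
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"invalid k-config: {value!r}") from exc
    if any(k <= 0 for k in ks):
        raise argparse.ArgumentTypeError("k-config entries must be positive")
    return ks


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags, not the repository's parser.
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("--draft-model", type=str, required=True)
    parser.add_argument("--target-model", type=str, required=True)
    parser.add_argument("--fp16", action="store_true")
    parser.add_argument("--k-config", type=parse_k_config, required=True)
    parser.add_argument("--datapath", type=str, required=True)
    # The documented modes include "sampling" and "greedy"; others may exist.
    parser.add_argument("--sampling-type", type=str, required=True)
    return parser
```

argparse maps dashes to underscores, so the parsed values are available as args.draft_model, args.k_config, and so on.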
CoSD is a lightweight plugin:
- Initialize draft and target models as usual
- Train a decision tree with a few data samples if using CoSD-Tree
- Replace the speculative accept/reject step with CoSD’s fusion step
- Call generate(...) as usual; log both quality and speed statistics
See cosd/ or the CoSD class in this repository for a minimal integration example.
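The integration steps can be sketched as a function that stands in for the usual accept/reject check. This is not the repository's implementation: the confidence rule, the thresholds alpha and beta, and the names TokenProposal, rule_based_decision and fuse_step are all illustrative assumptions. The sketch walks over the drafted tokens, compares each with the token the assistant model prefers at the same position, and switches to the assistant's token when the rule fires. Tokens after the first replacement are dropped so that drafting can resume from the corrected prefix.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TokenProposal:
    draft_token: int
    draft_prob: float      # probability the draft model gave its own token
    assistant_token: int
    assistant_prob: float  # probability the assistant model gave its own token


def rule_based_decision(p: TokenProposal, alpha: float = 0.5, beta: float = 0.5) -> bool:
    """Return True when the assistant token should replace the draft token.

    Hypothetical rule: the models disagree, the draft model is unsure
    (below alpha), and the assistant is confident relative to the draft.
    """
    return (
        p.draft_token != p.assistant_token
        and p.draft_prob < alpha
        and p.assistant_prob > beta * p.draft_prob
    )


def fuse_step(
    proposals: List[TokenProposal],
    decide: Callable[[TokenProposal], bool] = rule_based_decision,
) -> Tuple[List[int], bool]:
    """Return the fused token prefix and whether a replacement happened."""
    fused: List[int] = []
    for p in proposals:
        if decide(p):
            fused.append(p.assistant_token)
            return fused, True  # stop here; later draft tokens are stale
        fused.append(p.draft_token)
    return fused, False
```

Because decide is just a callable, a trained classifier (for example the decision tree mentioned for CoSD-Tree) could be passed in place of the hand-written rule without changing fuse_step.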
- MCSD: https://github.com/NJUNLP/MCSD
- tinyBenchmarks: https://github.com/felipemaiapolo/tinyBenchmarks