GitHub - AMD-AGI/AMD-Spark

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Huanxuan Liao, Yixing Xu, Shizhu He, Guanchen Li, Xuanwu Yin, Dong Li, Emad Barsoum, Jun Zhao, Kang Liu | Paper

Installation

poetry install --with dev

Usage

press_names=("snapkv" "pyramidkv" "streaming_llm" "tova" "observed_attention" "expected_attention" "pyramid_spark" "snap_spark" "pyramid_think" "snap_think")


model=${1:-"llama3.1-8b-inst"} # model name
compress_questions=${2:-"0"} # whether to compress questions (default 0)
key_channel_compression_ratio=${3:-"0.5"} # key channel compression ratio
press=${4:-"snapkv"} # compression method (one of press_names)
gpus=${5:-"0"} # gpus
temperature=${6:-"0.0"}
threshold_ratio=${7:-"0.0"}
pooling_ratio=${8:-"0.0"}

# threshold_ratio choices: 0.0, 0.99, 0.992, 0.996, 0.998, 0.997, ... (controls dynamic grouping and top-p)
# pooling_ratio choices: 0.0, 0.65, 0.655, 0.75, ... (controls the recovery method: 6* is exp, 7* is norm)

bash run2.sh # spark recover with avg
bash run4.sh # all baselines with think and spark
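
The defaults above rely on bash's `${N:-default}` expansion: any positional argument that is unset or empty falls back to its default. The following is a minimal sketch of that pattern, with an added check that the chosen press appears in `press_names`. The helpers `is_known_press` and `parse_args` are hypothetical names introduced for illustration and are not part of the repository's scripts.

```bash
#!/usr/bin/env bash
# Illustrative only: is_known_press and parse_args are hypothetical helpers,
# not functions taken from the repository.

press_names=("snapkv" "pyramidkv" "streaming_llm" "tova" "observed_attention" "expected_attention" "pyramid_spark" "snap_spark" "pyramid_think" "snap_think")

# Return 0 if $1 is one of the entries in press_names, 1 otherwise.
is_known_press() {
  local candidate=$1 name
  for name in "${press_names[@]}"; do
    [[ "$name" == "$candidate" ]] && return 0
  done
  return 1
}

# Apply the same positional defaults as the usage snippet and print the result.
parse_args() {
  local model=${1:-"llama3.1-8b-inst"}
  local compress_questions=${2:-"0"}
  local key_channel_compression_ratio=${3:-"0.5"}
  local press=${4:-"snapkv"}
  if ! is_known_press "$press"; then
    echo "unknown press: $press" >&2
    return 1
  fi
  echo "model=$model compress_questions=$compress_questions ratio=$key_channel_compression_ratio press=$press"
}
```

For example, `parse_args "" "" 0.8 snap_spark` keeps the default model and question setting while overriding the ratio and the press, because an empty string is treated the same as a missing argument by `:-`.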

For the method-specific parameters, see the method's implementation in spark_press.
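
The comment on `pooling_ratio` says that values of the form 6* select the exp recovery variant and 7* the norm variant. One possible reading, assuming the digit refers to the first decimal place (so 0.65 and 0.655 are exp and 0.75 is norm), is sketched below. `recover_method` is a hypothetical helper, not a function from the repository, and the README does not state what 0.0 selects, so the sketch reports it as unspecified.

```bash
#!/usr/bin/env bash
# Illustrative only: recover_method is a hypothetical helper. The mapping is an
# assumption drawn from the README comment "6* is exp and 7* is norm".

# Print the recovery variant implied by a pooling_ratio value.
recover_method() {
  case "$1" in
    0.6*) echo "exp" ;;          # e.g. 0.65, 0.655
    0.7*) echo "norm" ;;         # e.g. 0.75
    *)    echo "unspecified" ;;  # 0.0 and anything else: not documented
  esac
}
```

The authoritative behavior is whatever the implementation in spark_press does with this value; the sketch only makes the naming convention in the comment explicit.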

For more methods (press), see PRESS_DICT in eval.py.

Citation

If you find SparK or this project helpful, please consider citing our paper 😊.

@article{liao2025spark,
  title={SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning},
  author={Liao, Huanxuan and Xu, Yixing and He, Shizhu and Li, Guanchen and Yin, Xuanwu and Li, Dong and Barsoum, Emad and Zhao, Jun and Liu, Kang},
  journal={arXiv preprint arXiv:2508.15212},
  year={2025}
}

Folders and files

infinite_bench
kvpress
longbench
loogle
ruler
zero_scrolls
LICENSE
README.md
__init__.py
eval.py
evaluate.sh
evaluate_spark.sh
metric.py
metric.sh
metric_longbench.py
metric_ruler.py
patch.py
run2.sh
run4.sh
utils.py
