Skip to content

AMD-AGI/AMD-Spark

Repository files navigation

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Huanxuan Liao, Yixing Xu, Shizhu He, Guanchen Li, Xuanwu Yin, Dong Li, Emad Barsoum, Jun Zhao, Kang Liu | Paper

Installation

poetry install --with dev

Usage

press_names=("snapkv" "pyramidkv" "streaming_llm" "tova" "observed_attention" "expected_attention" "pyramid_spark" "snap_spark" "pyramid_think" "snap_think")


model=${1:-"llama3.1-8b-inst"} # model name
compress_questions=${2:-"0"} # compress questions default 1
key_channel_compression_ratio=${3:-"0.5"}
press=${4:-"snapkv"} # compress methods
gpus=${5:-"0"} # gpus
temperature=${6:-"0.0"} 
threshold_ratio=${7:-"0.0"}
pooling_ratio=${8:-"0.0"}

# threshold_ratio choices: 0.0 0.99 0.992 0.996 0.998 0.997...   control dynamic group and topp
# pooling_ratio choices: 0.0 0.65 0.655 0.75...   control recover method  6* is exp and 7* is norm

bash run2.sh # spark recover with avg
bash run4.sh # all baselines with think and spark

The specific parameters can be found in the method's implementation in spark_press.

For more methods (press), see PRESS_DICT in eval.py.

Citation

If you find SparK or this project is helpful, please kindly consider cite our paper 😊.

@article{liao2025spark,
  title={SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning},
  author={Liao, Huanxuan and Xu, Yixing and He, Shizhu and Li, Guanchen and Yin, Xuanwu and Li, Dong and Barsoum, Emad and Zhao, Jun and Liu, Kang},
  journal={arXiv preprint arXiv:2508.15212},
  year={2025}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published