
This repository provides the implementation of DVAttn, a custom attention mechanism designed for efficient LLM inference.
Instead of rotating entries in the KV cache, the library applies an equivalent relative rotation to the query alone each time the cache is read.
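The identity this relies on is that RoPE rotations are orthogonal, so the attention score depends only on the relative offset between query and key positions. The sketch below is a minimal plain-PyTorch illustration of that equivalence, not the library's kernel; the `rope_rotate` helper and all shapes are assumptions for the example.

```python
import torch

def rope_rotate(x: torch.Tensor, pos: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Apply a standard RoPE rotation to x (..., head_dim) at the given position(s)."""
    head_dim = x.shape[-1]
    freqs = 1.0 / theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = pos[..., None] * freqs                      # (..., head_dim / 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# One query at position m attending to one cached key at position n.
head_dim, m, n = 64, 9, 3
q = torch.randn(head_dim)
k = torch.randn(head_dim)

# Standard RoPE: rotate both the query and the cached key by their absolute positions.
score_abs = rope_rotate(q, torch.tensor(m)) @ rope_rotate(k, torch.tensor(n))

# Equivalent form: leave the cached key untouched and rotate only the query
# by the relative offset (m - n).
score_rel = rope_rotate(q, torch.tensor(m - n)) @ k

print(torch.allclose(score_abs, score_rel, atol=1e-5))  # True
```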
The implementation features custom CUDA kernels with support for paged attention, variable-length (varlen) sequences, and fused operations. The kernels are hardcoded for a GQA factor of 16, support partial RoPE application over the head dimension, and operate on fp32, fp16, and bf16 inputs.
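As a rough functional reference for what such a kernel computes, the sketch below shows GQA with a group size of 16, bf16 tensors, and a placeholder comment where partial RoPE would apply. It is a plain-PyTorch illustration under assumed shapes and names, not the library's API or kernel.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes; none of these names come from the library.
batch, seq_len = 2, 128
num_q_heads, gqa_factor = 32, 16           # 16 query heads share each KV head
num_kv_heads = num_q_heads // gqa_factor   # -> 2 KV heads
head_dim, rope_dim = 128, 64               # partial RoPE: rotate only the first rope_dim channels

dtype = torch.bfloat16
q = torch.randn(batch, num_q_heads, seq_len, head_dim, dtype=dtype)
k = torch.randn(batch, num_kv_heads, seq_len, head_dim, dtype=dtype)
v = torch.randn(batch, num_kv_heads, seq_len, head_dim, dtype=dtype)

# Partial RoPE would be applied to q[..., :rope_dim] and k[..., :rope_dim] here
# (omitted for brevity); the remaining head_dim - rope_dim channels pass through unrotated.

# GQA: broadcast each KV head across its group of 16 query heads.
k_expanded = k.repeat_interleave(gqa_factor, dim=1)
v_expanded = v.repeat_interleave(gqa_factor, dim=1)

# Plain scaled dot-product attention as an unfused, unpaged reference.
out = F.scaled_dot_product_attention(q, k_expanded, v_expanded, is_causal=True)
print(out.shape)  # torch.Size([2, 32, 128, 128])
```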