Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a Next.js app
Updated Jun 7, 2024 · TypeScript
A Triton implementation of FlashAttention-2 that adds support for custom attention masks.
Uses the WhisperS2T and CTranslate2 libraries to batch-transcribe multiple audio files.
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.
Vulkan & GLSL implementation of FlashAttention-2
FlashAttention-2 in Triton for sliding window attention (fwd + bwd pass)
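Sliding-window attention restricts each query to a fixed-size causal window of recent keys, which is what the masking in a kernel like the one above enforces. The following is a minimal NumPy sketch of that masking pattern only (not the Triton kernel, and not tiled); the function name and `window` parameter are illustrative assumptions.

```python
import numpy as np

def sliding_window_attention(Q, K, V, window=4):
    # Causal sliding window: query i attends to keys j with
    # i - window < j <= i. This is the mask a sliding-window
    # FlashAttention kernel applies tile by tile.
    N, d = Q.shape
    S = Q @ K.T / np.sqrt(d)
    idx = np.arange(N)
    mask = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    S = np.where(mask, S, -np.inf)          # disallowed positions get zero weight
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V
```

Because row 0 can only attend to itself, its output equals `V[0]` exactly, which makes the mask easy to sanity-check.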
Poplar implementation of FlashAttention for IPU
Toy Flash Attention implementation in torch
Coding Flash Attention from scratch using Triton.
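The core idea these from-scratch implementations exercise is the online-softmax recurrence: process K/V in tiles, keep a running row maximum and softmax denominator, and rescale the partial output as each tile arrives, so the full N×N score matrix is never materialized. A toy NumPy sketch of that recurrence (function names and the `block` size are illustrative, and this omits the GPU tiling that makes real FlashAttention fast):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: full softmax(Q K^T / sqrt(d)) V, materializing all scores.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, block=4):
    # Tiled attention with online softmax: only one block of scores
    # exists at a time, matching the FlashAttention recurrence.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row-wise max of scores
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                    # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)               # rescale previous partial sums
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Up to floating-point error, the tiled version matches the naive one for any block size, which is the invariant the backward pass also relies on.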
Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers without relying on third-party providers and APIs. Host it yourself or try it out.