Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a Next.js app
Updated Jun 7, 2024 · TypeScript
A Triton implementation of FlashAttention-2 that adds support for custom attention masks.
Uses the WhisperS2T and CTranslate2 libraries to batch-transcribe multiple audio files.
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.
Vulkan & GLSL implementation of FlashAttention-2
FlashAttention-2 in Triton for sliding window attention (fwd + bwd pass)
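Sliding-window attention restricts each query to a fixed-size causal window of recent keys, which is what the masking in a kernel like the one above enforces. The following is a minimal NumPy sketch of that masking pattern only (not the Triton kernel, and not tiled); the function name and `window` parameter are illustrative assumptions.

```python
import numpy as np

def sliding_window_attention(Q, K, V, window=4):
    # Causal sliding window: query i attends to keys j with
    # i - window < j <= i. This is the mask a sliding-window
    # FlashAttention kernel applies tile by tile.
    N, d = Q.shape
    S = Q @ K.T / np.sqrt(d)
    idx = np.arange(N)
    mask = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    S = np.where(mask, S, -np.inf)          # disallowed positions get zero weight
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V
```

Because row 0 can only attend to itself, its output equals `V[0]` exactly, which makes the mask easy to sanity-check.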
Poplar implementation of FlashAttention for IPU
Toy Flash Attention implementation in torch
Coding Flash Attention from scratch using Triton.
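The core idea these from-scratch implementations exercise is the online-softmax recurrence: process K/V in tiles, keep a running row maximum and softmax denominator, and rescale the partial output as each tile arrives, so the full N×N score matrix is never materialized. A toy NumPy sketch of that recurrence (function names and the `block` size are illustrative, and this omits the GPU tiling that makes real FlashAttention fast):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: full softmax(Q K^T / sqrt(d)) V, materializing all scores.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, block=4):
    # Tiled attention with online softmax: only one block of scores
    # exists at a time, matching the FlashAttention recurrence.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row-wise max of scores
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                    # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)               # rescale previous partial sums
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Up to floating-point error, the tiled version matches the naive one for any block size, which is the invariant the backward pass also relies on.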
Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers without relying on third-party providers and APIs. Host it yourself or try it out.