Pinned Loading
-
TransformerEngine
TransformerEngine PublicForked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Python
-
-
vllm
vllm PublicForked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
-
causal-conv1d
causal-conv1d PublicForked from Dao-AILab/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
Cuda
Something went wrong, please refresh the page to try again.
If the problem persists, check the GitHub status page or contact support.
If the problem persists, check the GitHub status page or contact support.