Paul Pak paulpak58

__syncthreads() @Liquid4All

22 followers · 4 following

Pinned

TransformerEngine Public

Forked from NVIDIA/TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python

cutlass Public

Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++

vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

causal-conv1d Public

Forked from Dao-AILab/causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda