Discord Server

Join the Discord server if you are interested in LLM architecture or in distributed training and inference research.

Efficient GPU kernels written in both CUDA and Triton

CuteInductor makes it easier to inject the kernels in this repository into any PyTorch module.

.github/workflows
assets
copyright @ 017498e
cute_kernels
cutlass @ c2ad7c5
examples
tests
tools
.clang-format
.gitignore
.gitmodules
.pre-commit-config.yaml
LICENSE
Makefile
README.md
a.py
copyright-exclude.txt
requirements-dev.txt
requirements.txt
setup.cfg
setup.py