ROCm/vllm

About

A high-throughput and memory-efficient inference and serving engine for LLMs

Languages

  • Python 86.2%
  • Cuda 7.9%
  • C++ 4.4%
  • Shell 0.7%
  • C 0.4%
  • CMake 0.3%
  • Other 0.1%