@@ -58,8 +58,8 @@ vLLM is fast with:
- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- - Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [AutoRound](https://arxiv.org/abs/2309.05516),INT4, INT8, and FP8.
- - Optimized CUDA kernels, including integration with FlashAttention and FlashInfer.
+ - Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [AutoRound](https://arxiv.org/abs/2309.05516), INT4, INT8, and FP8
+ - Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
- Speculative decoding
- Chunked prefill
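To make the quantization item above concrete, here is a minimal offline sketch of loading a pre-quantized checkpoint with vLLM's `LLM` entry point and `quantization` argument; the model name is only an illustrative placeholder, and any AWQ-quantized checkpoint from the Hub could be substituted:

```python
# Minimal sketch: serving an AWQ-quantized checkpoint with vLLM's offline API.
# The model name below is a placeholder; substitute any AWQ checkpoint you have access to.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Explain PagedAttention in one sentence."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```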
@@ -72,14 +72,14 @@ vLLM is flexible and easy to use with:
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
- - Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron.
+ - Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron
- Prefix caching support
- Multi-LoRA support
vLLM seamlessly supports most popular open-source models on HuggingFace, including:
- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Expert LLMs (e.g., Mixtral, Deepseek-V2 and V3)
- - Embedding Models (e.g. E5-Mistral)
+ - Embedding Models (e.g., E5-Mistral)
- Multi-modal LLMs (e.g., LLaVA)
Find the full list of supported models [here](https://docs.vllm.ai/en/latest/models/supported_models.html).
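As a quick illustration of "seamlessly supports", a supported HuggingFace model can typically be run by passing its repo name straight to `LLM`; this sketch uses `facebook/opt-125m` only because it is small, and any architecture from the list linked above should work the same way:

```python
# Sketch: running a supported HuggingFace model by passing its repo name.
# "facebook/opt-125m" is used here only because it is small.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.0, max_tokens=32)

for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```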
@@ -162,4 +162,4 @@ If you use vLLM for your research, please cite our [paper](https://arxiv.org/abs
## Media Kit
- - If you wish to use vLLM's logo, please refer to [our media kit repo](https://github.com/vllm-project/media-kit).
+ - If you wish to use vLLM's logo, please refer to [our media kit repo](https://github.com/vllm-project/media-kit)