Commit 90a2769

[Docs] Add Ray Serve LLM section to openai compatible server guide (#20595)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
1 parent e60d422

File tree

1 file changed: +14 -0 lines changed

docs/serving/openai_compatible_server.md

Lines changed: 14 additions & 0 deletions
@@ -775,3 +775,17 @@ The following extra parameters are supported:
```python
--8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
```

## Ray Serve LLM

Ray Serve LLM enables scalable, production-grade serving of the vLLM engine. It integrates tightly with vLLM and extends it with features such as autoscaling, load balancing, and back-pressure.

Key capabilities:

- Exposes an OpenAI-compatible HTTP API as well as a Pythonic API.
- Scales from a single GPU to a multi-node cluster without code changes.
- Provides observability and autoscaling policies through Ray dashboards and metrics.
The following example shows how to deploy a large model such as DeepSeek R1 with Ray Serve LLM: <gh-file:examples/online_serving/ray_serve_deepseek.py>.
Learn more in the official [Ray Serve LLM documentation](https://docs.ray.io/en/latest/serve/llm/serving-llms.html).
