Commit 26ca008

crypdick authored and huydhn committed
[Docs] Improve documentation for Deepseek R1 on Ray Serve LLM (#20601)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
1 parent 369ef5e commit 26ca008

File tree

1 file changed: 19 additions & 12 deletions
@@ -1,13 +1,21 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
 """
-Example to deploy DeepSeek R1 or V3 with Ray Serve LLM.
-See more details at:
-https://docs.ray.io/en/latest/serve/tutorials/serve-deepseek.html
-And see Ray Serve LLM documentation at:
-https://docs.ray.io/en/latest/serve/llm/serving-llms.html
+Deploy DeepSeek R1 or V3 with Ray Serve LLM.
+
+Ray Serve LLM is a scalable, production-grade model serving library built on
+the Ray distributed computing framework, with first-class support for the vLLM engine.
+
+Key features:
+- Automatic scaling, back-pressure, and load balancing across a Ray cluster.
+- Unified multi-node, multi-model deployment.
+- Exposes an OpenAI-compatible HTTP API.
+- Multi-LoRA support with shared base models.
 
-Run `python3 ray_serve_deepseek.py` to deploy the model.
+Run `python3 ray_serve_deepseek.py` to launch an endpoint.
+
+Learn more in the official Ray Serve LLM documentation:
+https://docs.ray.io/en/latest/serve/llm/serving-llms.html
 """
 
 from ray import serve
@@ -16,9 +24,8 @@
 llm_config = LLMConfig(
     model_loading_config={
         "model_id": "deepseek",
-        # Since DeepSeek model is huge, it is recommended to pre-download
-        # the model to local disk, say /path/to/the/model and specify:
-        # model_source="/path/to/the/model"
+        # Pre-downloading the model to local storage is recommended since
+        # the model is large. Set model_source="/path/to/the/model".
         "model_source": "deepseek-ai/DeepSeek-R1",
     },
     deployment_config={
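
Note: the comment above recommends pre-downloading the weights. A minimal sketch of that step, assuming the huggingface_hub package and the placeholder path from the comment:

    from huggingface_hub import snapshot_download

    # Fetch the weights once to local storage, then point
    # model_source at this directory instead of the HF repo id.
    snapshot_download(
        repo_id="deepseek-ai/DeepSeek-R1",
        local_dir="/path/to/the/model",
    )
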
@@ -27,10 +34,10 @@
             "max_replicas": 1,
         }
     },
-    # Change to the accelerator type of the node
+    # Set to the node's accelerator type.
     accelerator_type="H100",
     runtime_env={"env_vars": {"VLLM_USE_V1": "1"}},
-    # Customize engine arguments as needed (e.g. vLLM engine kwargs)
+    # Customize engine arguments as required (for example, vLLM engine kwargs).
     engine_kwargs={
         "tensor_parallel_size": 8,
         "pipeline_parallel_size": 2,
@@ -44,6 +51,6 @@
     },
 )
 
-# Deploy the application
+# Deploy the application.
 llm_app = build_openai_app({"llm_configs": [llm_config]})
 serve.run(llm_app)
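
Once `serve.run(llm_app)` is running, the application exposes the OpenAI-compatible HTTP API mentioned in the docstring. A minimal query sketch, assuming the default Ray Serve address http://localhost:8000 and the openai Python client (the api_key value is a placeholder):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

    # "deepseek" matches the model_id set in model_loading_config above.
    response = client.chat.completions.create(
        model="deepseek",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
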
