Commit 830de5a

[XPU] Supports TP4 deployment on 4,5,6,7 (#2794)
* Support running on cards 4,5,6,7 by specifying them through XPU_VISIBLE_DEVICES
* Update the multi-card instructions in the XPU documentation
1 parent d33105b commit 830de5a

3 files changed (+14 -2 lines changed)

docs/get_started/installation/kunlunxin_xpu.md

Lines changed: 6 additions & 1 deletion
@@ -156,7 +156,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 **Deploy the ERNIE-4.5-300B-A47B-Paddle model with WINT4 precision and 32K context length on 4 XPUs**

 ```bash
-export XPU_VISIBLE_DEVICES="0,1,2,3"
+export XPU_VISIBLE_DEVICES="0,1,2,3" # Specify which cards to use
 python -m fastdeploy.entrypoints.openai.api_server \
     --model baidu/ERNIE-4.5-300B-A47B-Paddle \
     --port 8188 \
@@ -167,6 +167,11 @@ python -m fastdeploy.entrypoints.openai.api_server \
     --gpu-memory-utilization 0.9
 ```

+**Note:** When deploying on 4 XPUs, only two configurations are supported because of hardware limitations such as interconnect capabilities:
+`export XPU_VISIBLE_DEVICES="0,1,2,3"`
+or
+`export XPU_VISIBLE_DEVICES="4,5,6,7"`
+
 Refer to [Parameters](../../parameters.md) for more options.

 #### Send requests
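
The note added in this hunk restricts 4-card deployments to two card groups. A minimal sketch of how a launcher script could check this before starting the server is shown below. The helper name `check_tp4_devices` and the constant `SUPPORTED_TP4_GROUPS` are hypothetical and are not part of FastDeploy; the sketch also assumes that only membership in a group matters, not the order of the IDs.

```python
# Hypothetical pre-flight check, not part of FastDeploy.
# Assumption: the only valid 4-card groups are the two listed in the note.
SUPPORTED_TP4_GROUPS = ({0, 1, 2, 3}, {4, 5, 6, 7})


def check_tp4_devices(visible_devices):
    """Parse a comma-separated device string and validate it for TP4.

    Returns the parsed IDs as a list of ints, or raises ValueError when the
    string does not name exactly one of the supported 4-card groups.
    """
    ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    # Require exactly four entries so duplicates such as "0,0,1,2,3" fail.
    if len(ids) != 4 or set(ids) not in SUPPORTED_TP4_GROUPS:
        raise ValueError(
            "TP4 on XPU expects XPU_VISIBLE_DEVICES to be "
            '"0,1,2,3" or "4,5,6,7", got: %r' % visible_devices
        )
    return ids
```

Calling `check_tp4_devices("4,5,6,7")` returns the parsed IDs, while a mixed group such as `"2,3,4,5"` raises `ValueError` before any server process is launched.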

docs/zh/get_started/installation/kunlunxin_xpu.md

Lines changed: 6 additions & 1 deletion
@@ -157,7 +157,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 **Deploy the ERNIE-4.5-300B-A47B-Paddle model with WINT4 precision and 32K context length on a 4-card P800 server**

 ```bash
-export XPU_VISIBLE_DEVICES="0,1,2,3"
+export XPU_VISIBLE_DEVICES="0,1,2,3" # Set the XPU cards to use
 python -m fastdeploy.entrypoints.openai.api_server \
     --model baidu/ERNIE-4.5-300B-A47B-Paddle \
     --port 8188 \
@@ -168,6 +168,11 @@ python -m fastdeploy.entrypoints.openai.api_server \
     --gpu-memory-utilization 0.9
 ```

+**Note:** When deploying on 4 XPUs with P800, only the following two configurations are supported, because of hardware limitations such as the inter-card interconnect topology:
+`export XPU_VISIBLE_DEVICES="0,1,2,3"`
+or
+`export XPU_VISIBLE_DEVICES="4,5,6,7"`
+
 Refer to [Parameters](../../parameters.md) for more options.

 #### Send requests

fastdeploy/engine/config.py

Lines changed: 2 additions & 0 deletions
@@ -686,6 +686,8 @@ def __init__(
         self.engine_worker_queue_port = engine_worker_queue_port
         self.device_ids = ",".join([str(i) for i in range(self.worker_num_per_node)])
         self.device_ids = os.getenv("CUDA_VISIBLE_DEVICES", self.device_ids)
+        if current_platform.is_xpu():
+            self.device_ids = os.getenv("XPU_VISIBLE_DEVICES", self.device_ids)

         self.enable_logprob = enable_logprob
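
The precedence this hunk introduces can be illustrated in isolation. The sketch below mirrors the three assignments to `device_ids` as a standalone function. The name `resolve_device_ids` and its `is_xpu` and `env` parameters are hypothetical stand-ins for the config class, `current_platform.is_xpu()`, and `os.environ`; they are not FastDeploy API.

```python
import os


def resolve_device_ids(worker_num_per_node, is_xpu, env=None):
    """Return the device ID string using the same fallback order as the hunk.

    Order: default "0,1,...,N-1", then CUDA_VISIBLE_DEVICES if set, then
    (on XPU only) XPU_VISIBLE_DEVICES if set.
    """
    env = os.environ if env is None else env
    # Default: one device per worker, numbered from 0.
    device_ids = ",".join(str(i) for i in range(worker_num_per_node))
    # CUDA_VISIBLE_DEVICES overrides the default on every platform.
    device_ids = env.get("CUDA_VISIBLE_DEVICES", device_ids)
    # On XPU, XPU_VISIBLE_DEVICES takes precedence over both.
    if is_xpu:
        device_ids = env.get("XPU_VISIBLE_DEVICES", device_ids)
    return device_ids
```

With `XPU_VISIBLE_DEVICES="4,5,6,7"` set and `is_xpu=True`, the function returns `"4,5,6,7"`. Note that, as in the hunk, an XPU platform with only `CUDA_VISIBLE_DEVICES` set still picks up the CUDA value.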