Commit 39073a5

Hchnr and Corle-hyz authored
[Model] Add Ernie-4.5 (#637)
Adapted from this PR: vllm-project/vllm#20220

---------

Co-authored-by: Corle-hyz <heyongzhe18@mails.ucas.edu.cn>
1 parent 2490adf commit 39073a5

5 files changed, +1104 -0 lines changed

examples/ernie45/conf/serve.yaml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+defaults:
+  - _self_
+  - serve: 300b
+
+experiment:
+  exp_name: ernie45_300b
+  exp_dir: outputs/${experiment.exp_name}
+  task:
+    type: serve
+  runner:
+    hostfile: null
+    deploy:
+      use_fs_serve: false
+  envs:
+    CUDA_VISIBLE_DEVICES: 0,1,2,3,4,5,6,7
+    CUDA_DEVICE_MAX_CONNECTIONS: 1
+
+action: run
+
+hydra:
+  run:
+    dir: ${experiment.exp_dir}/hydra
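
The `${experiment.exp_name}` and `${experiment.exp_dir}` references in this file are interpolations that get expanded when the config is loaded. The sketch below is a plain-Python stand-in for that expansion, not the actual Hydra/OmegaConf implementation; the helper names `lookup` and `resolve` are hypothetical, and the dictionary nesting is assumed from the keys shown in the diff.

```python
import re

# Values copied from examples/ernie45/conf/serve.yaml (nesting is assumed).
config = {
    "experiment": {
        "exp_name": "ernie45_300b",
        "exp_dir": "outputs/${experiment.exp_name}",
    },
    "hydra": {"run": {"dir": "${experiment.exp_dir}/hydra"}},
}

# Matches one ${dotted.key} reference.
_REF = re.compile(r"\$\{([^${}]+)\}")


def lookup(root, dotted_key):
    """Walk a dotted key such as 'experiment.exp_name' through nested dicts."""
    node = root
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(root, value, depth=0):
    """Expand ${a.b} references in a string, following chained references."""
    if depth > 10:  # guard against accidental reference cycles
        raise ValueError("interpolation nested too deeply")
    return _REF.sub(
        lambda m: resolve(root, str(lookup(root, m.group(1))), depth + 1), value
    )


exp_dir = resolve(config, config["experiment"]["exp_dir"])
hydra_dir = resolve(config, config["hydra"]["run"]["dir"])
print(exp_dir)
print(hydra_dir)
```

Under these assumptions the experiment output lands in `outputs/ernie45_300b`, with the Hydra run directory one level below it, so changing `exp_name` alone relocates both.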

examples/ernie45/conf/serve/300b.yaml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+- serve_id: vllm_model
+  engine: vllm
+  engine_args:
+    model: /nfs/hcr/models/PaddlePaddle/ERNIE-4.5-300B-A47B-Base-PT
+    host: 0.0.0.0
+    max_model_len: 256
+    max_num_seqs: 1
+    uvicorn_log_level: warning
+    port: 30000
+  engine_args_specific:
+    vllm:
+      tensor_parallel_size: 8
+      pipeline_parallel_size: 1
+      gpu_memory_utilization: 0.98
+      trust_remote_code: true
+      # GMEM not enough if disable eager mode
+      enforce_eager: true
+      # enable_chunked_prefill: true
+  profile:
+    prefix_len: 0
+    input_len: 1024
+    output_len: 1024
+    num_prompts: 128
+    range_ratio: 1
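
The keys under `engine_args` and `engine_args_specific.vllm` correspond by name to vLLM server options. The commit does not show how the runner turns them into a launch command, so the following is only a sketch of one plausible mapping: `build_command` and `gpus_required` are hypothetical helpers, and the underscore-to-dash conversion is an assumption about how such keys could be passed to `vllm serve`. It also checks that the tensor-parallel and pipeline-parallel sizes multiply out to the eight devices listed in `CUDA_VISIBLE_DEVICES`.

```python
# Values copied from examples/ernie45/conf/serve/300b.yaml (nesting is assumed).
engine_args = {
    "model": "/nfs/hcr/models/PaddlePaddle/ERNIE-4.5-300B-A47B-Base-PT",
    "host": "0.0.0.0",
    "max_model_len": 256,
    "max_num_seqs": 1,
    "uvicorn_log_level": "warning",
    "port": 30000,
}
vllm_args = {
    "tensor_parallel_size": 8,
    "pipeline_parallel_size": 1,
    "gpu_memory_utilization": 0.98,
    "trust_remote_code": True,
    "enforce_eager": True,
}
visible_devices = "0,1,2,3,4,5,6,7"  # CUDA_VISIBLE_DEVICES from serve.yaml


def build_command(common, specific):
    """Hypothetical helper: flatten both arg groups into a `vllm serve` argv."""
    merged = {**common, **specific}
    argv = ["vllm", "serve", str(merged.pop("model"))]
    for key, value in merged.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean switches take no value
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    return argv


def gpus_required(specific):
    """GPUs needed for one replica: tensor-parallel x pipeline-parallel ranks."""
    return specific["tensor_parallel_size"] * specific["pipeline_parallel_size"]


command = build_command(engine_args, vllm_args)
needed = gpus_required(vllm_args)
available = len(visible_devices.split(","))
print(" ".join(command))
print(f"GPUs needed: {needed}, visible: {available}")
```

With the values in this commit, the parallel layout needs 8 GPUs and 8 are visible, so a single replica occupies the whole node.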
