
How can the ERNIE multimodal large model process multiple requests concurrently? #1170

@shuzhikun

Description

python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-VL-28B-A3B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --enable-mm \
    --reasoning-parser ernie-45-vl \
    --max-num-seqs 32
Even with --max-num-seqs 32 set, when multiple requests hit port 8180 the model still processes them serially; no concurrent processing happens.
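One thing worth ruling out first: --max-num-seqs only caps how many sequences the engine may schedule together, so if the client issues requests one at a time (e.g. a synchronous loop), the server will still look serial regardless of that flag. Below is a minimal client-side sketch that fires several requests in parallel against the OpenAI-compatible /v1/chat/completions endpoint on port 8180; the prompt, thread count, and timeout are illustrative and not taken from the original report.

```python
# Concurrency smoke test sketch (not FastDeploy tooling): send N requests
# in parallel and compare total wall time with per-request latencies.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8180/v1/chat/completions"  # assumed OpenAI-compatible endpoint

def one_request(i):
    payload = {
        "model": "baidu/ERNIE-4.5-VL-28B-A3B-Paddle",
        "messages": [{"role": "user", "content": f"Request {i}: briefly describe concurrency."}],
        "max_tokens": 64,
    }
    start = time.time()
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    return i, time.time() - start

if __name__ == "__main__":
    start = time.time()
    # Eight requests issued concurrently; adjust to taste.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for i, latency in pool.map(one_request, range(8)):
            print(f"request {i} finished in {latency:.1f}s")
    print(f"total wall time: {time.time() - start:.1f}s")
```

If the total wall time with this parallel client stays close to the sum of the individual latencies, the serialization is happening server-side; if it is close to the slowest single request, the engine is batching and the earlier observation was likely caused by requests being sent sequentially.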
