[Docs] Optimal Deployment #2768

Open: wants to merge 5 commits into base: develop

ming1753 (Collaborator) commented Jul 9, 2025

Add ERNIE-4.5-VL-28B-A3B-Paddle Optimal Deployment

paddle-bot bot commented Jul 9, 2025

Thanks for your contribution!

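> **gpu-memory-utilization**
- **Parameter:** `--gpu-memory-utilization`
- **Purpose:** Controls how much GPU memory FastDeploy may use when it initializes the service. The default is 0.9, which keeps 10% of GPU memory in reserve.
- **Recommendation:** 0.9 on A cards, 0.8~0.9 on H cards. If the service reports insufficient GPU memory during stress testing, try lowering this value.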
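
To make the arithmetic behind this parameter concrete, here is a minimal sketch of how a utilization fraction splits a card's memory into a usable share and a reserve. The helper name `memory_budget_gib` and the 80 GiB card size are hypothetical, chosen only for illustration; FastDeploy's internal accounting may differ from this simple multiplication.

```python
def memory_budget_gib(total_gib: float, utilization: float = 0.9) -> tuple[float, float]:
    """Return (usable, reserved) GPU memory in GiB for a utilization fraction.

    Hypothetical helper for illustration only; not part of FastDeploy.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    usable = total_gib * utilization
    return usable, total_gib - usable


if __name__ == "__main__":
    # Assumed 80 GiB card; compare the default with a more conservative setting.
    for u in (0.9, 0.8):
        usable, reserved = memory_budget_gib(80.0, u)
        print(f"utilization={u}: usable={usable:.1f} GiB, reserved={reserved:.1f} GiB")
```

Under these assumptions, lowering the value from 0.9 to 0.8 doubles the reserve, which is why it is the first thing to try when stress tests run out of memory.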
Collaborator

A cards -> A100/A800
H cards -> H100/H800?

--quantization wint4 \
--enable-mm \
```
### **Example**: Dual-GPU Wint8 with 128K Context Length Configuration
Collaborator

Context Length Configuration -> context length, Wint8 -> ?

ming1753 changed the title from "Optimal Deployment" to "[Docs] Optimal Deployment" on Jul 9, 2025
|:----------:|:----------:|:------:|:------:|
| A30 | wint4 | 432.99 | 17396.92 |
| L20 | wint4<br>wint8 | 3311.34<br>2423.36 | 46566.81<br>60790.91 |
| H20 | wint4<br>wint8<br>bfloat16 | 3827.27<br>3578.23<br>4100.83 | 89770.14<br>95434.02<br>84543.00 |
Collaborator

Why is bf16 the best?

2 participants