p/vllm%E5%90%AF%E5%8A%A8%E6%97%B6nccl%E9%81%87%E5%88%B0%E6%98%BE%E5%8D%A1p2p%E9%80%9A%E4%BF%A1%E9%97%AE%E9%A2%98/ #5
Replies: 1 comment
- When deploying a large model inside a Docker container, there is still no way to disable ACS.
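For context: changing PCIe ACS bits requires root access to the host's PCIe config space (or a BIOS setting), which an ordinary container does not have. A rough sketch of merely checking the ACS state, assuming pciutils (lspci) is installed and the script runs as root on the host rather than in the container:

import re
import subprocess

def bridges_with_acs_enabled():
    """Return PCI devices whose ACS control has Source Validation enabled."""
    flagged = []
    for line in subprocess.run(["lspci"], capture_output=True, text=True,
                               check=True).stdout.splitlines():
        bdf = line.split()[0]
        detail = subprocess.run(["lspci", "-vvv", "-s", bdf],
                                capture_output=True, text=True).stdout
        # "ACSCtl: SrcValid+" means ACS source validation is active on this
        # device, which NCCL's troubleshooting docs list as a cause of broken GPU P2P.
        if re.search(r"ACSCtl:.*SrcValid\+", detail):
            flagged.append(line)
    return flagged

if __name__ == "__main__":
    for dev in bridges_with_acs_enabled():
        print("ACS enabled on:", dev)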
p/vllm%E5%90%AF%E5%8A%A8%E6%97%B6nccl%E9%81%87%E5%88%B0%E6%98%BE%E5%8D%A1p2p%E9%80%9A%E4%BF%A1%E9%97%AE%E9%A2%98/
Background
Launching the Qwen1.5-110b-awq model with Xinference (vLLM) requires loading the model across multiple GPUs.
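The actual deployment goes through Xinference, but the multi-GPU loading it performs corresponds roughly to the following vLLM call (the Hugging Face model id and GPU count here are illustrative, not taken from the deployment):

from vllm import LLM, SamplingParams

# Tensor parallelism shards the AWQ-quantized weights across two GPUs;
# this multi-GPU path is where the NCCL hang described below occurs.
llm = LLM(
    model="Qwen/Qwen1.5-110B-Chat-AWQ",  # illustrative repo id
    quantization="awq",
    tensor_parallel_size=2,
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))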
System configuration
OS: Ubuntu 24.04 LTS x86_64
Kernel: 6.8.0-31-generic
Shell: bash 5.2.21
CPU: AMD Ryzen Threadripper 7960X
GPU: NVIDIA RTX 6000 Ada Generation
GPU: NVIDIA RTX 6000 Ada Generation
GPU: NVIDIA RTX 6000 Ada Generation
Memory: 26786MiB / 257222MiB

Symptoms
After a reboot, the first launch of a multi-GPU model works fine, but once that model exits, any attempt to launch it again hangs. GPU utilization sits at 100% while no GPU memory is allocated, and the CPU threads peg their cores. Models that occupy only a single GPU are unaffected.
Following https://docs.vllm.ai/en/stable/getting_started/debugging.html to enable vLLM's debug logging, the last log line stops at NCCL's Init COMPLETE. Running the test.py from the vLLM debugging guide reproduces the same hang.
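The guide's test.py essentially runs a small collective across the GPUs; a sanity check in the same spirit (not the guide's exact script) is sketched below, launched with something like torchrun --nproc_per_node=2 nccl_check.py. On the affected machine the second launch after a model exit stops right after NCCL init, i.e. around the first collective. The guide also suggests verbose output via environment variables such as VLLM_LOGGING_LEVEL=DEBUG and NCCL_DEBUG=TRACE.

import os
import torch
import torch.distributed as dist

# Verbose NCCL output, as suggested by the vLLM debugging guide.
os.environ.setdefault("NCCL_DEBUG", "TRACE")

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

data = torch.ones(128, device=f"cuda:{local_rank}")
dist.all_reduce(data, op=dist.ReduceOp.SUM)  # hangs here when GPU P2P is broken
torch.cuda.synchronize()
assert data[0].item() == dist.get_world_size()
print(f"rank {dist.get_rank()}: NCCL all-reduce OK")
dist.destroy_process_group()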
Without debug logging enabled, the output hangs after:

(VllmWorkerProcess pid=2602361) INFO 07-12 19:04:18 pynccl.py:63] vLLM is using nccl==2.20.5

With debug logging enabled, it gets a little further:

nvidia5:3312182:3312204 [0] NCCL INFO comm 0x5654d4843400 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 41000 commId 0x737ce38e08817c77 - Init COMPLETE
nvidia5:3312183:3312205 [1] NCCL INFO comm 0x561a161881d0 rank 1 nranks 2 cudaDev 1 nvmlDev 3 busId 61000 commId 0x737ce38e08817c77 - Init COMPLETE

Attempts
Related issue:
https://huo.zai.meng.li/p/vllm%E5%90%AF%E5%8A%A8%E6%97%B6nccl%E9%81%87%E5%88%B0%E6%98%BE%E5%8D%A1p2p%E9%80%9A%E4%BF%A1%E9%97%AE%E9%A2%98/
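A generic first workaround for GPU P2P trouble (not necessarily the fix this post settles on) is to disable NCCL's peer-to-peer transport entirely, trading inter-GPU bandwidth for stability; the variable must be set before vLLM/Xinference spawns its worker processes:

import os

# NCCL_P2P_DISABLE=1 forces NCCL to route traffic through host memory instead
# of direct GPU-to-GPU (P2P) copies, which sidesteps ACS/IOMMU-related hangs
# at the cost of bandwidth.
os.environ["NCCL_P2P_DISABLE"] = "1"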