CUDA out of memory during multi-GPU inference with the 32k model #895
-
After all, when the input arrives, the complete tensor has to be placed on a single card first. Also, the P100 architecture is quite old, so various incompatibilities are possible; we recommend using cards with sm80 or above.
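A quick way to check the sm80 recommendation above, using only standard `torch.cuda` calls; the P100 reports compute capability `(6, 0)`, i.e. sm60, well below sm80:

```python
import torch

# List each visible GPU with its compute capability and total memory.
# The P100 reports (6, 0), i.e. sm60; sm80+ means (8, 0) or higher (A100 etc.).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    total_gib = torch.cuda.get_device_properties(i).total_memory / 1024**3
    print(f"GPU {i}: {name}, sm{major}{minor}, {total_gib:.1f} GiB")
```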
-
When fine-tuning the 32k model, even 23 GB of VRAM is not enough; it also raises OutOfMemoryError.
-
Using web_demo_streamlit.py with 3 P100s, each with about 16 GB.
Loading the model is fine; it can be split into 3 shards across the 3 P100s.
But when the prompt contains a large number of tokens, inference seems to run on only 1 P100.
It keeps getting stuck at:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.90 GiB (GPU 0; 15.90 GiB total capacity; 4.19 GiB already allocated; 10.70 GiB free; 4.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
How can this be solved?
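A minimal sketch of the mitigation the traceback itself suggests, combined with a sharded load. Assumptions: the model is ChatGLM3-6B-32K loaded through transformers with accelerate installed, and the max_split_size_mb value of 128 is a starting point to tune, not a verified fix:

```python
import os

# As the traceback suggests, constrain the caching allocator's block
# splitting before torch initializes CUDA, to reduce fragmentation.
# 128 MiB is an assumed starting value to tune, not a known-good setting.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "THUDM/chatglm3-6b-32k"  # assumed model id; substitute your local path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# device_map="auto" (via accelerate) shards the *weights* across all three
# P100s, but the activations and KV cache for a long prompt still grow on
# the card that holds the currently executing layer — which is consistent
# with the single 10.90 GiB allocation attempt on GPU 0 in the traceback.
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
).eval()
```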