Can more operations be moved to the GPU? #739
vonchenplus
started this conversation in
General
Replies: 2 comments
-
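Same question: when system memory is roughly sufficient, can token generation speed be increased by using more VRAM? Does KT have an allocation mechanism for this?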
-
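You can currently adjust the VRAM/RAM split by editing the YAML, but for now doing so requires turning off CUDA Graph, so the gain and the loss may cancel each other out. We will optimize this later.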
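As a rough illustration of the trade-off in this reply, the sketch below estimates how many layers' worth of weights could move to the GPU for a given VRAM budget. It is not KTransformers code and does not use its YAML schema. The function names and every size in it are hypothetical placeholders, so substitute figures measured on your own setup.

```python
from typing import List, Tuple


def layers_that_fit(vram_gib: float, reserved_gib: float,
                    per_layer_gib: float, total_layers: int) -> int:
    """Return how many layers fit in the VRAM left after reservations.

    reserved_gib stands for whatever must stay on the GPU anyway
    (attention weights, KV cache, activations). All inputs are
    assumptions supplied by the caller, not values read from a model.
    """
    if per_layer_gib <= 0:
        raise ValueError("per_layer_gib must be positive")
    free_gib = max(vram_gib - reserved_gib, 0.0)
    return min(int(free_gib // per_layer_gib), total_layers)


def split_layers(vram_gib: float, reserved_gib: float,
                 per_layer_gib: float,
                 total_layers: int) -> Tuple[List[int], List[int]]:
    """Split layer indices into a GPU group (first layers) and a CPU group."""
    n_gpu = layers_that_fit(vram_gib, reserved_gib, per_layer_gib, total_layers)
    return list(range(n_gpu)), list(range(n_gpu, total_layers))


if __name__ == "__main__":
    # Hypothetical numbers: 24 GiB card, 14 GiB reserved, 6 GiB per layer.
    gpu, cpu = split_layers(24.0, 14.0, 6.0, 58)
    print(f"GPU layers: {gpu}")
    print(f"CPU layers: {len(cpu)} remaining")
```

The resulting GPU layer list would then have to be expressed in the YAML by hand. Whether the extra GPU placement pays off depends on the CUDA Graph cost mentioned above.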
-
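Loading DeepseekV3-q4km (int4) currently uses a large amount of system memory and only a small amount of VRAM. Could more of the operations be placed on the GPU?
The current strategy cannot deliver high concurrency. Would it be better to have a mechanism for placing more parameters on the GPU in order to raise concurrency?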