Replies: 1 comment
- Same question for a mixed bag of 7 GPUs and the Kimi K2 model.
-
Is it possible to share an MoE model stored in RAM across multiple GPUs?
Suppose there are 4 GPUs; the system could then handle 4 inference requests at the same time, with each GPU loading the MoE expert tensors it needs from the same copy in RAM.
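To illustrate the idea being asked about, here is a minimal, CPU-only conceptual sketch (all names hypothetical, not from any real framework): the full set of expert weights lives once in host RAM, and each "GPU" worker keeps only a small cache of the experts it currently needs, copying an expert over on a cache miss. A real implementation would use page-locked (pinned) host memory and async host-to-device copies (e.g. `cudaMemcpyAsync`, or `tensor.to(device, non_blocking=True)` in PyTorch); plain numpy arrays stand in for device buffers here so the sketch runs anywhere.

```python
import numpy as np

NUM_EXPERTS, HIDDEN = 8, 4

# Single shared copy of all expert weights in host RAM.
ram_experts = [np.full((HIDDEN, HIDDEN), e, dtype=np.float32)
               for e in range(NUM_EXPERTS)]

class GpuWorker:
    """Stands in for one GPU: holds a small cache of experts copied from RAM."""
    def __init__(self, device_id, cache_size=2):
        self.device_id = device_id
        self.cache = {}              # expert_id -> "device-side" copy
        self.cache_size = cache_size

    def get_expert(self, expert_id):
        # Copy from shared RAM only on a cache miss; evict oldest entry if full.
        if expert_id not in self.cache:
            if len(self.cache) >= self.cache_size:
                self.cache.pop(next(iter(self.cache)))
            self.cache[expert_id] = ram_experts[expert_id].copy()
        return self.cache[expert_id]

    def forward(self, x, expert_id):
        # One routed expert matmul for this request.
        return x @ self.get_expert(expert_id)

# Four workers (four GPUs) each serving one request; all of them read
# experts from the same RAM copy rather than duplicating the full model.
workers = [GpuWorker(d) for d in range(4)]
requests = [0, 3, 3, 5]              # expert chosen by the router per request
outs = [w.forward(np.ones(HIDDEN, dtype=np.float32), e)
        for w, e in zip(workers, requests)]
```

Whether this is practical depends on the host-to-device bandwidth: streaming experts from RAM on every token can easily become the bottleneck, which is why offloading frameworks typically cache hot experts on-device as sketched above.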