Replies: 1 comment
-
For GPU, set the tensor split so that only one of the GPUs receives the model, and set split mode to none. This removes the multi-GPU overhead and reaches the same performance as restricting devices with CUDA_VISIBLE_DEVICES.
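With the llama.cpp command-line tools this corresponds to the `--split-mode` and `--main-gpu` flags; a sketch of the invocation (the binary name and model path are placeholders, and the GPU index is an example):

```shell
# Single-GPU run without splitting the model across devices:
#   --split-mode none : keep the whole model on one GPU (no tensor split)
#   --main-gpu 1      : index of the GPU to use
#   -ngl 99           : offload all layers to that GPU
./llama-cli -m model.gguf -ngl 99 --split-mode none --main-gpu 1
```

Selecting the device this way happens at runtime, so no environment variable or rebuild is needed.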
-
Hi, I would like to choose between the CPU and the GPU at runtime. Currently this seems to be controllable only at compile time, via options like GGML_METAL or GGML_BLAS. Is there a way to select the computation device before initializing ggml_backend?
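One common pattern is to compile with both backends enabled and pick one when creating the backend, falling back to CPU if the GPU backend is unavailable. A minimal sketch, assuming a build with GGML_METAL enabled (so GGML_USE_METAL is defined) and the public `ggml_backend_cpu_init` / `ggml_backend_metal_init` entry points; the helper name `pick_backend` is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include "ggml-backend.h"
#ifdef GGML_USE_METAL
#include "ggml-metal.h"
#endif

// Hypothetical helper: choose the compute backend at runtime.
static ggml_backend_t pick_backend(bool use_gpu) {
#ifdef GGML_USE_METAL
    if (use_gpu) {
        ggml_backend_t gpu = ggml_backend_metal_init();
        if (gpu != NULL) {
            return gpu;  // GPU backend requested and available
        }
    }
#endif
    // Fall back to the CPU backend, which is always compiled in.
    return ggml_backend_cpu_init();
}
```

The decision (`use_gpu`) can then come from a CLI flag or config file instead of a compile-time switch; the compile-time options only determine which backends are *available*, not which one you must use.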