Replies: 1 comment
-
The task seems to be more complicated than I thought. As a first approximation, it would be necessary to start several backends with ggml_backend_rpc_start_server(backend, endpoint.c_str(), free_mem, total_mem), which is not good; instead, ggml_backend_cuda_init/ggml_backend would need to be "adapted".
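A minimal sketch of the "several backends" approach, assuming the ggml_backend_rpc_start_server signature quoted above and the public CUDA backend helpers from llama.cpp (ggml_backend_cuda_get_device_count, ggml_backend_cuda_init, ggml_backend_cuda_get_device_memory); the port scheme is made up for illustration:

```cpp
// Sketch only: one RPC server per CUDA device, each on its own port.
// Assumes ggml-cuda.h / ggml-rpc.h from llama.cpp; ports are hypothetical.
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

#include "ggml-cuda.h"
#include "ggml-rpc.h"

int main() {
    int n_dev = ggml_backend_cuda_get_device_count();
    std::vector<std::thread> servers;
    for (int dev = 0; dev < n_dev; ++dev) {
        ggml_backend_t backend = ggml_backend_cuda_init(dev);
        if (!backend) {
            fprintf(stderr, "failed to init CUDA device %d\n", dev);
            continue;
        }
        size_t free_mem = 0, total_mem = 0;
        ggml_backend_cuda_get_device_memory(dev, &free_mem, &total_mem);
        // Hypothetical port scheme: base port 50052 + device index.
        std::string endpoint = "0.0.0.0:" + std::to_string(50052 + dev);
        servers.emplace_back([backend, endpoint, free_mem, total_mem] {
            // Blocks serving requests for this one device.
            ggml_backend_rpc_start_server(backend, endpoint.c_str(),
                                          free_mem, total_mem);
        });
    }
    for (auto & t : servers) {
        t.join();
    }
    return 0;
}
```

This illustrates why the approach is "not good": each device needs its own endpoint and its own blocking server loop, whereas adapting the backend initialization to expose all devices through a single server would be cleaner.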
-
Hello, we need to sum the GPU memory across all video cards when using RPC. When the RPC server is running, the machine may have any number of GPUs, and we need to count the memory of all of them. https://github.com/ggerganov/llama.cpp/blob/c05e8c9934f94fde49bc1bc9dc51eed282605150/examples/rpc/rpc-server.cpp#L116
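A minimal sketch of what that summation could look like, assuming the CUDA backend helpers ggml_backend_cuda_get_device_count and ggml_backend_cuda_get_device_memory from llama.cpp; the helper name get_total_cuda_memory is hypothetical:

```cpp
// Sketch only: sum free/total memory across every visible CUDA device,
// instead of reporting a single device's memory as the linked line
// in rpc-server.cpp does. The function name is made up for illustration.
#include <cstddef>

#include "ggml-cuda.h"

static void get_total_cuda_memory(size_t * free_mem, size_t * total_mem) {
    *free_mem  = 0;
    *total_mem = 0;
    int n_dev = ggml_backend_cuda_get_device_count();
    for (int dev = 0; dev < n_dev; ++dev) {
        size_t free_dev = 0, total_dev = 0;
        ggml_backend_cuda_get_device_memory(dev, &free_dev, &total_dev);
        *free_mem  += free_dev;
        *total_mem += total_dev;
    }
}
```

Note that summing only fixes the reported capacity; as the reply above points out, the single backend created by ggml_backend_cuda_init still drives one device, so the server itself would also need to be adapted to actually use all of them.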