Hello, I have GPUs on a cluster that receive offloaded layers, through RPC servers, from llama.cpp running on a different host. I'm trying to see which process is offloading the LLM inference onto the GPUs and to trace the GPU events triggered by llama.cpp. Any advice on how to trace the GPU events triggered by the RPC servers?
Answered by rgerganov on Sep 17, 2024
Answer selected by Allan-Luu
You can trace the calls to `rpc-server` by setting `GGML_DEBUG` to `1` here.
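For readers unfamiliar with this kind of switch, here is a minimal sketch of how a compile-time debug flag can gate call tracing. It assumes `GGML_DEBUG` is a preprocessor macro that is edited in the source and takes effect after rebuilding `rpc-server`. The `TRACE_PRINT` macro and the `handle_alloc_buffer` function are hypothetical names used only for illustration and are not taken from the llama.cpp sources.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: GGML_DEBUG is a compile-time switch. 0 disables tracing, 1 enables it.
#ifndef GGML_DEBUG
#define GGML_DEBUG 1
#endif

// Hypothetical trace macro: prints to stderr when tracing is on and
// evaluates to 0 (no characters written) when it is off.
#if (GGML_DEBUG >= 1)
#define TRACE_PRINT(...) std::fprintf(stderr, __VA_ARGS__)
#else
#define TRACE_PRINT(...) 0
#endif

// Hypothetical stand-in for a server-side RPC command handler.
// Returns the number of characters written to the trace output.
static int handle_alloc_buffer(unsigned long size) {
    assert(size > 0);  // illustrative precondition only
    // Each handled call logs its own name and arguments when tracing is on.
    int written = TRACE_PRINT("[%s] size: %lu\n", __func__, size);
    return written;
}
```

With the flag set to `1`, every call such as `handle_alloc_buffer(1024)` leaves one line in the server's output, which lets you line up incoming RPC calls with GPU activity on that host. With the flag set to `0`, the trace call compiles away to a constant.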