
Built with LLAMA_CUDA, GPU visible but inference not utilizing GPU #9751

Closed. Answered by danbev.
aleksas asked this question in Q&A

You will also need to specify the number of layers to offload to the GPU with --n-gpu-layers N.
More information can be found here.
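For example, a minimal invocation might look like the sketch below (the model path, prompt, and layer count are placeholders; adjust N to fit your model and available VRAM):

```sh
# Run inference with 35 layers offloaded to the GPU
# (model path and prompt here are illustrative, not from the original answer)
./llama-cli -m ./models/model.gguf -p "Hello" --n-gpu-layers 35
```

-ngl is the short form of --n-gpu-layers, and passing a value larger than the model's layer count simply offloads every layer, so a large N (e.g. 99) is a common way to offload the whole model.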

Answer selected by aleksas