Hello, my system has an RTX 3070 and CUDA installed properly. I downloaded llama-b5688-bin-win-cuda-12.4-x64 and used llama-cli to load a local model, but it looks like it loaded into system RAM instead of GPU memory. Does anyone know how to get the model to load onto the GPU? Thanks.

Replies: 2 comments
-
Use `-ngl 99`
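For context, `-ngl` (`--n-gpu-layers`) tells llama.cpp how many model layers to offload to VRAM; 99 is simply a value large enough to cover every layer of most models. A minimal invocation might look like the sketch below (the model path is just a placeholder):

```bat
REM Run from inside the extracted llama-b5688-bin-win-cuda-12.4-x64 folder
llama-cli -m C:\models\my-model.gguf -ngl 99 -p "Hello"
```

If the offload works, the startup log should report layers being placed on the GPU rather than everything staying in system RAM, and nvidia-smi will show the corresponding VRAM usage.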
-
Thanks! Looks like I also have to copy the cudart-llama-bin-win-cuda-12.4-x64 runtime files into the folder for it to work :)
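The CUDA builds of llama.cpp are published without the CUDA runtime DLLs, which ship in the separate matching cudart zip; those DLLs need to sit next to llama-cli.exe. A sketch of the copy step, assuming both archives were extracted into sibling folders:

```bat
REM Copy the CUDA runtime DLLs next to llama-cli.exe
REM (assumes both zips were extracted side by side)
copy cudart-llama-bin-win-cuda-12.4-x64\*.dll llama-b5688-bin-win-cuda-12.4-x64\
```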