Replies: 1 comment
@Urvesh71 Llama.cpp is like an operating system for LLMs: it is a suite of programs, and you can use any model that llama.cpp supports. To use it with your LLM, download the llama.cpp source and build it, or download a prebuilt binary for your system. There are tons of projects already built on top of llama.cpp; please read the README file. You can run it like so on CPU or GPU (Linux, Mac):

Download the model: https://huggingface.co/dspasyuk/Meta-Llama-3-8B-Instruct-Q5_K_S-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q5_K_S.gguf

Run llama.cpp:
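A minimal invocation looks something like the sketch below. It assumes a recent build where the chat binary is named `llama-cli` (older releases call it `main`), and that the GGUF file sits in the current directory; exact flags can vary by version.

```sh
# Interactive chat on CPU. On a CUDA/Metal build, add "-ngl 99"
# to offload all model layers to the GPU for much faster inference.
./llama-cli -m Meta-Llama-3-8B-Instruct-Q5_K_S.gguf \
  -cnv -c 4096 \
  -p "You are a helpful assistant."
```

Here `-m` is the model path, `-cnv` starts conversation mode, `-c` sets the context size, and `-p` provides the system prompt.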
I have a question: what is the goal of using llama.cpp? Can we use llama.cpp within our local chatbot code to get fast responses (fast inference)?
If yes, what should I do if I want to use DeepSeek Coder V2 Instruct (236B parameters, 132 GB model size)? Can I use llama.cpp in a local chatbot with the open-source DeepSeek Coder V2 model? Does it respond faster than the normal response time?