Replies: 1 comment
@Urvesh71 Llama.cpp is like an operating system for LLMs: it is a suite of programs, and you can use any model that llama.cpp supports. To use it with your LLM, download the llama.cpp source and build it, or download a prebuilt binary for your system. There are tons of projects already built on top of llama.cpp; please read the README file. You can run it like so on CPU or GPU (Linux, Mac):

Download the model: https://huggingface.co/dspasyuk/Meta-Llama-3-8B-Instruct-Q5_K_S-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q5_K_S.gguf

Run llama.cpp:
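A minimal invocation looks something like the sketch below. It assumes a recent build where the chat binary is named `llama-cli` (older releases call it `main`), and that the GGUF file sits in the current directory; exact flags can vary by version.

```sh
# Interactive chat on CPU. On a CUDA/Metal build, add "-ngl 99"
# to offload all model layers to the GPU for much faster inference.
./llama-cli -m Meta-Llama-3-8B-Instruct-Q5_K_S.gguf \
  -cnv -c 4096 \
  -p "You are a helpful assistant."
```

Here `-m` is the model path, `-cnv` starts conversation mode, `-c` sets the context size, and `-p` provides the system prompt.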
I have a question: what is the goal of using llama.cpp? Can we use llama.cpp within our local chatbot code to get fast responses (fast inference)?
If yes, what should I do if I want to use DeepSeek Coder V2 Instruct (236B parameters, 132 GB model size)? Can I use llama.cpp in a local chatbot with the open-source DeepSeek Coder V2 model? Does it respond faster than the normal response time?