README.md (30 additions, 11 deletions)
@@ -28,6 +28,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)

----

+## Quick start
+
+Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
+
+- Install `llama.cpp` using [brew, nix or winget](docs/install.md)
+- Run with Docker - see our [Docker documentation](docs/docker.md)
+- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
+- Build from source by cloning this repository - check out [our build guide](docs/build.md)
+
+Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.
+
+Example command:
+
+```sh
+# Use a local model file
+llama-cli -m my_model.gguf
+
+# Or download and run a model directly from Hugging Face
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+
+# Launch OpenAI-compatible API server
+llama-server -hf ggml-org/gemma-3-1b-it-GGUF
+```
+
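For context, a rough sketch of how a client could query that OpenAI-compatible server once it is running; the address `127.0.0.1:8080` and the `/v1/chat/completions` route reflect llama-server's usual defaults, and the request body is purely illustrative:

```sh
# Assumes llama-server is already running locally with a model loaded
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ]
      }'
```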
## Description

The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
@@ -230,6 +254,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo

</details>

+
## Supported backends

| Backend | Target devices |
@@ -246,24 +271,18 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
|[OpenCL](docs/backend/OPENCL.md)| Adreno GPU |
|[RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc)| All |

-## Building the project
-
-The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
-The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
-
-- Clone this repository and build locally, see [how to build](docs/build.md)
-- On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
-- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
-
## Obtaining and quantizing models

The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
-You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`.
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
+
+```sh
+llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
+```

By default, the CLI downloads from Hugging Face; you can switch to other endpoints with the environment variable `MODEL_ENDPOINT`. For example, you may opt to download model checkpoints from ModelScope or other model-sharing communities by setting `MODEL_ENDPOINT=https://www.modelscope.cn/`.
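As a quick sketch of that environment variable in use (assuming a POSIX shell; the repository name is only illustrative and must exist on the chosen endpoint):

```sh
# Fetch the model from ModelScope instead of the default Hugging Face endpoint
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```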
docs/build.md (4 additions, 0 deletions)
@@ -1,5 +1,9 @@
# Build llama.cpp locally

+The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
+
+The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.
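As a rough sketch of the standard CMake flow that the rest of this guide expands on (backend flags and targets vary; the output path assumes a default build):

```sh
# Configure and build in Release mode; see the backend-specific sections for extra options
cmake -B build
cmake --build build --config Release

# In a default build, tools such as llama-cli land under build/bin
./build/bin/llama-cli --help
```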
This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).
-
-## Flox
-
-
On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via