
Commit df11fb7

Add information for Podman as well as Docker
We believe Podman is a viable alternative to Docker. Many people have moved to Podman, and the project should make sure they can adopt it.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
1 parent 106045e commit df11fb7

3 files changed: 51 additions and 37 deletions
README.md

Lines changed: 2 additions & 2 deletions
@@ -242,7 +242,7 @@ The project also includes many example programs and tools using the `llama` library
 
 - Clone this repository and build locally, see [how to build](docs/build.md)
 - On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
+- Use a container image (Docker/Podman), see [documentation for containers](docs/container.md)
 - Download pre-built binaries from [releases](https://github.com/ggerganov/llama.cpp/releases)
 
 ## Obtaining and quantizing models
@@ -500,7 +500,7 @@ To learn more about model quantization, [read this documentation](examples/quant
 #### Development documentation
 
 - [How to build](docs/build.md)
-- [Running on Docker](docs/docker.md)
+- [Running in a container](docs/container.md)
 - [Build on Android](docs/android.md)
 - [Performance troubleshooting](docs/development/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
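
The container option referenced above works with either engine. A minimal sketch of fetching one of the prebuilt images described in docs/container.md; the `:light` tag (CLI-only image) is used here purely as an illustration:

```bash
# The same image reference works with either engine.
docker pull ghcr.io/ggerganov/llama.cpp:light
# or
podman pull ghcr.io/ggerganov/llama.cpp:light
```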

docs/build.md

Lines changed: 11 additions & 9 deletions
@@ -94,13 +94,13 @@ Building through oneAPI compilers will make avx_vnni instruction set available f
 - Using manual oneAPI installation:
   By default, `GGML_BLAS_VENDOR` is set to `Generic`, so if you have already sourced the Intel environment script and pass `-DGGML_BLAS=ON` to cmake, the MKL version of BLAS will automatically be selected. Otherwise please install oneAPI and follow the steps below:
   ```bash
-  source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
+  source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit container image, only required for manual installation
   cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
   cmake --build build --config Release
   ```
 
-- Using oneAPI docker image:
-  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using the Intel docker container [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
+- Using oneAPI container image:
+  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using the Intel container [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
 
 Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
 
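
The "oneAPI container image" route in the hunk above does not show an actual invocation. A minimal sketch, assuming the `intel/oneapi-basekit` image's preconfigured environment already puts `icx`/`icpx` on the PATH (as the setvars.sh comment above implies); the `/src` mount point and the `:latest` tag are illustrative choices, not taken from the docs:

```bash
# Build llama.cpp inside the oneAPI base kit container instead of installing oneAPI on the host.
docker run --rm -it -v "$PWD":/src -w /src intel/oneapi-basekit:latest bash -c '
  cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
  cmake --build build --config Release
'
# With Podman, replace "docker" with "podman"; on SELinux hosts also append :Z to the
# volume mount or pass --security-opt label=disable.
```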
@@ -280,19 +280,21 @@ cmake -B build -DGGML_VULKAN=ON
 cmake --build build --config Release
 ```
 
-**With docker**:
+**With containers**:
 
 You don't need to install Vulkan SDK. It will be installed inside the container.
 
-```sh
 # Build the image
-docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .
+
+<details><summary>Docker example</summary>docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .</details>
+<details><summary>Podman example</summary>podman build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .</details>
+
 
 # Then, use it:
-docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
-```
+<details><summary>Docker example</summary>docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33</details>
 
-**Without docker**:
+**Without a container**:
 
 Firstly, you need to make sure you have installed [Vulkan SDK](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html)
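
The `--device` paths in the Vulkan run examples above (`/dev/dri/renderD128`, `/dev/dri/card1`) are host specific. A small sketch for checking what your machine actually exposes before copying those flags; the `renderD129`/`card0` values below are only illustrations:

```bash
# List the DRI nodes the host exposes; card*/renderD* numbering varies per machine.
ls -l /dev/dri/
# Then substitute the nodes you actually have, e.g.:
#   --device /dev/dri/renderD129:/dev/dri/renderD129 --device /dev/dri/card0:/dev/dri/card0
# The same flags work with podman run; the Podman example above additionally disables
# SELinux label separation with --security-opt label=disable.
```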

docs/docker.md renamed to docs/container.md

Lines changed: 38 additions & 26 deletions
@@ -1,11 +1,11 @@
-# Docker
+# Container
 
 ## Prerequisites
-* Docker must be installed and running on your system.
+* A container engine, i.e. Docker/Podman, must be installed and running on your system.
 * Create a folder to store big models & intermediate files (ex. /llama/models)
 
 ## Images
-We have three Docker images available for this project:
+We have three container images available for this project:
 
 1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`)
 2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
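
A quick sketch for checking the prerequisite above, i.e. that an engine is installed and reachable, before trying the usage examples that follow:

```bash
# Either command should print engine and host details rather than an error.
docker info
# or, for Podman (daemonless, so this mainly verifies the installation):
podman info
```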
@@ -27,43 +27,45 @@ The GPU enabled images are not currently tested by CI beyond being built. They a
 
 ## Usage
 
-The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image.
+The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full container image.
 
 Replace `/path/to/models` below with the actual path where you downloaded the models.
 
-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B</details>
 
 On completion, you are ready to play!
 
-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
 
 or with a light image:
 
-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
 
 or with a server image:
 
-```bash
-docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512</details>
 
-## Docker With CUDA
+
+## Container engines With CUDA
 
 Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
 
-## Building Docker locally
+## Building Container locally
 
-```bash
+<details><summary>Docker example</summary>
 docker build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
-```
+</details>
+<details><summary>Podman example</summary>
+podman build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
+</details>
 
 You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
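
The `ARGS` mentioned above are build arguments declared in `.devops/cuda.Dockerfile`. The names and values below (`CUDA_VERSION`, `CUDA_DOCKER_ARCH`) are assumptions used only to illustrate the mechanism, so check the Dockerfile for the arguments it actually declares:

```bash
# Override Dockerfile ARGs with --build-arg; podman build accepts the same flag.
# The argument names and values shown here are placeholders.
docker build -t local/llama.cpp:light-cuda --target light \
  --build-arg CUDA_VERSION=12.4.1 --build-arg CUDA_DOCKER_ARCH=86 \
  -f .devops/cuda.Dockerfile .
```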

@@ -82,23 +84,33 @@ The resulting images, are essentially the same as the non-CUDA images:
 
 After building locally, Usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.
 
-```bash
+<details><summary>Docker example</summary>
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
-```
+</details>
+<details><summary>Podman example</summary>
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
+</details>
 
-## Docker With MUSA
+## Container engines With MUSA
 
 Assuming one has the [mt-container-toolkit](https://developer.mthreads.com/musa/native) properly installed on Linux, `muBLAS` should be accessible inside the container.
 
-## Building Docker locally
+## Building Container images locally
 
-```bash
+<details><summary>Docker example</summary>
 docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
-```
+</details>
+<details><summary>Podman example</summary>
+podman build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
+</details>
 
 You may want to pass in some different `ARGS`, depending on the MUSA environment supported by your container host, as well as the GPU architecture.
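
The CUDA run examples above reuse Docker's `--gpus all` flag for Podman. On hosts where that flag is not available, NVIDIA GPUs are usually exposed to Podman through CDI, which ships with the nvidia-container-toolkit already assumed above; a hedged sketch, using the toolkit's default CDI device naming:

```bash
# Generate the CDI specification once (writes /etc/cdi/nvidia.yaml), then request
# the GPUs by CDI device name instead of --gpus.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --security-opt label=disable --device nvidia.com/gpu=all \
  -v /path/to/models:/models local/llama.cpp:light-cuda \
  -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
```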
