We believe Podman is a viable alternative to Docker. Many people have already moved
to Podman, and the project documentation should make it easy for them to use it.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
docs/build.md: 11 additions, 9 deletions
````diff
@@ -94,13 +94,13 @@ Building through oneAPI compilers will make avx_vnni instruction set available f
   - Using manual oneAPI installation:
     By default, `GGML_BLAS_VENDOR` is set to `Generic`, so if you already sourced intel environment script and assign `-DGGML_BLAS=ON` in cmake, the mkl version of Blas will automatically been selected. Otherwise please install oneAPI and follow the below steps:
     ```bash
-    source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
+    source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit container image, only required for manual installation
-    If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel docker container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
+  - Using oneAPI container image:
+    If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
 
   Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
````
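For context on the hunk above (an editorial sketch, not part of the patch): the `setvars.sh` comment sits inside build.md's oneAPI build steps. A minimal sketch of that flow, assuming the flag names referenced in the hunk (`GGML_BLAS`, `GGML_BLAS_VENDOR`) and the CMake MKL vendor value `Intel10_64lp`:

```bash
# Sketch only: build llama.cpp against oneAPI MKL (flag names as referenced above).
source /opt/intel/oneapi/setvars.sh   # skip this inside the oneapi-basekit container image
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp
cmake --build build --config Release
```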
docs/container.md: 38 additions, 26 deletions
````diff
@@ -1,11 +1,11 @@
-# Docker
+# Container
 
 ## Prerequisites
-* Docker must be installed and running on your system.
+* A container engine, i.e. Docker/Podman, must be installed and running on your system.
 * Create a folder to store big models & intermediate files (ex. /llama/models)
 
 ## Images
-We have three Docker images available for this project:
+We have three container images available for this project:
 
 1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`)
 2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
````
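As an aside (not part of the patch): the published images listed above work the same way with either engine. A quick sketch of pulling the light image, using only the image tag shown in the hunk:

```bash
# Either engine can pull the published image.
docker pull ghcr.io/ggerganov/llama.cpp:light
# or, with Podman:
podman pull ghcr.io/ggerganov/llama.cpp:light
```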
````diff
@@ -27,43 +27,45 @@ The GPU enabled images are not currently tested by CI beyond being built. They a
 ## Usage
 
-The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image.
+The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full container image.
 
 Replace `/path/to/models` below with the actual path where you downloaded the models.
 
-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B</details>
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
 
 or with a light image:
 
-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
````
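One note on the Podman examples above (editorial aside, not part of the patch): `--security-opt label=disable` turns SELinux labeling off for the container. An alternative on SELinux systems is to keep confinement and relabel the bind mount with the `:Z` volume suffix; a sketch against the same light image:

```bash
# Sketch: keep SELinux enabled and relabel the bind mount instead
# (":Z" marks the volume content as private to this container).
podman run -v /path/to/models:/models:Z ghcr.io/ggerganov/llama.cpp:light \
  -m /models/7B/ggml-model-q4_0.gguf \
  -p "Building a website can be done in 10 simple steps:" -n 512
```

`label=disable` is the simpler, broader switch; `:Z` is narrower but container-specific. The same hunk continues below with the local CUDA build instructions: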
````diff
 Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
 
-## Building Docker locally
+## Building Container locally
 
-```bash
+<details><summary>Docker example</summary>
 docker build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
+</details>
 
 You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
````
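The `ARGS` sentence above refers to build arguments declared in the CUDA Dockerfile. A hedged sketch of overriding one, assuming the Dockerfile exposes an `ARG` such as `CUDA_VERSION` (check `.devops/cuda.Dockerfile` for the actual names):

```bash
# Assumption: CUDA_VERSION is an ARG defined in .devops/cuda.Dockerfile.
# The same --build-arg syntax works with docker build and podman build.
podman build -t local/llama.cpp:full-cuda \
  --build-arg CUDA_VERSION=12.4.0 \
  --target full -f .devops/cuda.Dockerfile .
```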
````diff
@@ -82,23 +84,33 @@ The resulting images, are essentially the same as the non-CUDA images:
 
 After building locally, Usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.
 
-```bash
+<details><summary>Docker example</summary>
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
-```
+</details>
+<details><summary>Podman example</summary>
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
+</details>
````
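A side note on `--gpus all` in the Podman lines (editorial, not part of the patch): recent Podman releases accept `--gpus`, but the CDI route is the path documented by the NVIDIA toolkit. A sketch, assuming the CDI spec has already been generated with `nvidia-ctk cdi generate`:

```bash
# Sketch: expose NVIDIA GPUs to Podman via CDI instead of --gpus
# (assumes the CDI spec exists, per the nvidia-container-toolkit docs).
podman run --security-opt label=disable --device nvidia.com/gpu=all \
  -v /path/to/models:/models local/llama.cpp:light-cuda \
  -m /models/7B/ggml-model-q4_0.gguf -n 512 --n-gpu-layers 1
```

The hunk continues into the MUSA section below: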
````diff
 
-## Docker With MUSA
+## Container engines With MUSA
 
 Assuming one has the [mt-container-toolkit](https://developer.mthreads.com/musa/native) properly installed on Linux, `muBLAS` should be accessible inside the container.
 
-## Building Docker locally
+## Building Container images locally
 
-```bash
+<details><summary>Docker example</summary>
 docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
````
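The diff is cut off here, but the pattern mirrors the CUDA section: the same build works by swapping the CLI. A sketch of the Podman equivalent of the MUSA build shown above (not part of the visible patch):

```bash
# Sketch: Podman equivalent of the MUSA build line above
# (same Dockerfile and target; only the CLI changes).
podman build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
```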