-# docker-llamacpp
+# docker-cudaml

-Repository which creates a llama.cpp server in a docker container, for amd64 and arm64,
-the latter of which is missing from the "official" repository.
+Repository which provides base images for running CUDA and cuDNN on Intel and ARM architectures.

-## Usage
+## CUDA Images

-If you want to use an NVIDIA GPU, then install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first.
+If you want to use an NVIDIA GPU, then install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first. You can use the following two images as the basis for your own images (a sketch Dockerfile follows the list below):

-You should put your `.gguf` model files in a directory called `/data`. Then use the following command
-to start the Llama server:
-
-```bash
-docker run \
-    --runtime nvidia --gpus all \
-    -v /data:/models -p 8080:8080 \
-    ghcr.io/mutablelogic/llamacpp-linux-arm64:0.0.3 \
-    --host 0.0.0.0 \
-    --model /models/mistral-7b-v0.1.Q4_K_M.gguf -ngl 32 --ctx-size 4096 --temp 0.7 --repeat_penalty 1.1 \
-    --in-prefix "<|im_start|>" --in-suffix "<|im_end|>"
-```
-
-You can then access the Llama server on port 8080.
-
-## Building
-
-To build either the llama.cpp library or the onnxruntime library:
-
-```bash
-CUDA_HOME=/usr/local/cuda make llamacpp onnxruntime
-```
-
-You can omit the CUDA_HOME environment variable if you don't want to build with CUDA support.
-The following will build a docker image and push to the repository:
-
-```bash
-git clone git@github.com:mutablelogic/docker-llamacpp.git
-cd docker-llamacpp
-make docker && make docker-push
-```
-
-Set the environment variable DOCKER_REGISTRY to the name of the registry to push to, e.g.:
-
-```bash
-git clone git@github.com:mutablelogic/docker-llamacpp.git
-cd docker-llamacpp
-DOCKER_REGISTRY=docker.io/user make docker && make docker-push
-```
-
-## Status
-
-Requires the ability to update the llama.cpp submodule to the master branch.
-Currently the GitHub Action uses a self-hosted runner to build the arm64 image. The runner
-seems to need about 12GB of memory to build the image.
+* `ghcr.io/mutablelogic/cuda-dev:1.0.2` - This image is based on Ubuntu 22.04 and includes the CUDA 12.6 toolkit and compiler build tools.
+* `ghcr.io/mutablelogic/cuda-rt:1.0.2` - This image is based on Ubuntu 22.04 and includes the CUDA 12.6 runtime libraries.
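
As an illustration of how the two images fit together, here is a minimal multi-stage Dockerfile sketch. It assumes `nvcc` is available on the `PATH` in the `cuda-dev` image and that your project compiles from a single `main.cu`; the `myapp` name and paths are placeholders, not part of this repository.

```dockerfile
# Build stage: compile against the CUDA 12.6 toolkit in the dev image.
# Assumes nvcc is on the PATH (adjust to /usr/local/cuda/bin/nvcc if it is not).
FROM ghcr.io/mutablelogic/cuda-dev:1.0.2 AS build
WORKDIR /src
COPY main.cu .
RUN nvcc -O2 -o myapp main.cu

# Runtime stage: only the CUDA runtime libraries are needed to run the binary.
FROM ghcr.io/mutablelogic/cuda-rt:1.0.2
COPY --from=build /src/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
```

Build it with `docker build -t myapp .` and run it with the NVIDIA runtime enabled, for example `docker run --runtime nvidia --gpus all myapp`.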