
Commit 164eb76

Updated README
1 parent abc2cc1 commit 164eb76

1 file changed: +6 additions, -50 deletions

README.md

Lines changed: 6 additions & 50 deletions
@@ -1,54 +1,10 @@
-# docker-llamacpp
+# docker-cudaml

-Repository which creates a llama.cpp server in a docker container, for amd64 and arm64,
-the latter of which is missing from the "official" repository.
+Repository which has some base images for running CUDA and cuDNN on Intel and ARM architectures.

-## Usage
+## CUDA Images

-If you want to use an NVIDIA GPU, then install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first.
+If you want to use an NVIDIA GPU, then install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first. You can use the following two images as the basis for your own images:

-You should put your `.gguf` model files in a directory called `/data`. Then use the following command
-to start the Llama server:
-
-```bash
-docker run \
-  --runtime nvidia --gpus all \
-  -v /data:/models -p 8080:8080 \
-  ghcr.io/mutablelogic/llamacpp-linux-arm64:0.0.3 \
-  --host 0.0.0.0 \
-  --model /models/mistral-7b-v0.1.Q4_K_M.gguf -ngl 32 --ctx-size 4096 --temp 0.7 --repeat_penalty 1.1 \
-  --in-prefix "<|im_start|>" --in-suffix "<|im_end|>"
-```
-
-You can then access the Llama server on port 8080.
-
-## Building
-
-To build either the llama.cpp library or the onnxruntime library:
-
-```bash
-CUDA_HOME=/usr/local/cuda make llamacpp onnxruntime
-```
-
-You can omit the CUDA_HOME environment variable if you don't want to build with CUDA support.
-The following will build a docker image and push to the repository:
-
-```bash
-git checkout git@github.com:mutablelogic/docker-llamacpp.git
-cd docker-llamacpp
-make docker && make docker-push
-```
-
-Set the environment variable DOCKER_REGISTRY to the name of the registry to push to, e.g.:
-
-```bash
-git checkout git@github.com:mutablelogic/docker-llamacpp.git
-cd docker-llamacpp
-DOCKER_REGISTRY=docker.io/user make docker && make docker-push
-```
-
-## Status
-
-Requires the ability to update the llama.cpp submodule to the master branch.
-Currently the github action uses a self-hosted runner to build the arm64 image. The runner
-seems to need about 12GB of memory to build the image.
+* `ghcr.io/mutablelogic/cuda-dev:1.0.2` - This image is based on Ubuntu 22.04 and includes the 12.6 CUDA toolkit and compiler build tools
+* `ghcr.io/mutablelogic/cuda-rt:1.0.2` - This image is based on Ubuntu 22.04 and includes the 12.6 CUDA runtime libraries.
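The two images added above are intended as bases for downstream builds. As a minimal multi-stage sketch (not part of this commit; it assumes the `cuda-dev` image puts `nvcc` on the PATH and that `cuda-rt` carries only the runtime libraries, and the `hello.cu` source file is a hypothetical example):

```dockerfile
# Build stage: compile a CUDA program using the toolkit in the dev image
# (illustrative sketch; assumes nvcc is available on the PATH)
FROM ghcr.io/mutablelogic/cuda-dev:1.0.2 AS build
WORKDIR /src
COPY hello.cu .
RUN nvcc -O2 -o hello hello.cu

# Runtime stage: ship only the compiled binary on top of the smaller runtime image
FROM ghcr.io/mutablelogic/cuda-rt:1.0.2
COPY --from=build /src/hello /usr/local/bin/hello
ENTRYPOINT ["/usr/local/bin/hello"]
```

The resulting image would then be run with the NVIDIA runtime enabled, e.g. `docker run --runtime nvidia --gpus all <image>`, per the Container Toolkit instructions linked above.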
