This repository provides a ready-to-use container image with the llama.cpp server compiled with Vulkan support and without the AVX requirement, so it can run on old hardware. Tested on an i3-3220 CPU with an RX 470 GPU.
```sh
docker container run \
  --detach \
  --init \
  --restart always \
  --device /dev/kfd \
  --device /dev/dri \
  --publish 8001:8080 \
  --volume llama:/root/.cache \
  --label io.containers.autoupdate=registry \
  --pull newer \
  --name llama-vulkan \
  --env LLAMA_ARG_HF_REPO=ggml-org/Qwen3-1.7B-GGUF \
  ghcr.io/kth8/llama-server-vulkan:latest
```
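On first start the server downloads the model into the `llama` volume, which can take a while. One way to watch the download and startup progress (assuming the container name used above) is to follow the container log:

```sh
# Stream the llama.cpp server log; Ctrl+C to stop following
docker container logs -f llama-vulkan
```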
Replace `ggml-org/Qwen3-1.7B-GGUF` with your model of choice from Hugging Face. Verify that the server is running by going to http://127.0.0.1:8001 in your web browser or from the terminal:
```sh
curl -X POST http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```
To load an existing GGUF file, mount it into the container and set the `LLAMA_ARG_MODEL` environment variable to the model file name. For example, replace `--env LLAMA_ARG_HF_REPO=ggml-org/Qwen3-1.7B-GGUF` with:
```sh
--volume ~/Downloads/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf:/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf:z \
--env LLAMA_ARG_MODEL=DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
```
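Put together, the full command with a local model would look like this (the `~/Downloads` path is just an example; adjust it to wherever your GGUF file lives):

```sh
docker container run \
  --detach \
  --init \
  --restart always \
  --device /dev/kfd \
  --device /dev/dri \
  --publish 8001:8080 \
  --volume llama:/root/.cache \
  --label io.containers.autoupdate=registry \
  --pull newer \
  --name llama-vulkan \
  --volume ~/Downloads/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf:/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf:z \
  --env LLAMA_ARG_MODEL=DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf \
  ghcr.io/kth8/llama-server-vulkan:latest
```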