Describe the bug
I am encountering a persistent llama.cpp server error (exit code: -4) when attempting to load GGUF models with text-generation-webui running in a Docker container. My setup is a roughly 10-year-old Dell workstation with a CPU that lacks AVX2 instruction support. Even the atinoda/text-generation-webui:default-nvidia-noavx2 Docker image results in this error, indicating that the llama.cpp binary within it may still rely on CPU instructions not supported by my older processor.
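To confirm which SIMD features the host CPU actually exposes, a minimal diagnostic sketch like the one below can be run on the host or inside the container. It just reads the Linux /proc/cpuinfo flags line; it is not part of the webui, and the feature list is simply the set that llama.cpp builds commonly assume.

```python
# Diagnostic sketch (Linux-only): report which CPU features llama.cpp builds
# commonly depend on, based on the flags line in /proc/cpuinfo.
from pathlib import Path

def cpu_flags() -> set[str]:
    for line in Path("/proc/cpuinfo").read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("sse3", "ssse3", "avx", "avx2", "f16c", "fma"):
    print(f"{feature}: {'present' if feature in flags else 'MISSING'}")
```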
Is there an existing issue for this?
- I have searched the existing issues
Reproduction
1. Pulled and ran atinoda/text-generation-webui:default-cpu or atinoda/text-generation-webui:default-nvidia-noavx2 via Docker.
2. Ensured user_data/models is mounted to store the GGUF model.
3. Accessed the web UI and attempted to load tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf (a standalone launch sketch follows these steps).
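To isolate the failure from the web UI, a minimal sketch like the following can launch llama-server directly with the same flags that appear in the Logs section below and report whether the process was killed by a signal. It assumes llama-server is on PATH inside the container and that the model path matches the mounted volume; adjust as needed.

```python
# Hypothetical standalone reproduction: run llama-server with the flags the
# webui logs, then decode a negative return code (POSIX: killed by that signal).
import signal
import subprocess

cmd = [
    "llama-server",  # assumed to be on PATH inside the container
    "--model", "user_data/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    "--ctx-size", "2048",
    "--gpu-layers", "0",
    "--batch-size", "256",
    "--port", "33391",
    "--no-webui",
    "--threads", "1",
]

# On a working setup the server keeps running (stop it with Ctrl-C);
# on this machine it exits almost immediately.
proc = subprocess.run(cmd, capture_output=True, text=True)
print(proc.stdout)
print(proc.stderr)
if proc.returncode < 0:
    print(f"Killed by signal: {signal.Signals(-proc.returncode).name}")
else:
    print(f"Exited with code: {proc.returncode}")
```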
Screenshot
No response
Logs
When attempting to load a GGUF model, the llama-server process terminates unexpectedly. The relevant log snippets from the Docker container output are:
07:24:12-283821 INFO Loading "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
07:24:12-301826 INFO llama-server command-line flags:
--model user_data/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --ctx-size 2048 --gpu-layers 0 --batch-size 256 --port 33391 --no-webui --threads 1
07:24:12-303710 INFO Using gpu_layers=0 | ctx_size=2048 | cache_type=fp16
07:24:13-311473 ERROR Error loading the model with llama.cpp: Server process
terminated unexpectedly with exit code: -4
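For context, assuming the loader is reporting the raw subprocess return code, a negative value means the server process was killed by that signal, and signal 4 is SIGILL (illegal instruction), which fits a binary executing CPU instructions this processor does not support:

```python
# Decode the "-4" above: a child killed by signal N is reported as return
# code -N, and signal 4 is SIGILL (illegal instruction).
import signal

print(signal.Signals(4).name)  # -> SIGILL
```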
System Info
Host OS: Proxmox (running an Ubuntu 22.04 LTS container)
Docker Management: Portainer
WebUI Images Tried: atinoda/text-generation-webui:default-cpu and atinoda/text-generation-webui:default-nvidia-noavx2
CPU: Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz (a Sandy Bridge-era Xeon that supports AVX but lacks AVX2, F16C, and FMA.)
GPU: Advanced Micro Devices, Inc. [AMD/ATI] Barts XT [Radeon HD 6870]
RAM: Hundreds of GBs
Model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Notes: gpu-layers is set to 0 in the llama-server flags, so GPU offloading is ruled out and the crash happens during CPU-only model loading. The problem occurs even with a model as small as tinyllama-1.1b.