llama.cpp Server exit code: -4 on older CPU (No AVX2, running in Docker) #7162

@Kreakdude

Describe the bug

I am encountering a persistent llama.cpp server error (exit code: -4) when attempting to load GGUF models with text-generation-webui running in a Docker container. My setup is a roughly 10-year-old Dell workstation with a CPU that lacks AVX2 instruction support. Even the atinoda/text-generation-webui:default-nvidia-noavx2 Docker image results in this error, indicating that the llama.cpp binary within it may still rely on CPU instructions not supported by my older processor.
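
For reference, the instruction sets exposed by the host CPU can be confirmed from a shell on the Docker host. This is a minimal check against /proc/cpuinfo; on this machine avx is expected to appear, while avx2, f16c, and fma should be absent:

# print only the vector-extension flags relevant here
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E -x 'avx|avx2|f16c|fma'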

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

1. Pulled and ran atinoda/text-generation-webui:default-cpu or atinoda/text-generation-webui:default-nvidia-noavx2 via Docker (an approximate docker run invocation is sketched below).

2. Ensured user_data/models was mounted so the GGUF model is available inside the container.

3. Opened the web UI and attempted to load tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf.
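
For completeness, the container was started roughly as follows. This is a sketch rather than the exact command: the host port, the container name, and the container-side path /app/user_data are assumptions and should be adjusted to match the image's documentation.

# mount the host user_data directory so the models/ folder is visible inside the container
docker run -d --name text-generation-webui \
  -p 7860:7860 \
  -v "$PWD/user_data:/app/user_data" \
  atinoda/text-generation-webui:default-cpu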

Screenshot

No response

Logs

When attempting to load a GGUF model, the llama-server process terminates unexpectedly. The relevant log snippet from the Docker container output is:

07:24:12-283821 INFO     Loading "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"         
07:24:12-301826 INFO     llama-server command-line flags:                       
--model user_data/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --ctx-size 2048 --gpu-layers 0 --batch-size 256 --port 33391 --no-webui --threads 1
07:24:12-303710 INFO     Using gpu_layers=0 | ctx_size=2048 | cache_type=fp16   
07:24:13-311473 ERROR    Error loading the model with llama.cpp: Server process 
                         terminated unexpectedly with exit code: -4
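
For context on the exit code: -4 is the Python subprocess convention for a child process killed by a signal, so it means signal 4, which on Linux is SIGILL (illegal instruction). That is consistent with the llama-server binary executing an instruction (for example an AVX2 op) that this CPU does not implement. The signal number can be mapped to its name with the bash builtin kill:

$ kill -l 4
ILL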

System Info

Host OS: Proxmox (running an Ubuntu 22.04 LTS container)

Docker Management: Portainer

WebUI Images Tried: atinoda/text-generation-webui:default-cpu and atinoda/text-generation-webui:default-nvidia-noavx2

CPU: Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz (Sandy Bridge-EP; it supports AVX but lacks AVX2, F16C, and FMA.)

GPU: Advanced Micro Devices, Inc. [AMD/ATI] Barts XT [Radeon HD 6870]

RAM: Hundreds of GBs

Model: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

Notes: gpu-layers is set to 0 in the llama-server flags, so the GPU is not involved and the crash occurs in the CPU code path. The problem occurs even with a model as small as tinyllama-1.1b.
