
Failing to build llamafile for Qwen2.5-Coder-7B-Instruct-Q4_K_M (with failed to load model error) #1

@bdutta

Description


I tried to build a llamafile for the Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf model, both by letting the script download the model file itself and by pre-downloading the GGUF file into the ./model/ path before running the build script. Either way I keep getting the "llama_load_model_from_file: failed to load model" error, which may be related to the "warning: not a pkzip archive" message printed for the model file. The model file is freshly downloaded with Firefox, and the same thing happens if I download the GGUF with wget instead.

Note that this is a fresh Ubuntu MATE 24.04.1 install on an Intel i5-8440 (with iGPU) system with 32 GB of DDR4 RAM (single channel).
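
In case it helps rule out a corrupted download, a quick sanity check of the GGUF file on the host could look like this (the path and filename are the ones from my setup; the header check assumes the standard 4-byte "GGUF" magic at the start of the file):

# A valid GGUF file starts with the ASCII magic "GGUF".
head -c 4 ./model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf && echo
# Compare the on-disk size with the size listed on the Hugging Face file page,
# to rule out a truncated download or an HTML error page saved as the model.
ls -l ./model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf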

chugnomug@mtkailash:~/Work/llamafile_chat$ ./build_file.sh
Please enter the model URL: https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
Model already exists.
Building Docker image...
[+] Building 33.6s (15/15) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.52kB 0.0s
=> resolve image config for docker-image://docker.io/docker/dockerfile:1 1.7s
=> CACHED docker-image://docker.io/docker/dockerfile:1@sha256:865e5dd094beca432e8c0a1d5e1c465db5f998dca4e439981029b3b81fb39ed5 0.0s
=> [internal] load metadata for docker.io/library/debian:bullseye-slim 1.6s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> CACHED [downloader 1/5] FROM docker.io/library/debian:bullseye-slim@sha256:610b4c7ad241e66f6e2f9791e3abdf0cc107a69238ab21bf9b4695d51fd6366a 0.0s
=> [final 2/5] RUN addgroup --gid 1000 user 0.4s
=> CACHED [downloader 2/5] WORKDIR /download 0.0s
=> [downloader 3/5] RUN apt-get update && apt-get install -y curl 8.9s
=> [final 3/5] RUN adduser --uid 1000 --gid 1000 --disabled-password --gecos "" user 0.3s
=> [final 4/5] WORKDIR /usr/src/app 0.1s
=> [downloader 4/5] RUN curl -L -o ./llamafile https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.16/llamafile-0.8.16 18.2s
=> [downloader 5/5] RUN chmod +x ./llamafile 0.9s
=> [final 5/5] COPY --from=downloader /download/llamafile ./llamafile 0.6s
=> exporting to image 0.7s
=> => exporting layers 0.7s
=> => writing image sha256:19790387eb6eab39840265d1d9489a50d0916a404c1c3039697fe4b470299704 0.0s
=> => naming to docker.io/library/llamafile_image 0.0s
Running Docker container...
Model filename: Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
Current directory: /home/chugnomug/Work/llamafile_chat
Running Docker container with the following command:
docker run -p 8080:8080 -v "/home/chugnomug/Work/llamafile_chat/model:/usr/src/app/model" llamafile_image --server --host 0.0.0.0 -m "/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf"
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2898,"msg":"build info","tid":"11808320","timestamp":1731317430}
{"function":"server_cli","level":"INFO","line":2905,"msg":"system info","n_threads":6,"n_threads_batch":6,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"11808320","timestamp":1731317430,"total_threads":6}
/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf: warning: not a pkzip archive
{"function":"load_model","level":"ERR","line":463,"model":"/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf","msg":"unable to load model","tid":"11808320","timestamp":1731317430}
llama_model_load: error loading model: failed to open /usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf: Invalid argument
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf'
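
To separate a bad download from a Docker volume problem, one possible check is to read the same file from inside the container. This is only a sketch: the image name, mount path, and filename are copied from the docker run command above, and the --entrypoint override assumes the image's default entrypoint is the llamafile binary.

# Open a shell in the same image and inspect the mounted model directory.
docker run --rm --entrypoint /bin/sh \
  -v "/home/chugnomug/Work/llamafile_chat/model:/usr/src/app/model" \
  llamafile_image -c \
  'ls -l /usr/src/app/model && head -c 4 /usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf && echo'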
