Description
I tried to build a llamafile for the Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf model, both by letting the script download the model file itself and by pre-downloading the GGUF file into ./model/ before running the build script. Either way I keep getting the "llama_load_model_from_file: failed to load model" error, which may be related to the "warning: not a pkzip archive" message printed for the model file. The model is a fresh download made with Firefox, but the same thing happens if I download the GGUF file with wget.
Note that this is a freshly set up Ubuntu MATE 24.04.1 system on an Intel i5-8440 (with iGPU) and 32 GB DDR4 RAM (single channel).
chugnomug@mtkailash:~/Work/llamafile_chat$ ./build_file.sh
Please enter the model URL: https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
Model already exists.
Building Docker image...
[+] Building 33.6s (15/15) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.52kB 0.0s
=> resolve image config for docker-image://docker.io/docker/dockerfile:1 1.7s
=> CACHED docker-image://docker.io/docker/dockerfile:1@sha256:865e5dd094beca432e8c0a1d5e1c465db5f998dca4e439981029b3b81fb39ed5 0.0s
=> [internal] load metadata for docker.io/library/debian:bullseye-slim 1.6s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> CACHED [downloader 1/5] FROM docker.io/library/debian:bullseye-slim@sha256:610b4c7ad241e66f6e2f9791e3abdf0cc107a69238ab21bf9b4695d51fd6366a 0.0s
=> [final 2/5] RUN addgroup --gid 1000 user 0.4s
=> CACHED [downloader 2/5] WORKDIR /download 0.0s
=> [downloader 3/5] RUN apt-get update && apt-get install -y curl 8.9s
=> [final 3/5] RUN adduser --uid 1000 --gid 1000 --disabled-password --gecos "" user 0.3s
=> [final 4/5] WORKDIR /usr/src/app 0.1s
=> [downloader 4/5] RUN curl -L -o ./llamafile https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.16/llamafile-0.8.16 18.2s
=> [downloader 5/5] RUN chmod +x ./llamafile 0.9s
=> [final 5/5] COPY --from=downloader /download/llamafile ./llamafile 0.6s
=> exporting to image 0.7s
=> => exporting layers 0.7s
=> => writing image sha256:19790387eb6eab39840265d1d9489a50d0916a404c1c3039697fe4b470299704 0.0s
=> => naming to docker.io/library/llamafile_image 0.0s
Running Docker container...
Model filename: Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
Current directory: /home/chugnomug/Work/llamafile_chat
Running Docker container with the following command:
docker run -p 8080:8080 -v "/home/chugnomug/Work/llamafile_chat/model:/usr/src/app/model" llamafile_image --server --host 0.0.0.0 -m "/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf"
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2898,"msg":"build info","tid":"11808320","timestamp":1731317430}
{"function":"server_cli","level":"INFO","line":2905,"msg":"system info","n_threads":6,"n_threads_batch":6,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"11808320","timestamp":1731317430,"total_threads":6}
/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf: warning: not a pkzip archive
{"function":"load_model","level":"ERR","line":463,"model":"/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf","msg":"unable to load model","tid":"11808320","timestamp":1731317430}
llama_model_load: error loading model: failed to open /usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf: Invalid argument
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/usr/src/app/model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf'
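For anyone hitting the same error: a quick way to rule out a corrupted or mis-downloaded file is to inspect its first four bytes, since every valid GGUF file begins with the ASCII magic `GGUF`. This is only a diagnostic sketch, shown against a stand-in file so it is self-contained; in practice you would run the `head` check against the real path, ./model/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf:

```shell
# Stand-in file for demonstration; substitute the real model path.
printf 'GGUF' > /tmp/demo.gguf

# Every valid GGUF file starts with this 4-byte magic.
head -c 4 /tmp/demo.gguf; echo

# A truncated download, or an HTML error page saved under a .gguf name
# (a common result when a Hugging Face link redirects), would print
# something else here, e.g. "<!DO".
```

It is also worth comparing the file size (`ls -l`) against the size shown on the model's Hugging Face page, since a partially completed download produces the same "failed to load model" error.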