
Llama 4 not working #1994


Open
Kenshiro-28 opened this issue Apr 8, 2025 · 8 comments

Comments

@Kenshiro-28

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'
llama_model_load_from_file_impl: failed to load model

Please update to a newer version of llama.cpp:

https://github.com/ggml-org/llama.cpp/releases/tag/b5074
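For anyone verifying this on their own install: a minimal sketch of how the failure surfaces from Python, assuming a hypothetical local GGUF path. On builds whose vendored llama.cpp predates b5074, the constructor raises the "unknown model architecture: 'llama4'" error shown above.

```python
import llama_cpp
from llama_cpp import Llama

# Needs a build whose vendored llama.cpp is at least b5074.
print(llama_cpp.__version__)

try:
    llm = Llama(model_path="./llama-4-model.gguf")  # hypothetical path
except ValueError as exc:
    # Older builds fail here with:
    # "unknown model architecture: 'llama4'"
    print(f"Load failed: {exc}")
```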

@JamePeng

JamePeng commented Apr 8, 2025

My fork has added some Llama 4 updates: https://github.com/JamePeng/llama-cpp-python

@kerlion

kerlion commented Apr 14, 2025

Same issue here. How do I run Llama 4?
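A minimal run sketch, assuming a llama-cpp-python build whose bundled llama.cpp already recognizes the 'llama4' architecture (the GGUF path here is hypothetical):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-4-scout.Q4_K_M.gguf",  # hypothetical local GGUF
    n_gpu_layers=-1,  # offload all layers to the GPU on a CUDA build
    n_ctx=8192,
)

# Plain text completion once the model loads successfully.
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```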

@AleefBilal

@kerlion
What version of llama-cpp-python are you using?
Can you also give me some details about your platform (OS, etc.)?

@kerlion

kerlion commented Apr 17, 2025

> @kerlion What version of llama-cpp-python are you using? Can you also give me some details about your platform (OS, etc.)?

image: nvidia/cuda:12.2.0-runtime-ubuntu22.04
llama_cpp_python 0.3.8

@kerlion

kerlion commented Apr 17, 2025

I compiled it from source, which got past this error. But I don't know which "chat_format" to use for Llama-4-Scout-17B-16E-Instruct-UD-Q2_K_XL.

@AleefBilal

AleefBilal commented Apr 17, 2025

@kerlion
Great job compiling it from source. Below is a command that might save you the struggle of compiling from source:
CMAKE_ARGS="-DGGML_CUDA=ON -DLLAMA_LLAVA=OFF" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Also, I wasn't quite able to understand your question about which "chat_format" to use; could you please elaborate?
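For reference on the chat_format question: to my understanding, when chat_format is left at its default of None, llama-cpp-python tries to use the chat template embedded in the GGUF's own metadata, which is usually the right choice for instruct quants like this one. A sketch under that assumption (the path is illustrative):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Llama-4-Scout-17B-16E-Instruct-UD-Q2_K_XL.gguf",  # illustrative path
    n_ctx=8192,
    # chat_format is left at its default (None): llama-cpp-python then looks
    # for a tokenizer.chat_template entry in the GGUF metadata and uses it.
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```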

@h-haghpanah

Same error with llama_cpp_python 0.3.8:

print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 62.90 GiB (5.01 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'
llama_model_load_from_file_impl: failed to load model

@perronemirko

> My fork has added some Llama 4 updates: https://github.com/JamePeng/llama-cpp-python

Could you please provide the commit number?
