Apparent neverending response to simple prompt #12919
Unanswered
mcondarelli asked this question in Q&A
Replies: 1 comment 1 reply
-
When using the web interface, you have to set the model parameters by clicking the settings icon in the top right corner. For DeepSeek, the temperature should be set to 0.6; each model comes with its own set of parameters for optimum performance. Also, even with the parameters set, reasoning models can take a long time to finish (I have seen a response take 70 minutes), depending on your hardware.
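If the web UI settings don't stick, the same parameters can also be passed per request through the server's OpenAI-compatible API. A minimal sketch in Python, assuming the default localhost:8080 address and the /v1/chat/completions endpoint; the max_tokens cap is just an extra safeguard against runaway generations:

```python
import json
import urllib.request

# Send a chat request to llama-server with explicit sampling parameters
# instead of relying on the web-UI defaults. Assumes the server listens on
# localhost:8080 and exposes the OpenAI-compatible /v1/chat/completions route.
payload = {
    "messages": [
        {
            "role": "user",
            "content": "in modern python: how to start a subprocess in daemon "
                       "mode, redirecting output to file and saving pid to file?",
        }
    ],
    "temperature": 0.6,   # value suggested above for DeepSeek-style models
    "max_tokens": 2048,   # hard cap so a runaway generation cannot go on forever
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.load(resp)

print(answer["choices"][0]["message"]["content"])
```

If memory serves, a similar cap can also be set server-side with llama-server's -n / --n-predict option.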
-
This looks like a bug, but it could also be some bad setting on my side (or even a model bug, though that would be very strange).
Please advise.
I compiled from the latest git (commit: 8b9cc7c) using the following options:
It compiles without apparent errors.
I started llama-server as follows (the model was downloaded from HuggingFace):
I then connected to the web server (using Firefox, if relevant) and sent a test question: "in modern python: how to start a subprocess in daemon mode, redirecting output to file and saving pid to file?"
The model actually answered the question correctly, but it didn't stop: it kept rambling to itself until I hit the "Stop" button.
I attach the actual output: conversation_conv-1744465379214.json.
I also attach the log, in which I see some worrying messages I don't know how to interpret: llama-server.log
It seems there is some mismatch between the prompt format expected by the model and what is actually sent by llama-server.
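One way to check for such a mismatch would be to ask the running server which chat template it is actually applying and compare that against what the model card expects. A minimal sketch, assuming this build of llama-server exposes the /props endpoint on the default localhost:8080 address:

```python
import json
import urllib.request

# Ask the running server for its properties; recent llama-server builds report
# the chat template in use here. Key names vary between versions, so dump
# everything rather than assuming a particular field.
with urllib.request.urlopen("http://localhost:8080/props") as resp:
    props = json.load(resp)

print(json.dumps(props, indent=2))
```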