
check FIM response for expected format #73


Merged 1 commit on Jul 3, 2025


@pnb (Contributor) commented Jul 2, 2025

Partially fixes #61 by handling errors gracefully, though it will not help users discover that their entire server is set up incorrectly. It also does not explain why llama-server sometimes reports "sequence 0 does not start from the last position stored in the memory" errors, but it does handle unexpected server responses gracefully.

This approach handles unexpected server responses in fim_on_response, so invalid responses do not take up slots in the cache. You can test it with a couple of examples by entering these commands:

Not valid JSON (endpoint returns HTML)

:let g:llama_config['endpoint'] = 'https://example.com'

Valid JSON missing the content key

:let g:llama_config['endpoint'] = 'https://dummyjson.com/posts/add'

It might be overkill to check the JSON string before decoding. I was just worried about small performance hits from unnecessary full JSON decodes, especially since invalid responses never produce cache hits, so this can happen on every keypress.
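The validation order described above (a cheap string check, then a full decode, then a check for the content key, and only then a cache write) can be sketched roughly as follows. This is a language-neutral Python illustration, not the plugin's actual Vimscript. The names `extract_fim_content`, `on_response`, and `cache` are hypothetical, and the leading-`{` test is only an assumed stand-in for whatever pre-check the PR performs.

```python
import json
from typing import Optional

# Hypothetical stand-in for the plugin's completion cache.
cache = {}


def extract_fim_content(raw: str) -> Optional[str]:
    """Return the completion text, or None if the response is unusable."""
    text = raw.lstrip()
    # Cheap pre-check (assumed): a JSON object must start with '{'.
    # This rejects HTML error pages without paying for a full decode.
    if not text.startswith("{"):
        return None
    try:
        response = json.loads(text)
    except json.JSONDecodeError:
        # Looked like JSON but was truncated or malformed.
        return None
    # Valid JSON is not enough: the "content" key must exist and be a string.
    if not isinstance(response, dict):
        return None
    content = response.get("content")
    if not isinstance(content, str):
        return None
    return content


def on_response(cache_key: str, raw: str) -> bool:
    """Cache only valid responses; report whether the response was usable."""
    content = extract_fim_content(raw)
    if content is None:
        return False  # invalid responses never occupy a cache slot
    cache[cache_key] = content
    return True
```

The two test endpoints above correspond to the first two rejection paths: an HTML page fails the pre-check, and a JSON body without a content key fails the final check.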


@ggerganov (Member) commented Jul 3, 2025

> Also does not explain why llama-server sometimes has "sequence 0 does not start from the last position stored in the memory" errors.

Which version of llama-server and model are you using when you get this error?

FYI, these 2 changes from last week should have fixed the problem:

If you still spot the error with a build that includes these fixes, let me know.

@pnb (Contributor, Author) commented Jul 3, 2025

Now that you mention it, I haven't seen that "sequence 0" error yet this week. I update the server almost daily, so that probably explains it. I was using Qwen 2.5 14B coder, by the way.

However, this PR wasn't designed to fix only that specific issue; it handles unexpected server responses more generally. I still get those almost daily when running over a network, as my home wifi is OK but not amazing. So I think it would still be useful to merge this.

@ggerganov ggerganov merged commit f886bad into ggml-org:master Jul 3, 2025
Development

Successfully merging this pull request may close these issues.

Key not present in Dictionary: "content"