
server: add prompt processing progress streaming for /completion endpoint #14685 #14728

Open
wants to merge 1 commit into master

Conversation

baonudesifeizhai

#14685

  • Add server_task_result_cmpl_progress struct for streaming progress updates
  • Implement send_progress_response() function for real-time progress reporting
  • Send progress info during prompt processing phase before token generation
  • Support all compatibility modes (non-OAI, OAI completion, OAI chat)
  • Include detailed progress data: n_past, n_prompt_tokens, n_prompt_tokens_processed, progress percentage
  • Only send progress updates in streaming mode (stream: true)
  • Maintains backward compatibility with existing clients

Closes #14685
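To illustrate what a client would see, here is a minimal sketch of consuming such a stream. The `prompt_progress` key and its fields follow the bullet list above, but the exact wire format is an assumption, not the settled API:

```python
import json

def parse_progress_events(sse_lines):
    """Extract prompt-processing progress objects from SSE 'data:' lines.

    Assumes progress chunks carry a 'prompt_progress' object with the
    fields named in the PR description (hypothetical wire format).
    """
    events = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        prog = chunk.get("prompt_progress")
        if prog is not None:
            events.append(prog)
    return events

# A stream as it might look with progress chunks interleaved before content
sample = [
    'data: {"prompt_progress": {"n_past": 0, "n_prompt_tokens": 512, '
    '"n_prompt_tokens_processed": 128, "progress": 0.25}}',
    'data: {"prompt_progress": {"n_past": 0, "n_prompt_tokens": 512, '
    '"n_prompt_tokens_processed": 512, "progress": 1.0}}',
    'data: {"content": "Hello"}',
    "data: [DONE]",
]
print(parse_progress_events(sample))
```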


Collaborator

@ngxson ngxson left a comment


Things we need to consider:

  • This is not part of the official OAI spec; we should only send it if the user explicitly requests it.
  • Maybe reuse server_task_result_cmpl_partial instead of creating a dedicated response type.

@ngxson
Collaborator

ngxson commented Jul 16, 2025

  • Maintains backward compatibility with existing clients

No, this will break clients which assume the first response to be non-empty. prompt_processing is NOT part of the official OAI spec.

@BradHutchings

As the feature requester, I would be happy with a flag in the request to send processing progress, defaulting to false.
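An opt-in request could look like the sketch below. The flag name `return_progress` is purely illustrative (no name was agreed in this thread); the point is that it defaults to off so existing clients are unaffected:

```python
import json

def build_completion_request(prompt, want_progress=False):
    """Build a /completion request body with an opt-in progress flag.

    'return_progress' is a hypothetical flag name; progress chunks would
    only be emitted when streaming is on AND the flag is explicitly set.
    """
    body = {"prompt": prompt, "stream": True}
    if want_progress:
        body["return_progress"] = True  # absent by default for compatibility
    return json.dumps(body)

print(build_completion_request("Hello", want_progress=True))
```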

@baonudesifeizhai
Author

  • Maintains backward compatibility with existing clients

No, this will break clients which assume the first response to be non-empty. prompt_processing is NOT part of the official OAI spec.

On backward compatibility: I understand that some clients may assume the first response contains content. I will reuse the existing server_task_result_cmpl_partial structure and make sure the progress field is only included when explicitly requested. Does that work?
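The proposed reuse of the partial-result payload can be sketched as follows; the serialization shape and field names are assumptions for illustration, not the actual server_task_result_cmpl_partial implementation:

```python
def partial_result_to_json(content, progress=None, include_progress=False):
    """Serialize one partial completion chunk (illustrative shape).

    The 'prompt_progress' key is attached only when the client opted in,
    so clients that never request progress never see an unexpected key.
    """
    chunk = {"content": content}
    if include_progress and progress is not None:
        chunk["prompt_progress"] = progress
    return chunk
```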

@ngxson
Collaborator

ngxson commented Jul 16, 2025

Yes, and a small line explaining how it works would be appreciated. It should be added to the documentation in server/README.md.

Successfully merging this pull request may close these issues:

  • Feature Request: Server stream response for "prompt processing progress"

3 participants