[Feature] mm and thinking model support structured output #2749


Open

kevincheng2 wants to merge 6 commits into develop from mm_structred_output

Conversation

kevincheng2 (Contributor) commented:

  1. mm and thinking model support structured output
  2. offline inference support structured output


paddle-bot bot commented Jul 8, 2025

Thanks for your contribution!

@kevincheng2 changed the title from "[vl] mm and thinking model support structured output" to "[Feature] mm and thinking model support structured output" on Jul 8, 2025
Copilot left a comment that was later marked as outdated.

@kevincheng2 force-pushed the mm_structred_output branch from d07f737 to 72de4a3 on July 11, 2025 06:41
@Jiang-Jia-Jun requested a review from Copilot on July 12, 2025 16:08
Copilot AI left a comment:

Pull Request Overview

This PR adds structured output support via guided decoding (reasoning parsers) for multi-modal and thinking models, including offline inference capabilities.

  • Introduce a new --reasoning_parser CLI argument and propagate it through configuration to model runners.
  • Extend the sampling and guided decoding pipeline: updated Sampler, guided backend interfaces, and skip-index logic.
  • Enhance SamplingParams with GuidedDecodingParams and document offline inference usage for structured outputs (a usage sketch follows below).
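
As a rough illustration of the offline path this PR documents, here is a minimal sketch of building sampling parameters with a guided-decoding constraint. The import path and keyword names (`GuidedDecodingParams`, `guided_decoding`, `json=`) are assumptions inferred from the file summary below, not the confirmed API.

```python
# Hypothetical sketch of offline structured output; import paths and
# parameter names are assumptions based on this PR's description.
from fastdeploy.engine.sampling_params import GuidedDecodingParams, SamplingParams

# Constrain generation to a JSON schema (guided_json is one of the
# constraint types this PR wires through the guided backend).
guided = GuidedDecodingParams(
    json={
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }
)
sampling = SamplingParams(guided_decoding=guided)
# `sampling` would then be passed to the offline LLM entry point.
```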

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
| --- | --- |
| fastdeploy/worker/worker_process.py | Add the --reasoning_parser CLI arg and integrate it into FDConfig. |
| fastdeploy/worker/vl_gpu_model_runner.py | Initialize the guided backend and reasoning parser; update the guided decoding flow in the GPU model runner. |
| fastdeploy/model_executor/layers/sample/sampler.py | Enhance Sampler to support reasoning parsing and skip indices when masking tokens. |
| fastdeploy/engine/sampling_params.py | Introduce GuidedDecodingParams in SamplingParams for offline structured inference. |
| docs/features/structured_outputs.md | Add offline inference examples for structured output using GuidedDecodingParams. |
Comments suppressed due to low confidence (3)

fastdeploy/worker/vl_gpu_model_runner.py:145

  • The code checks for guided_json, guided_regex, guided_grammar, and structural_tag but does not handle guided_choice from GuidedDecodingParams. Add support for guided_choice to ensure all constraint types are honored (a sketch follows below).
        elif request.guided_grammar is not None:
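
A hedged sketch of what full constraint dispatch could look like with the missing branch added; the function name and the tuple-shaped return value are illustrative, not taken from the PR:

```python
# Illustrative only: a dispatch covering every GuidedDecodingParams
# constraint type, including the guided_choice case flagged above.
def _select_guided_constraint(request):
    if request.guided_json is not None:
        return ("json", request.guided_json)
    elif request.guided_regex is not None:
        return ("regex", request.guided_regex)
    elif request.guided_grammar is not None:
        return ("grammar", request.guided_grammar)
    elif request.guided_choice is not None:
        # Previously unhandled: a fixed set of allowed completions.
        return ("choice", request.guided_choice)
    elif request.structural_tag is not None:
        return ("structural_tag", request.structural_tag)
    return None
```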

fastdeploy/engine/engine.py:1049

  • The code references self.cfg.reasoning_parser, but reasoning_parser is not defined on the engine config object. It should likely reference self.cfg.model_config.reasoning_parser (a stand-alone illustration follows below).
            f" --reasoning_parser {self.cfg.reasoning_parser}")

fastdeploy/worker/vl_gpu_model_runner.py:152

  • Using request.get(...) may not work if request is not a dict-like object. Consider using getattr(request, 'enable_thinking', True) to access the attribute safely (see the small demo below).
            enable_thinking=request.get("enable_thinking", True),
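
A small self-contained demonstration of the reviewer's point: dict-style `.get()` raises on attribute-style objects, while `getattr` degrades gracefully:

```python
# Minimal repro: a plain object has no .get(), so request.get(...) fails,
# while getattr(...) works for attribute-style request objects.
class Request:
    enable_thinking = False

req = Request()
# req.get("enable_thinking", True)            # AttributeError: 'Request' object has no attribute 'get'
print(getattr(req, "enable_thinking", True))  # prints: False
```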

```diff
@@ -300,6 +324,10 @@ def __init__(self, fd_config: FDConfig):
         self.speculative_max_candidate_len = fd_config.speculative_config.max_candidate_len
         self.speculative_benchmark_mode = fd_config.speculative_config.benchmark_mode

+    def set_reasoning_parser(self, reasoning_parser: Optional[ReasoningParser] = None):
```
Copilot AI commented on Jul 12, 2025:
[nitpick] The SpeculativeSampler overrides set_reasoning_parser (and other guided methods) as empty stubs. Consider implementing or explicitly documenting why guided decoding is unsupported to avoid silent no-ops.
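
One possible way to address this, sketched under the assumption that SpeculativeSampler keeps the stub: make the unsupported path loud instead of silent. The class body below is illustrative, not the PR's code:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

class SpeculativeSampler:  # stands in for the class in sampler.py
    def set_reasoning_parser(self, reasoning_parser: Optional[object] = None) -> None:
        """Guided decoding / reasoning parsing is unsupported with
        speculative decoding; warn instead of silently ignoring it."""
        if reasoning_parser is not None:
            logger.warning(
                "reasoning_parser is ignored: guided decoding is not "
                "supported together with speculative decoding.")
```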


```diff
@@ -100,7 +100,7 @@ def process_request(self, request, max_model_len=None, **kwargs):

         if request.prompt_token_ids is None or len(
                 request.prompt_token_ids) == 0:
-            system = request.get("system")
+            # system = request.get("system")
```
Copilot AI commented on Jul 12, 2025:
[nitpick] This commented-out dead code can be removed to clean up the implementation and avoid confusion.

Suggested change:

```diff
-            # system = request.get("system")
```

