[Feature] mm and thinking model support structured output #2749
base: develop
Conversation
kevincheng2 commented on Jul 8, 2025
- mm and thinking model support structured output
- offline inference supports structured output
Thanks for your contribution!
Force-pushed: d07f737 to 72de4a3
Pull Request Overview
This PR adds structured output support via guided decoding (reasoning parsers) for multi-modal and thinking models, including offline inference capabilities.
- Introduce a new `--reasoning_parser` CLI argument and propagate it through configuration to the model runners.
- Extend the sampling and guided decoding pipeline: updated `Sampler`, guided backend interfaces, and skip-index logic.
- Enhance `SamplingParams` with `GuidedDecodingParams` and document offline inference usage for structured outputs.
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| fastdeploy/worker/worker_process.py | Add `--reasoning_parser` CLI arg and integrate it into `FDConfig`. |
| fastdeploy/worker/vl_gpu_model_runner.py | Initialize guided backend and reasoning parser; update guided decoding flow in the GPU model runner. |
| fastdeploy/model_executor/layers/sample/sampler.py | Enhance `Sampler` to support reasoning parsing and skip indices when masking tokens. |
| fastdeploy/engine/sampling_params.py | Introduce `GuidedDecodingParams` in `SamplingParams` for offline structured inference. |
| docs/features/structured_outputs.md | Add offline inference examples for structured output using `GuidedDecodingParams`. |
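As a rough illustration of the `GuidedDecodingParams` container this PR describes (a self-contained sketch, not FastDeploy's actual API; the field names are assumptions based on the constraint types named in this review):

```python
# Hypothetical sketch of a GuidedDecodingParams-style container. Field
# names mirror the constraint kinds mentioned in the review comments;
# the real class in fastdeploy/engine/sampling_params.py may differ.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GuidedDecodingParams:
    guided_json: Optional[str] = None
    guided_regex: Optional[str] = None
    guided_grammar: Optional[str] = None
    guided_choice: Optional[List[str]] = None
    structural_tag: Optional[str] = None

    def active_constraints(self) -> List[str]:
        """Return the names of constraint fields that were set."""
        return [name for name, val in vars(self).items() if val is not None]


params = GuidedDecodingParams(guided_regex=r"\d{3}-\d{4}")
```

A request would typically set at most one of these fields; the backend then compiles the chosen constraint into a token mask applied during sampling.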
Comments suppressed due to low confidence (3)
fastdeploy/worker/vl_gpu_model_runner.py:145
- The code checks for `guided_json`, `guided_regex`, `guided_grammar`, and `structural_tag` but does not handle `guided_choice` from `GuidedDecodingParams`. Add support for `guided_choice` to ensure all constraint types are honored.
elif request.guided_grammar is not None:
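The dispatch chain this comment refers to could be sketched like so, with the missing `guided_choice` branch added (a stand-in illustration; the `Request` class and processor names here are stubs, not the model runner's real code):

```python
# Sketch of the constraint-dispatch chain with a guided_choice branch.
# The Request dataclass is a stand-in for the real request object.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    guided_json: Optional[str] = None
    guided_regex: Optional[str] = None
    guided_grammar: Optional[str] = None
    guided_choice: Optional[List[str]] = None
    structural_tag: Optional[str] = None


def select_constraint(request: Request) -> str:
    """Pick which guided-decoding constraint type the request carries."""
    if request.guided_json is not None:
        return "json"
    elif request.guided_regex is not None:
        return "regex"
    elif request.guided_grammar is not None:
        return "grammar"
    elif request.guided_choice is not None:
        # The branch the review flags as missing: a choice list can be
        # handled as its own constraint (e.g. compiled to an alternation).
        return "choice"
    elif request.structural_tag is not None:
        return "structural_tag"
    raise ValueError("no guided decoding constraint set")
```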
fastdeploy/engine/engine.py:1049
- The code references `self.cfg.reasoning_parser`, but `reasoning_parser` is not defined on the engine config object. It should likely reference `self.cfg.model_config.reasoning_parser`.
f" --reasoning_parser {self.cfg.reasoning_parser}")
fastdeploy/worker/vl_gpu_model_runner.py:152
- Using
request.get(...)
may not work ifrequest
is not a dict-like object. Consider usinggetattr(request, 'enable_thinking', True)
to access the attribute safely.
enable_thinking=request.get("enable_thinking", True),
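A minimal sketch of the reviewer's point: `.get()` assumes a dict-like interface, while `getattr()` with a default works on ordinary attribute objects (the `Request` class here is a stand-in, not FastDeploy's real request type):

```python
# .get() would raise AttributeError on a plain object; getattr() with a
# default handles both the present and absent cases safely.
class Request:
    def __init__(self, enable_thinking=None):
        if enable_thinking is not None:
            self.enable_thinking = enable_thinking


req = Request()  # attribute absent
enable_thinking = getattr(req, "enable_thinking", True)  # falls back to True
```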
@@ -300,6 +324,10 @@ def __init__(self, fd_config: FDConfig):
self.speculative_max_candidate_len = fd_config.speculative_config.max_candidate_len
self.speculative_benchmark_mode = fd_config.speculative_config.benchmark_mode

def set_reasoning_parser(self, reasoning_parser: Optional[ReasoningParser] = None):
[nitpick] The `SpeculativeSampler` overrides `set_reasoning_parser` (and other guided methods) as empty stubs. Consider implementing them, or explicitly documenting why guided decoding is unsupported, to avoid silent no-ops.
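One way to make the stubs fail loudly, as the reviewer suggests (a sketch only: the real classes live in fastdeploy, and raising `NotImplementedError` is an illustrative choice, not the PR's code):

```python
# Illustrative alternative to a silent empty stub: refuse explicitly
# when a reasoning parser is supplied to the speculative sampler.
from typing import Optional


class ReasoningParser:  # stand-in for the real parser type
    pass


class SpeculativeSampler:
    def set_reasoning_parser(self, reasoning_parser: Optional[ReasoningParser] = None):
        if reasoning_parser is not None:
            # Guided decoding / reasoning parsing is not supported in
            # speculative mode; raise instead of silently ignoring it.
            raise NotImplementedError(
                "guided decoding is not supported by SpeculativeSampler")
```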
@@ -100,7 +100,7 @@ def process_request(self, request, max_model_len=None, **kwargs):

if request.prompt_token_ids is None or len(
        request.prompt_token_ids) == 0:
# system = request.get("system")
[nitpick] This commented-out dead code can be removed to clean up the implementation and avoid confusion.
Suggested change: delete the line `# system = request.get("system")`.