[Feature] mm and thinking model support structured output #2749
base: develop
Conversation
kevincheng2 commented on Jul 8, 2025
- mm and thinking model support structured output
- offline inference supports structured output
Thanks for your contribution!
Force-pushed: d07f737 to 72de4a3
Pull Request Overview
This PR adds structured output support via guided decoding (reasoning parsers) for multi-modal and thinking models, including offline inference capabilities.
- Introduce a new `--reasoning_parser` CLI argument and propagate it through configuration to the model runners.
- Extend the sampling and guided decoding pipeline: updated `Sampler`, guided backend interfaces, and skip-index logic.
- Enhance `SamplingParams` with `GuidedDecodingParams` and document offline inference usage for structured outputs.
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| fastdeploy/worker/worker_process.py | Add `--reasoning_parser` CLI arg and integrate it into `FDConfig`. |
| fastdeploy/worker/vl_gpu_model_runner.py | Initialize guided backend and reasoning parser; update guided decoding flow in the GPU model runner. |
| fastdeploy/model_executor/layers/sample/sampler.py | Enhance `Sampler` to support reasoning parsing and skip indices when masking tokens. |
| fastdeploy/engine/sampling_params.py | Introduce `GuidedDecodingParams` in `SamplingParams` for offline structured inference. |
| docs/features/structured_outputs.md | Add offline inference examples for structured output using `GuidedDecodingParams`. |
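As a rough illustration of the `GuidedDecodingParams` container this PR describes (a self-contained sketch, not FastDeploy's actual API; the field names are assumptions based on the constraint types named in this review):

```python
# Hypothetical sketch of a GuidedDecodingParams-style container. Field
# names mirror the constraint kinds mentioned in the review comments;
# the real class in fastdeploy/engine/sampling_params.py may differ.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GuidedDecodingParams:
    guided_json: Optional[str] = None
    guided_regex: Optional[str] = None
    guided_grammar: Optional[str] = None
    guided_choice: Optional[List[str]] = None
    structural_tag: Optional[str] = None

    def active_constraints(self) -> List[str]:
        """Return the names of constraint fields that were set."""
        return [name for name, val in vars(self).items() if val is not None]


params = GuidedDecodingParams(guided_regex=r"\d{3}-\d{4}")
```

A request would typically set at most one of these fields; the backend then compiles the chosen constraint into a token mask applied during sampling.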
Comments suppressed due to low confidence (3)
fastdeploy/worker/vl_gpu_model_runner.py:145
- The code checks for `guided_json`, `guided_regex`, `guided_grammar`, and `structural_tag` but does not handle `guided_choice` from `GuidedDecodingParams`. Add support for `guided_choice` to ensure all constraint types are honored.
elif request.guided_grammar is not None:
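The dispatch chain this comment refers to could be sketched like so, with the missing `guided_choice` branch added (a stand-in illustration; the `Request` class and processor names here are stubs, not the model runner's real code):

```python
# Sketch of the constraint-dispatch chain with a guided_choice branch.
# The Request dataclass is a stand-in for the real request object.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    guided_json: Optional[str] = None
    guided_regex: Optional[str] = None
    guided_grammar: Optional[str] = None
    guided_choice: Optional[List[str]] = None
    structural_tag: Optional[str] = None


def select_constraint(request: Request) -> str:
    """Pick which guided-decoding constraint type the request carries."""
    if request.guided_json is not None:
        return "json"
    elif request.guided_regex is not None:
        return "regex"
    elif request.guided_grammar is not None:
        return "grammar"
    elif request.guided_choice is not None:
        # The branch the review flags as missing: a choice list can be
        # handled as its own constraint (e.g. compiled to an alternation).
        return "choice"
    elif request.structural_tag is not None:
        return "structural_tag"
    raise ValueError("no guided decoding constraint set")
```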
fastdeploy/engine/engine.py:1049
- The code references `self.cfg.reasoning_parser`, but `reasoning_parser` is not defined on the engine config object. It should likely reference `self.cfg.model_config.reasoning_parser`.
f" --reasoning_parser {self.cfg.reasoning_parser}")
fastdeploy/worker/vl_gpu_model_runner.py:152
- Using
request.get(...)
may not work ifrequest
is not a dict-like object. Consider usinggetattr(request, 'enable_thinking', True)
to access the attribute safely.
enable_thinking=request.get("enable_thinking", True),
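A minimal sketch of the reviewer's point: `.get()` assumes a dict-like interface, while `getattr()` with a default works on ordinary attribute objects (the `Request` class here is a stand-in, not FastDeploy's real request type):

```python
# .get() would raise AttributeError on a plain object; getattr() with a
# default handles both the present and absent cases safely.
class Request:
    def __init__(self, enable_thinking=None):
        if enable_thinking is not None:
            self.enable_thinking = enable_thinking


req = Request()  # attribute absent
enable_thinking = getattr(req, "enable_thinking", True)  # falls back to True
```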
@@ -300,6 +324,10 @@ def __init__(self, fd_config: FDConfig):
self.speculative_max_candidate_len = fd_config.speculative_config.max_candidate_len
self.speculative_benchmark_mode = fd_config.speculative_config.benchmark_mode

def set_reasoning_parser(self, reasoning_parser: Optional[ReasoningParser] = None):
[nitpick] The `SpeculativeSampler` overrides `set_reasoning_parser` (and other guided methods) as empty stubs. Consider implementing them, or explicitly documenting why guided decoding is unsupported, to avoid silent no-ops.
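One way to make the stubs fail loudly, as the reviewer suggests (a sketch only: the real classes live in fastdeploy, and raising `NotImplementedError` is an illustrative choice, not the PR's code):

```python
# Illustrative alternative to a silent empty stub: refuse explicitly
# when a reasoning parser is supplied to the speculative sampler.
from typing import Optional


class ReasoningParser:  # stand-in for the real parser type
    pass


class SpeculativeSampler:
    def set_reasoning_parser(self, reasoning_parser: Optional[ReasoningParser] = None):
        if reasoning_parser is not None:
            # Guided decoding / reasoning parsing is not supported in
            # speculative mode; raise instead of silently ignoring it.
            raise NotImplementedError(
                "guided decoding is not supported by SpeculativeSampler")
```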
@@ -100,7 +100,7 @@ def process_request(self, request, max_model_len=None, **kwargs):

if request.prompt_token_ids is None or len(
        request.prompt_token_ids) == 0:
# system = request.get("system")
[nitpick] This commented-out dead code can be removed to clean up the implementation and avoid confusion.
Suggested change: delete the line `# system = request.get("system")`.