[Structured Outputs][V1] Skipping with models doesn't contain tokenizers #20365

aarnphm · 2025-07-02T09:06:48Z

This PR addresses an issue that #20072 runs into, where the model doesn't provide a tokenizer. In this case, we shouldn't allow structured outputs to be used with these models.

I added a check to make sure that we won't interfere with these workflows.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

gemini-code-assist

Summary of Changes

Hello @aarnphm, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical compatibility issue by preventing the use of structured output features with models that are initialized without a tokenizer. Since structured outputs fundamentally rely on tokenizer functionality for grammar parsing and constraint application, these changes introduce explicit checks to disallow such incompatible configurations. The PR ensures that if a model is configured to skip tokenizer initialization, any attempt to use structured outputs will be correctly identified and prevented, leading to more predictable and stable system behavior.
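As a rough illustration of the kind of check described above, here is a minimal sketch. It is not the code from this PR: `ModelConfigSketch`, `RequestParamsSketch`, `structured_output_spec`, and `validate_structured_output` are hypothetical names, and only the `skip_tokenizer_init` flag and the `ValueError` come from the summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    # Hypothetical stand-in for the engine's model config; only the flag
    # named in this PR is modelled.
    skip_tokenizer_init: bool = False


@dataclass
class RequestParamsSketch:
    # Hypothetical request parameters. A non-None spec means the caller
    # asked for structured output (for example a JSON schema or grammar).
    structured_output_spec: Optional[dict] = None


def validate_structured_output(config: ModelConfigSketch,
                               params: RequestParamsSketch) -> None:
    """Raise ValueError if structured output is requested without a tokenizer."""
    if params.structured_output_spec is None:
        return  # Plain requests are unaffected by the check.
    if config.skip_tokenizer_init:
        raise ValueError(
            "Structured outputs require a tokenizer, but the model was "
            "initialized with skip_tokenizer_init=True.")
```

Requests without a structured output spec pass through untouched, which is the "won't interfere with these workflows" part of the description.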

Highlights

Prevent Incompatible Structured Outputs: Implemented robust checks to disallow structured output usage when the model is initialized with skip_tokenizer_init=True. This raises a ValueError to explicitly indicate that structured outputs, which rely on tokenizer functionality, are not supported in such configurations.
Conditional Tokenizer Initialization: The StructuredOutputManager has been refactored to conditionally initialize tokenizer-dependent components (such as the grammar parser and its associated thread pool) only when a tokenizer is available. This optimizes resource usage and prevents errors in environments where tokenizer initialization is intentionally skipped.
Enhanced Test Coverage: New unit tests have been added to test_scheduler.py to specifically validate the correct behavior of structured output requests when skip_tokenizer_init is enabled. These tests ensure that such requests are appropriately rejected or not scheduled, maintaining system stability.
Codebase Refinements: Minor refactorings have been applied across the codebase, including updating StructuredOutputManager instantiation to consistently use keyword arguments and adjusting type-checking imports for improved code clarity and maintainability.
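The conditional initialization highlighted above could look roughly like the following sketch. It assumes a keyword-only constructor and a caller-supplied `load_tokenizer` callable; the class name `StructuredOutputManagerSketch`, the `enabled` property, and the placeholder `_compile` step are illustrative and are not taken from the vLLM source.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class StructuredOutputManagerSketch:
    """Illustrative manager that builds tokenizer-dependent parts lazily."""

    def __init__(self, *, skip_tokenizer_init: bool,
                 load_tokenizer: Callable[[], object],
                 max_workers: int = 2) -> None:
        self.tokenizer: Optional[object] = None
        self.executor: Optional[ThreadPoolExecutor] = None
        if not skip_tokenizer_init:
            # Only pay for the tokenizer and the thread pool when a
            # tokenizer is actually available.
            self.tokenizer = load_tokenizer()
            self.executor = ThreadPoolExecutor(max_workers=max_workers)

    @property
    def enabled(self) -> bool:
        return self.tokenizer is not None

    @staticmethod
    def _compile(spec: str) -> tuple:
        # Placeholder for real grammar compilation.
        return ("compiled", spec)

    def compile_grammar(self, spec: str) -> Future:
        if self.executor is None:
            raise ValueError(
                "Structured outputs are unavailable: tokenizer "
                "initialization was skipped.")
        return self.executor.submit(self._compile, spec)

    def shutdown(self) -> None:
        if self.executor is not None:
            self.executor.shutdown(wait=True)
```

With `skip_tokenizer_init=True` the `load_tokenizer` callable is never invoked and no worker threads are created, so the manager stays cheap to construct for tokenizer-less models.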

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively prevents the use of structured outputs when a tokenizer is not available by adding checks at both the processing and scheduling layers. The changes are well-contained, and the addition of tests is much appreciated.

vllm/v1/structured_output/__init__.py

vllm/v1/engine/processor.py

github-actions · 2025-07-02T09:19:20Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

njhill

Thanks for adding this @aarnphm.

I think we should add or improve another generic test. There is https://github.com/vllm-project/vllm/blob/main/tests/engine/test_options.py#L12 but either this isn't running on V1 or it's not properly testing the functionality.

It looks like at least the latter is true since it uses distilbert/distilgpt2 which has a tokenizer. The test should use a model without a tokenizer (or could download/copy the model to temp dir and delete the tokenizer first).
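One way to set up the "copy the model to a temp dir and delete the tokenizer" idea is sketched below. The helper name `copy_model_without_tokenizer` and the file-name patterns in `TOKENIZER_FILE_PATTERNS` are assumptions for illustration; real checkpoints may store tokenizer data under other names, so the list would need to be checked against the model in question.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed tokenizer file names; adjust for the checkpoint being tested.
TOKENIZER_FILE_PATTERNS = (
    "tokenizer*",               # tokenizer.json, tokenizer_config.json, ...
    "vocab.*",                  # vocab.json, vocab.txt
    "merges.txt",
    "special_tokens_map.json",
)


def copy_model_without_tokenizer(src_dir, dst_dir) -> Path:
    """Copy a local model directory, leaving out tokenizer files."""
    dst = Path(dst_dir)
    shutil.copytree(
        Path(src_dir),
        dst,
        ignore=shutil.ignore_patterns(*TOKENIZER_FILE_PATTERNS),
        dirs_exist_ok=True,
    )
    return dst


if __name__ == "__main__":
    # Demo with a fake checkpoint made of empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        for name in ("config.json", "model.safetensors", "tokenizer.json"):
            (src / name).write_text("{}")
        dst = copy_model_without_tokenizer(src, Path(tmp) / "dst")
        print(sorted(p.name for p in dst.iterdir()))
```

A test could then point the engine at the copied directory with `skip_tokenizer_init=True` and assert that a structured output request is rejected, while the original model directory stays untouched.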

vllm/v1/structured_output/__init__.py

vllm/v1/engine/processor.py

vllm/v1/structured_output/__init__.py

aarnphm · 2025-07-02T11:22:10Z

> I think we should add or improve another generic test. There is https://github.com/vllm-project/vllm/blob/main/tests/engine/test_options.py#L12 but either this isn't running on V1 or it's not properly testing the functionality.
>
> It looks like at least the latter is true since it uses distilbert/distilgpt2 which has a tokenizer. The test should use a model without a tokenizer (or could download/copy the model to temp dir and delete the tokenizer first).

I think I will create a test for this in V1, which is probably better for longevity.

Co-authored-by: Nick Hill <nhill@redhat.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm · 2025-07-03T11:49:14Z

cc @russellb @njhill when you have bandwidth

christian-pinto · 2025-07-03T13:00:26Z

Tested with the changes in #20072 and it works fine. Thanks!

chore(so): support skip_tokenizer_init

8a99739

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm requested review from mgoin, russellb, WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners July 2, 2025 09:06

aarnphm mentioned this pull request Jul 2, 2025

[Core][Model] PrithviMAE Enablement on vLLM v1 engine (superseded by PR 20577) #20072

Closed

mergify bot added the structured-output label Jul 2, 2025

gemini-code-assist bot reviewed Jul 2, 2025

mergify bot added the v1 label Jul 2, 2025

github-project-automation bot added this to Structured Output Jul 2, 2025

aarnphm added the ready label Jul 2, 2025

gemini-code-assist bot reviewed Jul 2, 2025

njhill reviewed Jul 2, 2025

aarnphm and others added 3 commits July 2, 2025 07:26

revert: remove dataclass initialization and update warning messages

45f68a8

Co-authored-by: Nick Hill <nhill@redhat.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

chore: address Nick's comments and add tests for v1

a49f180

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

revert: remove misc changes

7d2ad08

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm force-pushed the chore/support-skip-tokenizer-init branch from 176161a to 7d2ad08 July 2, 2025 12:13

aarnphm added 2 commits July 2, 2025 08:14

perf(test): improve test time by removing duplicates

31e4060

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

revert: remove misc changes

700a4b2

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm added ready and removed ready labels Jul 2, 2025

aarnphm requested a review from njhill July 2, 2025 19:46

chore: fix precommit

8b4df50

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

DarkLight1337 approved these changes Jul 4, 2025

DarkLight1337 merged commit 4a98edf into vllm-project:main Jul 4, 2025
71 checks passed

github-project-automation bot moved this to Done in Structured Output Jul 4, 2025

aarnphm deleted the chore/support-skip-tokenizer-init branch July 4, 2025 14:47

sfeng33 pushed a commit to sfeng33/vllm that referenced this pull request Jul 6, 2025

[Structured Outputs][V1] Skipping with models doesn't contain tokeniz…

7f3e98a

…ers (vllm-project#20365) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>

huydhn pushed a commit to huydhn/vllm that referenced this pull request Jul 8, 2025

[Structured Outputs][V1] Skipping with models doesn't contain tokeniz…

3cf5072

…ers (vllm-project#20365) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>

Chen-zexi pushed a commit to Chen-zexi/vllm that referenced this pull request Jul 13, 2025

[Structured Outputs][V1] Skipping with models doesn't contain tokeniz…

bbab933

…ers (vllm-project#20365) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>
