[Core][Model] PrithviMAE Enablement on vLLM v1 engine #20072


Open: christian-pinto wants to merge 12 commits into main from prithvi_v1_embeddings_upstream_main

Conversation

@christian-pinto (Contributor) commented on Jun 25, 2025

Purpose

This PR enables the Prithvi MAE Geospatial model on the vLLM V1 engine. This is a result of pooling/embedding models being enabled in V1.

The main changes in this PR are:

  • Adding support for multi-modal models that require the raw multi-modal data as input rather than the multi-modal embeddings (see the sketch after this list).
  • Improving support for models that use skip_tokenizer_init=True
  • Improving support for attention-less models by creating a DummyKVCacheManager class
  • Minor changes to the PrithviMAE model and the example usage script.
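
For illustration, the sketch below shows the intent of a model opting into raw multi-modal inputs. It is only a hedged sketch: the interface name comes from this PR's summary, but the method signature, the toy layers, and the pixel_values/location_coords kwarg names are assumptions, not the exact code in the diff.

import torch
import torch.nn as nn
from typing import Optional


class SupportsMultiModalWithRawInput:
    """Marker interface: the model consumes raw multi-modal tensors, not embeddings."""
    supports_multimodal_raw_input: bool = True


class ToyRawInputModel(SupportsMultiModalWithRawInput, nn.Module):
    def __init__(self, in_channels: int = 6, hidden: int = 64) -> None:
        super().__init__()
        self.proj = nn.Conv2d(in_channels, hidden, kernel_size=16, stride=16)

    def forward(
        self,
        input_ids: torch.Tensor,                         # dummy single token per request
        pixel_values: Optional[torch.Tensor] = None,     # raw image tensor from multi_modal_data
        location_coords: Optional[torch.Tensor] = None,  # raw geospatial metadata
    ) -> torch.Tensor:
        # The model runner forwards the raw multi-modal kwargs straight to the model
        # instead of converting them into input embeddings first.
        assert pixel_values is not None
        feats = self.proj(pixel_values)                  # (B, hidden, H/16, W/16)
        return feats.flatten(2).mean(dim=-1)             # one pooled embedding per image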

Test Plan

I have added a test script that only tests with the vLLM runner, as this model is not on HF; it therefore only checks that inference completes correctly for the model.

Test Result

The test is not succeeding at the moment because the pytest process hangs indefinitely on wait_for_engine_startup while initializing the vLLM engine. The same code works just fine outside of pytest. Any help getting this test running would be much appreciated.
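
For illustration, a minimal sketch of such a test might look like the following. This is a hedged sketch, not the test file in this PR: the checkpoint path, the multi-modal keys, and the extra engine flags are placeholder assumptions, and the skip marker reflects the hang described above.

import pytest
import torch
from vllm import LLM

MODEL_PATH = "/path/to/prithvi-mae-checkpoint"  # hypothetical local checkpoint, not on HF


@pytest.mark.skip(reason="hangs in wait_for_engine_startup when run under pytest")
def test_prithvi_mae_offline_inference():
    llm = LLM(
        model=MODEL_PATH,
        skip_tokenizer_init=True,   # PrithviMAE has no tokenizer
        dtype="float16",
        enforce_eager=True,
    )
    prompt = {
        "prompt_token_ids": [1],    # single dummy token; the real input is the image
        "multi_modal_data": {
            "pixel_values": torch.randn(6, 512, 512),  # assumed key and shape
            "location_coords": torch.zeros(1, 2),      # assumed key and shape
        },
    }
    outputs = llm.encode(prompt)
    # Only checks that inference completes and produces one pooled output.
    assert len(outputs) == 1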

Documentation Update

Updated examples file.

@DarkLight1337 @maxdebayser @njhill

christian-pinto changed the title from "Prithvi v1 embeddings upstream main" to "PrithviMAE Enablement on vLLM v1 engine" on Jun 25, 2025
mergify bot added the documentation, multi-modality (#4194), structured-output, and v1 labels on Jun 25, 2025
@gemini-code-assist (bot) left a comment:

Summary of Changes

Hello @christian-pinto, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends vLLM's capabilities by enabling the Prithvi MAE Geospatial model on the V1 engine. It introduces foundational changes to support multi-modal models that require raw input data, streamlines the handling of attention-less models by providing a dummy KV cache mechanism, and refines the engine's behavior for models that do not require tokenizer initialization.

Highlights

  • Multi-modal Support: Added support for multi-modal models that consume raw input data (e.g., image tensors) directly, rather than pre-computed embeddings. This includes new interfaces (SupportsMultiModalWithRawInput) and integration points in the model runner to pass raw data to the model's forward pass.
  • Attention-Free Models: Introduced a DummyKVCacheManager and updated the scheduler and engine initialization to properly handle models that do not require a KV cache (attention-free models). This prevents unnecessary resource allocation and potential hangs by bypassing KV cache operations for such models.
  • Tokenizer Initialization: Enhanced support for models that specify skip_tokenizer_init=True, ensuring that tokenizer-related operations are conditionally bypassed in various parts of the engine (e.g., LLMEngine, Processor, OutputProcessor, StructuredOutput), improving flexibility for models that don't rely on a traditional tokenizer.
  • PrithviMAE Integration: Integrated the Prithvi MAE Geospatial model into the vLLM V1 engine, including updates to its model definition, multi-modal data processing (e.g., using MultiModalFieldConfig.shared and adjusting dummy data types), and an example inference script.

@gemini-code-assist (bot) left a comment:

Code Review

This pull request enables the Prithvi MAE Geospatial model on the vLLM V1 engine by adding support for multi-modal models that require raw multi-modal data, improving support for models that use skip_tokenizer_init=True, and enhancing support for attention-less models. The changes include modifications to the PrithviMAE model, example usage, configuration, and core components of vLLM. The changes look good, but I've added some suggestions for improved comments.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

christian-pinto force-pushed the prithvi_v1_embeddings_upstream_main branch from ca85b88 to c7f80c0 on June 25, 2025 at 11:39
vllm/config.py Outdated
Comment on lines 617 to 618
self.is_pooling_model = self.registry.is_pooling_model(
self.architectures)
Member:

Make this a @property?

Contributor (Author):

sure, no problem
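
For reference, the suggested change would look roughly like this: a minimal sketch, assuming the registry and architectures attributes shown in the quoted snippet.

@property
def is_pooling_model(self) -> bool:
    # Computed on access instead of being cached as an attribute in __init__.
    return self.registry.is_pooling_model(self.architectures)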

@@ -64,7 +65,130 @@ def new_empty(self) -> "KVCacheBlocks":
return KVCacheBlocks(tuple([] for _ in range(len(self.blocks))))


class KVCacheManager:
class KVCacheManagerInterface(ABC):
Member:

Not sure whether this conforms with V1 design. cc @maxdebayser @heheda12345 @WoosukKwon

@christian-pinto (Contributor, Author) commented on Jun 26, 2025:

I have used the same interface template that is used for the Scheduler in V1. If another way of defining an interface is preferred, please let me know.
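
For reviewers, here is a rough sketch of the interface/overload pattern being discussed. KVCacheBlocks is stubbed out so the snippet is self-contained, and the method names are illustrative rather than the exact ones in the diff.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class KVCacheBlocks:  # stand-in for vLLM's real KVCacheBlocks type
    blocks: list


class KVCacheManagerInterface(ABC):

    @abstractmethod
    def get_computed_blocks(self, request: Any) -> tuple[KVCacheBlocks, int]:
        raise NotImplementedError

    @abstractmethod
    def allocate_slots(self, request: Any, num_new_tokens: int) -> KVCacheBlocks:
        raise NotImplementedError

    @abstractmethod
    def free(self, request: Any) -> None:
        raise NotImplementedError


class DummyKVCacheManager(KVCacheManagerInterface):
    """No-op manager for attention-free models: nothing is ever cached."""

    def get_computed_blocks(self, request: Any) -> tuple[KVCacheBlocks, int]:
        return KVCacheBlocks([]), 0

    def allocate_slots(self, request: Any, num_new_tokens: int) -> KVCacheBlocks:
        return KVCacheBlocks([])

    def free(self, request: Any) -> None:
        pass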

Contributor:

@DarkLight1337 , by V1 design do you mean "the right way of defining interfaces" or whether the KVCacheManager should be pluggable or not?

Contributor:

BTW, is there anyone working on pure SSM model support in V1?

Member:

@DarkLight1337 , by V1 design do you mean "the right way of defining interfaces" or whether the KVCacheManager should be pluggable or not?

Yeah, I'm not sure which parts of KVCacheManager should be customizable.

BTW, is there anyone working on pure SSM model support in V1?

#19327

Contributor (Author):

I have investigated option 1 and it would require a number of small changes in the KVCacheManager, KVCacheCoordinator, etc. The logic relies on there being > 0 kv_cache_groups, and I am not sure we would end up with a cleaner implementation. By overloading instead, we simply get rid of all the logic that would otherwise be useless when a KV cache is not used.

These are my two cents but I am happy to go for the solution that is more in line with the strategy for the KVCacheManager.

@christian-pinto (Contributor, Author) commented on Jul 3, 2025:

@heheda12345 any further comments on this? Thanks.

Collaborator:

What minor changes do you need? I feel it would be quite clean to treat it as 0 groups, but maybe I've missed something.

@christian-pinto (Contributor, Author) commented on Jul 7, 2025:

What I have tried to do is initialize the KV cache manager with the config below:

KVCacheConfig(num_blocks=0,
              kv_cache_tensors=[],
              kv_cache_groups=[])

This, however, is not enough for the KV cache manager to function with 0 blocks.

I also ended up having to modify:

BlockPool in v1.core.block_pool.py, because it uses a doubly linked list (FreeKVCacheBlockQueue) to track the free blocks, and that list is not designed to work with an empty list of KV cache blocks.

The Scheduler, because on each new request it invokes allocate_blocks on the KV cache manager, and if no blocks are returned the request is unschedulable. Some changes are therefore required to make sure that requests with no associated block IDs are schedulable.

I also had to make changes elsewhere to ensure the KV cache logic works with no KV cache groups.

For simplicity I have tried implementing the other approach in this PR: #20577

I still think that initializing all the objects that come with the KVCacheManager (KVCacheCoordinator, BlockPool, FreeKVCachePool, etc.) for no reason is less clean than overloading the KVCacheManager.

I might also have misunderstood your suggestion, @heheda12345, on using 0 KV cache groups, so please take a look at this new PR.

Thanks

Collaborator:

Thanks. I've added some comments in #20577 and feel that 0 kv cache group is possible.

@maxdebayser (Contributor) left a comment:

This PR introduces some changes that could be useful for future work, such as the implementation of pure SSM models and encoder embedding models. Currently the design of multi-modal inputs as embeddings seems pretty restrictive to me, but I don't know enough about all the design considerations that went into it. So, at my superficial level of knowledge about the multi-modal models, the support for raw inputs seems pretty useful.

# `kv_cache_config` to be consistent across all workers to make sure
# all the memory operators can be applied to all workers.
unify_kv_cache_configs(kv_cache_configs)
if vllm_config.model_config.is_attention_free:
Contributor:

Could we perhaps generalize the name of this property to indicate that it's kv-cache free? Technically the encoder embedding models don't need a kv-cache but they do use attention.

Contributor (Author):

I'd personally create a second flag, a subset of this one, to just disable KV caching. In the end it is fine that for attention-free models neither an attention backend nor a KV cache is instantiated.
At the moment I see that the initialization of the attention backend is done as part of initializing the KV cache.

I believe this should be done in another PR, though. The changes here were made simply because the is_attention_free flag was not fully working when I tested it.
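
Something along these lines is what the follow-up could look like; this is a hedged sketch, and neither flag name below is claimed to exist in vLLM today.

from dataclasses import dataclass


@dataclass
class ModelFlagsSketch:
    is_attention_free: bool = False
    needs_kv_cache: bool = True  # hypothetical second, narrower flag

    def __post_init__(self) -> None:
        # Attention-free models never need a KV cache, but the reverse is not true:
        # encoder embedding models use attention yet could still skip the KV cache.
        if self.is_attention_free:
            self.needs_kv_cache = False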

@@ -547,6 +554,55 @@ def _update_states(self, scheduler_output: "SchedulerOutput") -> None:
if batch_changed or batch_reordered:
self.input_batch.refresh_sampling_metadata()

def _add_multimodal_inputs_to_model_args(
Contributor:

If this function is only called in _maybe_add_multimodal_kwargs perhaps it's better to merge them.

Contributor (Author):

yes, makes sense. Thanks

scheduler_output,
num_reqs)

def _maybe_compute_attn_prefix(
Contributor:

What is the purpose of this function? It looks like an extension point for a subclass but it's not called anywhere.

Contributor (Author):

I think this is just something I brought in when I started working on this, based on the support for embeddings on V1 before it got merged. I will remove it.

) -> list[int]:
return [0] * len(self.kv_cache_config.kv_cache_groups)

def _maybe_prepare_additional_inputs(self,
Contributor:

Same as above.

Contributor (Author):

Same as above 😄

scheduler_config=self.vllm_config.scheduler_config,
lora_config=self.vllm_config.lora_config,
).get_lora_tokenizer(None)
self.tokenizer = None if vllm_config.model_config.skip_tokenizer_init \
Collaborator:

How is this related to this PR?

BTW, I think we can't skip tokenizer init here, given that the tokenizer is required for all of the structured output backends. Therefore, this shouldn't be changed.

Contributor (Author):

I see, this was one quick and dirty change that I made and forgot to sanitize.

Since the tokenizer is required for structured output generation, the output manager should not be initialized at all when we skip tokenizer init. Also, all the calls to the structured output manager should be guarded accordingly.

Let me give this a go and you can have a look at it.
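
Roughly, the guarding would look like this: a hedged sketch in which the real StructuredOutputManager is stubbed out and the class and method names are illustrative only.

from typing import Any, Optional


class StructuredOutputManagerStub:
    """Stand-in for the real manager, which needs a tokenizer to build grammars."""

    def __init__(self, vllm_config: Any) -> None:
        self.vllm_config = vllm_config

    def grammar_init(self, request: Any) -> None:
        ...


class EngineCoreSketch:
    def __init__(self, vllm_config: Any) -> None:
        skip = vllm_config.model_config.skip_tokenizer_init
        # No tokenizer means structured output cannot work, so don't build the manager.
        self.structured_output_manager: Optional[StructuredOutputManagerStub] = (
            None if skip else StructuredOutputManagerStub(vllm_config)
        )

    def add_request(self, request: Any) -> None:
        # Every call into the manager is guarded against the skip case.
        if self.structured_output_manager is not None:
            self.structured_output_manager.grammar_init(request)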

Contributor (Author):

@aarnphm I have reworked supporting skip_tokenizer_init so that it takes care of the structured output manager. Please have a look at commit 30ed44e

@aarnphm (Collaborator) left a comment:

qq: does this work with transformers implementation?

@aarnphm (Collaborator) commented on Jun 27, 2025:

qq: does this work with transformers implementation?

Ah, I was referring to --model-impl transformers, but it doesn't seem to work. All good though.

@aarnphm (Collaborator) left a comment:

For models that don't support structured outputs, I think we should perform the gating before server startup.

For structured output overhead, I don't think skip_tokenizer_init is relevant here, given that the manager initializes the tokenizer itself.


mergify bot commented Jul 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @christian-pinto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

# the input prompt we pass one token and the size of the dummy
# embedding tensors must reflect that.
return torch.empty(input_ids.shape)

Contributor:

I don't see the get_multimodal_embeddings() function here. How is this not failing in GPUModelRunner.profile_run()?

Contributor (Author):

I believe that memory profiling is only done while initializing the KVCache, which is skipped because the model is attention free.

It should happen in Worker.determine_available_memory() if I am not mistaken.
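
In other words, something like this early exit is what makes profiling unnecessary here; a hedged sketch, where the helper below is illustrative and not vLLM's actual Worker code.

def determine_available_memory_sketch(model_config, profile_fn) -> int:
    """Return the bytes available for the KV cache; 0 when no cache is needed."""
    if model_config.is_attention_free:
        # No KV cache will ever be allocated, so skip the costly profiling run.
        return 0
    return profile_fn()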

Contributor:

Yes, but if we're adding a new programming model for MM inputs I think it would be nice to handle this. If not a full solution, perhaps a test to disable cuda graphs and memory profiling for this special case.

But I'm thinking that maybe the raw MM inputs will be hard to generalize. Perhaps a hacky solution for this particular model would be to store the raw MM inputs in the get_multimodal_embeddings() method and then use them when forward() is called.
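
A hedged sketch of that hacky approach follows; the kwarg names and layer sizes are assumptions for illustration, not the PR's code.

import torch
import torch.nn as nn
from typing import Optional


class PrithviMAEStashSketch(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Conv2d(6, 64, kernel_size=16, stride=16)
        self._stashed_pixel_values: Optional[torch.Tensor] = None

    def get_multimodal_embeddings(self, **mm_kwargs) -> torch.Tensor:
        # Instead of real embeddings, remember the raw tensor for the next forward().
        self._stashed_pixel_values = mm_kwargs["pixel_values"]
        # Return a placeholder so code that expects "embeddings" still has something.
        return torch.empty(1, 64)

    def forward(self, input_ids: torch.Tensor, **kwargs) -> torch.Tensor:
        assert self._stashed_pixel_values is not None, "call get_multimodal_embeddings first"
        feats = self.proj(self._stashed_pixel_values)
        self._stashed_pixel_values = None
        return feats.flatten(2).mean(dim=-1)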

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
- Improved formatting around
- made is_pooling_model a @property in ModelConfig

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
- Remove unused functions
- merged functions not called anywhere else

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
christian-pinto force-pushed the prithvi_v1_embeddings_upstream_main branch from 30ed44e to aab8956 on July 4, 2025 at 09:38
mergify bot removed the needs-rebase label on Jul 4, 2025
@christian-pinto (Contributor, Author) commented:

Rebased on the latest master, which includes @aarnphm's changes to the structured output manager.

@aarnphm (Collaborator) left a comment:

OK, I did another round of review on other miscellaneous code to reduce the general diff. I don't have an opinion on the KVCache part, so I will leave that for others to comment on.

parent_req=parent_req,
request_index=request_index,
queue=queue,
log_stats=self.log_stats)
Collaborator:

Suggested change
log_stats=self.log_stats)
log_stats=self.log_stats,)

to reduce diff

Contributor (Author):

Done, thanks!

Comment on lines 390 to 394
if tokenizer:
max_input_id = max(prompt_ids, default=0)
if max_input_id > tokenizer.max_token_id:
raise ValueError(
f"Token id {max_input_id} is out of vocabulary")
Collaborator:

nit

Suggested change
if tokenizer:
max_input_id = max(prompt_ids, default=0)
if max_input_id > tokenizer.max_token_id:
raise ValueError(
f"Token id {max_input_id} is out of vocabulary")
if tokenizer and (max_input_id:=max(prompt_ids, default=0)) > tokenizer.max_token_id:
raise ValueError(f"Token id {max_input_id} is out of vocabulary")

use_eagle=self.use_eagle,
log_stats=self.log_stats,
enable_kv_cache_events=self.enable_kv_cache_events,
)
Collaborator:

Please revert back self.use_pp = ...

Contributor (Author):

Nice catch, thank you.

Comment on lines 496 to 497
req_to_new_block_ids[request.request_id] = \
self.kv_cache_manager.get_block_ids(request.request_id)
Collaborator:

revert this style change. We will run ruff format at some point before 0.10

Comment on lines 95 to 96
# num_gpu_blocks can be ero for attention free models
assert num_gpu_blocks is not None
Collaborator:

Suggested change
# num_gpu_blocks can be ero for attention free models
assert num_gpu_blocks is not None
# num_gpu_blocks can be zero for attention free models
assert num_gpu_blocks is not None

Comment on lines +378 to +381
if model_config.skip_tokenizer_init:
tokenizer = None
else:
tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
Collaborator:

how does this look for diff

Suggested change
if model_config.skip_tokenizer_init:
tokenizer = None
else:
tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
tokenizer = None if model_config.skip_tokenizer_init \
else self.tokenizer.get_lora_tokenizer(lora_request)


# nothing to be reordered when the model is attention free
if self.model_config.is_attention_free:
return False
Collaborator:

Suggested change
return False
return

def _maybe_add_multimodal_kwargs(
self,
model_kwargs: dict[str, Any],
scheduler_output: "SchedulerOutput" = None,
Collaborator:

Suggested change
scheduler_output: "SchedulerOutput" = None,
scheduler_output: "Optional[SchedulerOutput]" = None,

intermediate_tensors=intermediate_tensors,
inputs_embeds=inputs_embeds,
**MultiModalKwargs.as_kwargs(
model_kwargs, device=self.device))
Collaborator:

Suggested change
model_kwargs, device=self.device))
model_kwargs, device=self.device),)

To also reduce the diff


def get_computed_blocks(self,
request: Request) -> tuple[KVCacheBlocks, int]:
return (KVCacheBlocks([]), 0)
@heheda12345 (Collaborator) commented on Jul 5, 2025:

What's your expected behavior for prefix caching? It seems that some people want prefix caching to be enabled with the "last" pooling method.

if pooling_type is None or pooling_type.lower() != "last":

@heheda12345 (Collaborator) left a comment:

Is the KVCacheManager modification necessary for this PR? As #16188 is runnable, is it possible to only include the other necessary changes in this PR and split the KVCacheManager changes into another PR? That would both unblock this PR and make the KVCacheManager changes easier for me to review.


@christian-pinto (Contributor, Author) commented:

Is the KVCacheManager modification necessary for this PR? As #16188 is runnable, is it possible to only include the other necessary changes in this PR and split the KVCacheManager changes into another PR? That would both unblock this PR and make the KVCacheManager changes easier for me to review.

Hi @heheda12345, unfortunately these changes are necessary; otherwise I would have to hack around a number of things, including the memory profiling run, since this is not a language model and profiling as it stands does not work. Those changes would be totally useless because I do not need any of this profiling to take place, as this model does not use attention. I would therefore end up making changes solely to support my model until attention-free models are supported, and then reverting everything back as it was.
Does this make sense?

To help you review the two options: in the new PR (the one using zero blocks) the changes for supporting attention-free models are only in the last commit. Similarly, in this PR all changes related to supporting attention-free models are in commits 953f66a and 953f66a

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
mergify bot added the new-model (Requests to new models) label on Jul 10, 2025

mergify bot commented Jul 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @christian-pinto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Jul 11, 2025
Labels: documentation, multi-modality (#4194), needs-rebase, new-model, structured-output, v1
5 participants