[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture #20923

Merged · 2 commits · Jul 16, 2025
14 changes: 14 additions & 0 deletions vllm/model_executor/models/config.py
@@ -200,11 +200,25 @@ def verify_and_update_config(vllm_config: "VllmConfig") -> None:
}


class GraniteMoeHybridModelConfig(VerifyAndUpdateConfig):

    @staticmethod
    def verify_and_update_config(vllm_config: "VllmConfig") -> None:
        config = vllm_config.model_config
        config.max_seq_len_to_capture = config.max_model_len
Reviewer comment (severity: medium):

The implementation unconditionally sets config.max_seq_len_to_capture = config.max_model_len. Consider checking if config.max_seq_len_to_capture already has a user-defined value before overriding it. This would prevent unexpected behavior if a user has explicitly configured this value.

if not hasattr(config, 'max_seq_len_to_capture'):
    config.max_seq_len_to_capture = config.max_model_len

        logger.info(
            "Setting max_seq_len_to_capture to %d "
            "to ensure that CUDA graph capture "
            "covers sequences of length up to max_model_len.",
            config.max_model_len)
Comment on lines +214 to +218
Reviewer comment (severity: high):

Consider adding a warning log message if max_seq_len_to_capture is being overridden, to inform the user that their specified value is not being used. This will help in debugging and understanding the configuration.

Suggested change:

-        logger.info(
-            "Setting max_seq_len_to_capture to %d "
-            "to ensure that CUDA graph capture "
-            "covers sequences of length up to max_model_len.",
-            config.max_model_len)
+        if hasattr(config, 'max_seq_len_to_capture') and config.max_seq_len_to_capture != config.max_model_len:
+            logger.warning(
+                "Overriding user-specified max_seq_len_to_capture to %d "
+                "to ensure that CUDA graph capture "
+                "covers sequences of length up to max_model_len.",
+                config.max_model_len)
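Taken together, the two review suggestions amount to a guarded override that warns before discarding a user-supplied value. A minimal, self-contained sketch of that pattern, using a stand-in `SimpleNamespace` config and the stdlib `logging` module rather than vLLM's actual `ModelConfig` and logger (the `getattr` default here stands in for the reviewer's `hasattr` check, since the real `ModelConfig` likely defines the attribute with a default):

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def verify_and_update_config(config) -> None:
    # Warn only when replacing a value the user actually customized.
    if (getattr(config, "max_seq_len_to_capture", None) is not None
            and config.max_seq_len_to_capture != config.max_model_len):
        logger.warning(
            "Overriding user-specified max_seq_len_to_capture to %d "
            "to ensure that CUDA graph capture covers sequences of "
            "length up to max_model_len.", config.max_model_len)
    config.max_seq_len_to_capture = config.max_model_len


# Stand-in for vllm_config.model_config with a user-set capture length.
config = SimpleNamespace(max_model_len=4096, max_seq_len_to_capture=2048)
verify_and_update_config(config)  # warns, then sets the attribute to 4096
```

When the two values already agree, the guard keeps the log quiet, so the common default path stays noise-free.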



MODELS_CONFIG_MAP: dict[str, type[VerifyAndUpdateConfig]] = {
    "GteModel": SnowflakeGteNewModelConfig,
    "GteNewModel": GteNewModelConfig,
    "NomicBertModel": NomicBertModelConfig,
    "Qwen3ForSequenceClassification": Qwen3ForSequenceClassificationConfig,
    "XLMRobertaModel": JinaRobertaModelConfig,
    "JinaVLForRanking": JinaVLForSequenceClassificationConfig,
    "GraniteMoeHybridForCausalLM": GraniteMoeHybridModelConfig,

Reviewer comment (severity: high):

Consider adding GraniteMoeHybridForCausalLM to __all__ in vllm/model_executor/models/__init__.py to ensure it is accessible for external use and doesn't break any implicit API contracts.

}
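For context, a registry like `MODELS_CONFIG_MAP` is presumably consulted by architecture name at engine-configuration time. The sketch below illustrates that dispatch pattern with stand-in classes and a hypothetical `apply_model_overrides` helper; only the map's shape and the `GraniteMoeHybridForCausalLM` entry come from the diff, everything else is an assumption rather than vLLM's actual API:

```python
from types import SimpleNamespace


class VerifyAndUpdateConfig:
    @staticmethod
    def verify_and_update_config(vllm_config) -> None:
        raise NotImplementedError


class GraniteMoeHybridModelConfig(VerifyAndUpdateConfig):
    @staticmethod
    def verify_and_update_config(vllm_config) -> None:
        # Capture CUDA graphs for the model's full context length.
        cfg = vllm_config.model_config
        cfg.max_seq_len_to_capture = cfg.max_model_len


MODELS_CONFIG_MAP = {
    "GraniteMoeHybridForCausalLM": GraniteMoeHybridModelConfig,
}


def apply_model_overrides(architecture: str, vllm_config) -> None:
    # Hypothetical dispatch: no-op for architectures without an override.
    override = MODELS_CONFIG_MAP.get(architecture)
    if override is not None:
        override.verify_and_update_config(vllm_config)


# Stand-in for a real VllmConfig.
vllm_config = SimpleNamespace(model_config=SimpleNamespace(
    max_model_len=4096, max_seq_len_to_capture=2048))
apply_model_overrides("GraniteMoeHybridForCausalLM", vllm_config)
```

Keying the map on the architecture string keeps per-model quirks out of the generic config path, which is why the reviewer's note about registering the name consistently (e.g. in `__all__`) matters.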