
Commit 2faf0d0

alex-jw-brooks authored and Chen-zexi committed
[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models (#20843)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
1 parent 322f009 commit 2faf0d0

File tree

1 file changed

+4
-0
lines changed


vllm/model_executor/models/granite.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -273,6 +273,10 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
                 self.vocab_size,
                 config.hidden_size,
                 org_num_embeddings=config.vocab_size,
+                padding_size=DEFAULT_VOCAB_PADDING_SIZE
+                # We need bigger padding if using lora for kernel
+                # compatibility
+                if not lora_config else lora_config.lora_vocab_padding_size,
                 quant_config=quant_config,
             )
         else:
```
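For context, vocab padding rounds the embedding table size up so it divides evenly across tensor-parallel ranks and meets kernel alignment requirements; with LoRA enabled, a larger padding multiple is used, which is why the Granite model must pick the padding size consistently. A minimal sketch of the rounding logic (the helper name mirrors vLLM's `pad_vocab_size`, and the constants 64 / 256 are illustrative assumptions based on vLLM's defaults):

```python
def pad_vocab_size(vocab_size: int, pad_to: int) -> int:
    """Round the vocabulary size up to the nearest multiple of pad_to."""
    return ((vocab_size + pad_to - 1) // pad_to) * pad_to


# Default padding (illustrative; vLLM's DEFAULT_VOCAB_PADDING_SIZE is 64).
print(pad_vocab_size(49152, 64))   # 49152 (already a multiple of 64)
print(pad_vocab_size(49159, 64))   # 49216

# With LoRA, a larger multiple (e.g. lora_vocab_padding_size=256) is used
# for LoRA-kernel compatibility, yielding a different padded size.
print(pad_vocab_size(49159, 256))  # 49408
```

The fix above makes the Granite model pass the LoRA-aware padding size through to `VocabParallelEmbedding`, so the padded vocab size agrees with what the rest of the stack expects when LoRA is enabled.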
