Commit dc09523

testdig and czhu15 authored
Fix pooling (classify or embed) model returns nan when dtype is float16 (or downcasted to float16) (HabanaAI#1441)

cc: @yangulei @ranzhejiang @czhu15

---------

Co-authored-by: Bob Zhu <bob.zhu@intel.com>
1 parent 0d4fbe3 commit dc09523

File tree

1 file changed (+2 −0 lines)

vllm/worker/hpu_model_runner.py

Lines changed: 2 additions & 0 deletions
@@ -396,6 +396,8 @@ def _set_attn_bias(self, attn_metadata, batch_size, seq_len, device,
             len_mask_v = len_mask.view(batch_size, 1, seq_len, 1)
             mask = attn_mask.logical_or(len_mask).logical_or(len_mask_v)
             off_value = -3E38  #small number, avoid nan and overflow
+            if dtype == torch.float16:
+                off_value = -63000  # a small value close to float16.min
         else:
             mask = attn_mask.logical_or(
                 len_mask)  #no need for len_mask_v as decode overwrites it
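Why the fix works: float16 can only represent magnitudes up to about 65504, so the original off_value of -3E38 overflows to -inf when the mask is cast to float16. A fully masked attention row then becomes all -inf, and the softmax shift computes -inf - (-inf) = NaN, which propagates through the output. The sketch below (not part of the commit; it uses NumPy and a hypothetical naive softmax purely for illustration) reproduces the failure mode and shows that a finite off_value like -63000 avoids it:

```python
import numpy as np

# -3e38 is far below float16's minimum (~ -65504), so it saturates to -inf.
overflow_off = np.float16(-3e38)
safe_off = np.float16(-63000)  # finite, close to float16 min

def naive_softmax(x):
    # Standard max-shifted softmax. If every entry is -inf,
    # the shift computes (-inf) - (-inf) = NaN, which propagates.
    shifted = x - x.max()
    e = np.exp(shifted)
    return e / e.sum()

# A fully masked row: every position receives the off_value.
row_overflow = np.full(4, overflow_off, dtype=np.float16)
row_safe = np.full(4, safe_off, dtype=np.float16)

print(np.isinf(overflow_off))                       # -3e38 overflowed to -inf
print(np.isnan(naive_softmax(row_overflow)).any())  # NaN appears
print(np.isnan(naive_softmax(row_safe)).any())      # finite off_value stays NaN-free
```

With -63000 the fully masked row still softmaxes to a uniform (harmless) distribution instead of NaN, which is why the patch swaps the bias value only when dtype is torch.float16.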
