Notable Changes
Qwen3 was producing null (NaN) values on CPU / MPS when serving batched requests in FP16 precision. Two issues were involved: the attention-mask fill value was the FP32 minimum, which overflows when downcast to FP16 (it is now set to the FP16 minimum instead), and a `to_dtype` call was missing on the `attention_bias` when working with batches.
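The downcast problem can be reproduced in a few lines. This is a minimal sketch using numpy for illustration only; the actual fix lives in the text-embeddings-inference model code, not in Python.

```python
import numpy as np

# Masked attention positions were filled with the FP32 minimum. Once the
# tensor is downcast to FP16, that value overflows to -inf, because the
# FP32 minimum (~ -3.4e38) is far outside the FP16 range (~ -65504).
fp32_min_as_fp16 = np.float16(np.finfo(np.float32).min)
print(fp32_min_as_fp16)  # -inf

# exp(-inf) is exactly 0, so a softmax over a row whose entries are all
# -inf divides 0 by 0 and produces NaN: the "null values" in the output.
bad_row = np.full(4, fp32_min_as_fp16, dtype=np.float16)
bad_softmax = np.exp(bad_row) / np.exp(bad_row).sum()
print(bad_softmax)  # [nan nan nan nan]

# Using the FP16 minimum as the fill value keeps it finite after the
# cast, so the softmax stays well defined.
fp16_min = np.float16(np.finfo(np.float16).min)
good_row = np.array([0.0, fp16_min, fp16_min], dtype=np.float16)
good_softmax = np.exp(good_row) / np.exp(good_row).sum()
print(good_softmax)
```

The unmasked position keeps all the probability mass, while the FP16-minimum fill behaves like negative infinity without ever becoming one.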
What's Changed
- Fix Qwen3 Embedding Float16 DType by @tpendragon in #663
- Fix `fmt` by re-running `pre-commit` by @alvarobartt in #671
- Update `version` to 1.7.4 by @alvarobartt in #677
Full Changelog: v1.7.3...v1.7.4