v1.7.4

@alvarobartt released this 07 Jul 12:33
6e900af

Noticeable Changes

Qwen3 did not work correctly on CPU and MPS when serving batched requests in FP16 precision. Two issues caused this: the FP32 minimum value was downcast to FP16, which produced NaN values (it is now set manually to the FP16 minimum value instead), and a to_dtype call was missing on the attention_bias when working with batches.
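The first issue can be illustrated with a small pure-Python sketch. It assumes, without this being taken from the Qwen3 implementation, that the NaNs come from an attention row in which every position is masked, as can happen for padding in a batch. to_f16 and masked_softmax are hypothetical helpers that emulate FP16 storage by round-tripping values through the standard library's half-precision struct format. The FP32 minimum (about -3.4e38) overflows to -inf in FP16, so a fully masked row computes -inf - (-inf) = NaN, whereas the FP16 minimum (-65504) stays finite.

```python
import math
import struct

# Most negative finite values of each format.
F32_MIN = -3.4028234663852886e38
F16_MIN = -65504.0


def to_f16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (hypothetical helper).

    struct's "e" format raises OverflowError exactly where an IEEE
    round-to-nearest conversion would overflow, so that case is mapped
    to a signed infinity, mimicking a tensor dtype cast.
    """
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def masked_softmax(scores, masked, mask_value):
    """Softmax over one attention row, emulating FP16 storage.

    masked[i] is True for padding positions, which get mask_value added
    as an additive bias (hypothetical helper, not the TEI implementation).
    """
    bias = to_f16(mask_value)  # the downcast that turns F32_MIN into -inf
    row = [to_f16(s + bias) if m else to_f16(s) for s, m in zip(scores, masked)]
    peak = max(row)
    # If every entry is -inf, peak is -inf and v - peak is NaN.
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


# A fully masked (padding) row, as can occur in a batch.
print(masked_softmax([0.5, 1.0], [True, True], F32_MIN))  # [nan, nan]
print(masked_softmax([0.5, 1.0], [True, True], F16_MIN))  # [0.5, 0.5]
```

With only some positions masked, e.g. masked=[False, True], the FP32 minimum still yields [1.0, 0.0]; in this sketch the NaN appears only when a whole row is masked, which would fit the bug surfacing with batched requests.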

What's Changed

Full Changelog: v1.7.3...v1.7.4