v1.7.3

alvarobartt released this 30 Jun 10:54

fb80177

Notable Changes

Qwen3 is now supported on Intel HPU, and Qwen3 fixes have landed for the CPU, Metal, and CUDA backends.

What's Changed

Default to Qwen3 in README.md and docs/ examples by @alvarobartt in #641
Fix Qwen3 by @kozistr in #646
Add integration tests for Gaudi by @baptistecolle in #598
Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in #648
Fix FlashQwen3 by @kozistr in #650
Make flake work on metal by @Narsil in #654
Fixing metal backend. by @Narsil in #655
Qwen3 hpu support by @kaixuanliu in #656
change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #659
Update version to 1.7.3 by @alvarobartt in #666
Add last token pooling support for ORT. by @tpendragon in #664
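For readers unfamiliar with the term used in #664, last token pooling takes the hidden state of the final non-padding token as the sequence embedding, which is the pooling strategy Qwen3-Embedding models expect. The snippet below is only a minimal pure-Python sketch of that idea; the function name `last_token_pool` and the toy inputs are hypothetical and are not taken from this project's ORT backend.

```python
from typing import List, Sequence


def last_token_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Return the embedding of the last non-padding token (hypothetical helper)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    # Scan from the end so that both left- and right-padded inputs work.
    for position in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[position] == 1:
            return list(token_embeddings[position])
    raise ValueError("attention mask contains no real tokens")


# Toy input: three positions, the last one is padding.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
mask = [1, 1, 0]
pooled = last_token_pool(embeddings, mask)
```

Scanning the mask from the end, rather than indexing at `sum(mask) - 1`, keeps the sketch correct for left-padded batches as well.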

New Contributors

@lance-miles made their first contribution in #648
@tpendragon made their first contribution in #664

Full Changelog: v1.7.2...v1.7.3
