v1.7.3
Noticeable Changes
Qwen3 support included for Intel HPU, and fixed for CPU / Metal / CUDA.
What's Changed
- Default to Qwen3 in README.md and docs/ examples by @alvarobartt in #641
- Fix Qwen3 by @kozistr in #646
- Add integration tests for Gaudi by @baptistecolle in #598
- Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in #648
- Fix FlashQwen3 by @kozistr in #650
- Make flake work on metal by @Narsil in #654
- Fixing metal backend. by @Narsil in #655
- Qwen3 hpu support by @kaixuanliu in #656
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #659
- Update version to 1.7.3 by @alvarobartt in #666
- Add last token pooling support for ORT. by @tpendragon in #664
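
For readers unfamiliar with the last token pooling mentioned in #664, here is a minimal illustrative sketch in plain Python. It is not the project's ORT implementation; the function name `last_token_pool` and the nested-list input layout are assumptions made for this example. The idea is that each sequence's embedding is the hidden state of its last non-padding token, located with the attention mask.

```python
from typing import List, Sequence


def last_token_pool(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return one vector per sequence: the hidden state of its last real token.

    hidden_states: [batch][seq_len][hidden_dim] (hypothetical layout).
    attention_mask: [batch][seq_len], 1 for real tokens and 0 for padding.
    """
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        if len(tokens) != len(mask):
            raise ValueError("hidden states and mask must have the same length")
        # Positions of real (non-padding) tokens; this works for left or right padding.
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no real tokens")
        pooled.append(list(tokens[real_positions[-1]]))
    return pooled


# Two sequences padded to length 3; the first has one padding token on the right.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
]
mask = [
    [1, 1, 0],
    [1, 1, 1],
]
print(last_token_pool(batch, mask))  # [[3.0, 4.0], [9.0, 10.0]]
```

Using the mask rather than simply taking the final position means a padded sequence in a batch yields the same vector it would yield when embedded on its own.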
New Contributors
- @lance-miles made their first contribution in #648
- @tpendragon made their first contribution in #664
Full Changelog: v1.7.2...v1.7.3