WhisperKit Benchmarks #243
atiorh announced in Announcements
Note: Higher performance (speed) with WhisperKit is possible. However, the benchmark data is measured with the recommended (default) configuration, which best balances battery life, thermal sustainability, memory consumption, and latency for a smooth user experience. For example, on M2 Ultra, WhisperKit runs the latest OpenAI Large V3 Turbo model (v20240930/turbo in WhisperKit) as fast as 72x real-time with a GPU+ANE configuration, but the default configuration (ANE only) is published as 42x real-time on the benchmarks. (Demo video: M2_Ultra_large_v3_turbo.mov)
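For developers who want to trade battery and thermal headroom for raw speed, the compute units each model runs on can be overridden. A minimal sketch, assuming the `ModelComputeOptions` type and the `computeOptions` initializer parameter exposed by the WhisperKit Swift package (names may differ between releases; the model identifier is a placeholder):

```swift
import CoreML
import WhisperKit

// Sketch: override the default compute-unit assignment to mirror the
// GPU+ANE configuration mentioned above. ModelComputeOptions and the
// computeOptions parameter are assumed from the WhisperKit package and
// may differ between releases; the model name is a placeholder.
func makeFastPipeline() async throws -> WhisperKit {
    let computeOptions = ModelComputeOptions(
        audioEncoderCompute: .cpuAndGPU,        // encoder on the GPU instead of the ANE
        textDecoderCompute: .cpuAndNeuralEngine // decoder stays on the ANE
    )
    return try await WhisperKit(
        model: "large-v3_turbo",                // placeholder model identifier
        computeOptions: computeOptions
    )
}
```

Since the published numbers use the library defaults, an override like this is only worthwhile when sustained throughput matters more than battery life and thermals.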
We are thrilled to announce our comprehensive benchmark suite for WhisperKit!
Benchmarks (Hugging Face Space)
Detailed Announcement (Twitter)
The benchmarks will be updated with every release starting with WhisperKit-0.9!
Performance (speed) is reported on long-form ("from file" proxy) and short-form ("streaming" proxy) audio. The test data used in the benchmarks is published on Hugging Face, and the benchmarks are reproducible by following the instructions in BENCHMARKS.md.
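For reference, the "x real-time" figures are a real-time factor: seconds of audio transcribed per second of wall-clock time. A minimal sketch of the arithmetic (function and variable names are illustrative, not part of the benchmark tooling):

```swift
import Foundation

// Real-time factor: how many seconds of audio are processed
// per second of wall-clock time.
func realTimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double {
    audioSeconds / processingSeconds
}

// e.g. 600 s of audio transcribed in about 8.3 s ≈ 72x real-time
let rtf = realTimeFactor(audioSeconds: 600, processingSeconds: 8.3)
print(rtf)
```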
Quality is reported across 3 datasets and 77 languages using WER and other metrics. Both speech-to-text and language detection tasks are evaluated.
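WER here is the standard word error rate: the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal, illustrative sketch (it omits the text normalization that the published benchmarks apply before scoring):

```swift
// Word Error Rate: Levenshtein distance over words, divided by the
// number of words in the reference transcript.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }

    // dp[j] holds the edit distance between ref[0..<i] and hyp[0..<j]
    var dp = Array(0...hyp.count)
    for i in 1...ref.count {
        var prevDiag = dp[0]            // dp[i-1][j-1]
        dp[0] = i
        for j in 1..<(hyp.count + 1) {
            let temp = dp[j]            // dp[i-1][j]
            let substitution = prevDiag + (ref[i - 1] == hyp[j - 1] ? 0 : 1)
            dp[j] = min(dp[j] + 1,      // deletion
                        dp[j - 1] + 1,  // insertion
                        substitution)   // substitution or match
            prevDiag = temp
        }
    }
    return Double(dp[hyp.count]) / Double(ref.count)
}

// 1 substitution over a 4-word reference -> WER = 0.25
print(wordErrorRate(reference: "the quick brown fox", hypothesis: "the quick brown box"))
```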
Device support data is also published so developers can build presets for WhisperKit that best fit each end-user device while maximizing speed and/or accuracy. Raw data here.
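One way to consume that data is a simple device-to-model preset table. A hypothetical sketch; the device identifiers and model names below are illustrative placeholders, not values taken from the published data:

```swift
// Hypothetical preset table derived from the published device-support data.
// Keys are hardware model identifiers; values are WhisperKit model names.
let recommendedModel: [String: String] = [
    "Mac14,14": "large-v3_turbo",   // M2 Ultra class desktop (placeholder)
    "iPhone16,1": "base.en",        // iPhone 15 Pro class (placeholder)
]

// Fall back to a small model on devices not covered by the table.
func modelPreset(forDeviceIdentifier id: String, fallback: String = "tiny") -> String {
    recommendedModel[id] ?? fallback
}
```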
Looking forward to the community feedback!