This minor release brings multi-channel audio merging, a frequently requested feature. It changes the default audio processing path to merge all channels whenever multiple are detected; previously only channel 0 was used. You can also select specific channels when loading audio as part of the config:
```swift
let config = WhisperKitConfig(
    ...
    audioInputConfig: AudioInputConfig(channelMode: .sumChannels([1, 3, 5]))
)
```
This can serve as a simplified form of speaker separation when your audio file has distinct speakers on different channels.
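As a sketch of that pattern, the snippet below transcribes two speakers by running the pipeline once per channel, passing a single-element `.sumChannels` list so each pass loads only that speaker's channel. The channel layout (speaker A on channel 0, speaker B on channel 1), the function name, and the overall flow are illustrative assumptions, not part of this release.

```swift
import WhisperKit

// Illustrative sketch: per-channel transcription as lightweight speaker
// separation. Assumes speaker A is on channel 0 and speaker B on channel 1.
func transcribeSpeakers(audioPath: String) async throws -> [String: String] {
    var results: [String: String] = [:]
    for (speaker, channel) in [("A", 0), ("B", 1)] {
        // Summing a single channel effectively selects just that channel.
        let config = WhisperKitConfig(
            audioInputConfig: AudioInputConfig(channelMode: .sumChannels([channel]))
        )
        let pipe = try await WhisperKit(config)
        let segments = try await pipe.transcribe(audioPath: audioPath)
        results[speaker] = segments.map(\.text).joined(separator: " ")
    }
    return results
}
```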
From #320:
The audio merging algorithm works like this:
We find the peak across all channels, check whether the peak of the mono (summed) version is higher than any individual channel's peak, and if so multiply the whole track so that the mono peak matches the peak of the loudest channel.

[Image: top, the mono (merged) buffer; bottom, the individual channels pre-merge]

Here you can see how the merged audio maintains the same loudness as the original multi-channel audio file, with the total merged waveform of all the channels.
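The steps above can be sketched in plain Swift. This is a minimal illustration assuming channels arrive as `Float` sample arrays (WhisperKit itself works with audio buffers; the function name and types here are illustrative):

```swift
import Foundation

// Sum all channels into a mono buffer, then scale the result down so its
// peak matches the peak of the loudest individual channel.
func mergeChannels(_ channels: [[Float]]) -> [Float] {
    guard let length = channels.first?.count else { return [] }

    // Sum all channels sample-by-sample into a mono buffer.
    var mono = [Float](repeating: 0, count: length)
    for channel in channels {
        for i in 0..<length {
            mono[i] += channel[i]
        }
    }

    // Peak of the loudest individual channel vs. peak of the summed mono.
    let channelPeak = channels.map { $0.map(abs).max() ?? 0 }.max() ?? 0
    let monoPeak = mono.map(abs).max() ?? 0

    // If summing raised the peak, scale the whole mono track so its peak
    // matches the loudest original channel, preserving overall loudness.
    if monoPeak > channelPeak, monoPeak > 0 {
        let scale = channelPeak / monoPeak
        for i in 0..<length {
            mono[i] *= scale
        }
    }
    return mono
}
```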
This release also updates the `recommendedModels` function for the latest devices released since the last update, and includes improved testing methods.
## What's Changed
- Multi channel audio merging by @flashno in #320
- Platform-specific defaults for concurrentWorkerCount by @ZachNagengast in #321
- Updated `fallbackModelSupportConfig` with extra device identifiers by @iandundas in #323
- Use remote models for test by @ZachNagengast in #324
## New Contributors
**Full Changelog**: v0.11.0...v0.12.0