This minor release brings multi-channel audio merging, a frequently requested feature. It changes the default audio processing path to merge all channels whenever multiple are detected; previously only channel 0 was used. You can also select specific channels when loading audio as part of the config:
```swift
let config = WhisperKitConfig(
    ...
    audioInputConfig: AudioInputConfig(channelMode: .sumChannels([1, 3, 5]))
)
```
This can serve as a simplified form of speaker separation when your audio file has distinct speakers on different channels.
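As a sketch of that pattern, the snippet below transcribes two speakers by running the pipeline once per channel, passing a single-element `.sumChannels` list so each pass loads only that speaker's channel. The channel layout (speaker A on channel 0, speaker B on channel 1), the function name, and the overall flow are illustrative assumptions, not part of this release.

```swift
import WhisperKit

// Illustrative sketch: per-channel transcription as lightweight speaker
// separation. Assumes speaker A is on channel 0 and speaker B on channel 1.
func transcribeSpeakers(audioPath: String) async throws -> [String: String] {
    var results: [String: String] = [:]
    for (speaker, channel) in [("A", 0), ("B", 1)] {
        // Summing a single channel effectively selects just that channel.
        let config = WhisperKitConfig(
            audioInputConfig: AudioInputConfig(channelMode: .sumChannels([channel]))
        )
        let pipe = try await WhisperKit(config)
        let segments = try await pipe.transcribe(audioPath: audioPath)
        results[speaker] = segments.map(\.text).joined(separator: " ")
    }
    return results
}
```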
From #320:
The audio merging algorithm works like this:
We find the peak across all channels, check whether the peak of the mono (summed) version is higher than any individual channel's peak, and if so multiply the whole track so that the mono peak matches the peak of the loudest channel.

[Image: top, the mono (merged) buffer; bottom, the individual channels pre-merge]

Here you can see how the merged audio maintains the same loudness as the original multi-channel audio file, with the total merged waveform of all the channels.
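The steps above can be sketched in plain Swift. This is a minimal illustration assuming channels arrive as `Float` sample arrays (WhisperKit itself works with audio buffers; the function name and types here are illustrative):

```swift
import Foundation

// Sum all channels into a mono buffer, then scale the result down so its
// peak matches the peak of the loudest individual channel.
func mergeChannels(_ channels: [[Float]]) -> [Float] {
    guard let length = channels.first?.count else { return [] }

    // Sum all channels sample-by-sample into a mono buffer.
    var mono = [Float](repeating: 0, count: length)
    for channel in channels {
        for i in 0..<length {
            mono[i] += channel[i]
        }
    }

    // Peak of the loudest individual channel vs. peak of the summed mono.
    let channelPeak = channels.map { $0.map(abs).max() ?? 0 }.max() ?? 0
    let monoPeak = mono.map(abs).max() ?? 0

    // If summing raised the peak, scale the whole mono track so its peak
    // matches the loudest original channel, preserving overall loudness.
    if monoPeak > channelPeak, monoPeak > 0 {
        let scale = channelPeak / monoPeak
        for i in 0..<length {
            mono[i] *= scale
        }
    }
    return mono
}
```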
This release also updates the `recommendedModels` function for the latest devices released since the last update, and includes improved testing methods.
## What's Changed
- Multi channel audio merging by @flashno in #320
- Platform-specific defaults for concurrentWorkerCount by @ZachNagengast in #321
- Updated `fallbackModelSupportConfig` with extra device identifiers by @iandundas in #323
- Use remote models for test by @ZachNagengast in #324
## New Contributors
**Full Changelog**: v0.11.0...v0.12.0