Speech Note 4.6.0
Linux Desktop
Changes:
- User Interface
- Speech Note has been translated into Norwegian.
- Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
- Speech to Text
- All Whisper models have been renamed to WhisperCpp to better reflect the engine behind them.
- Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
- Separate settings for engines. Each engine now has its own configuration in the settings, so you can set the parameters for WhisperCpp and FasterWhisper separately. The new configuration parameters are: Number of simultaneous threads, Beam search width, Audio context size, Use Flash Attention.
- Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
- Support for OpenVINO hardware acceleration in the WhisperCpp engine. With OpenVINO, decoding on CPU is much quicker. If you are not using GPU acceleration, it is recommended to enable OpenVINO in the WhisperCpp engine settings. Currently, OpenVINO is enabled only for CPU acceleration.
- Option for inserting processing statistics. A new settings option inserts processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
- Text to Speech
- Control tags for advanced TTS processing. Control tags allow you to dynamically change the speed of synthesized text or add silence between sentences. To use control tags, insert {speed: 0.5} or {silence: 1s} into the text. For convenience, you can also insert predefined control tags using the Insert control tag item in the text context menu.
- Welsh language. The new language is enabled with a Piper voice.
- New Piper voices for Spanish, Italian and English
- New RHVoice voices for Slovak and Croatian
- Translator
- Improved Translator UI. The Translate, Switch languages and Add buttons have been placed between the text areas, which is more convenient.
- Support for older hardware. Until now, the translator did not work on older processors without the AVX CPU extension. This restriction no longer applies.
- New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
- Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English
- Flatpak
- New library: OpenVINO version 2024.1.0.15008
- whisper.cpp update to version 1.6.2
- CTranslate2 update to version 4.3.1
Video presentation of all new features: https://www.youtube.com/watch?v=AVW5OY63wjg
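The control tag syntax mentioned under Text to Speech can be illustrated with a short Python sketch. It is not part of Speech Note and does not show how the application processes tags internally. The helper name split_control_tags and the regular expression are assumptions made for illustration, and only the two tag forms named in these notes, {speed: 0.5} and {silence: 1s}, are recognized.

```python
import re

# Hypothetical helper, not Speech Note code: recognizes only the two tag
# forms named in the release notes, {speed: <value>} and {silence: <value>}.
TAG_RE = re.compile(r"\{\s*(speed|silence)\s*:\s*([^}]*?)\s*\}")


def split_control_tags(text):
    """Split text into ("text", chunk) and (tag_name, value) tuples, in order."""
    parts = []
    pos = 0
    for match in TAG_RE.finditer(text):
        # Plain text between the previous tag (or the start) and this tag.
        chunk = text[pos:match.start()].strip()
        if chunk:
            parts.append(("text", chunk))
        parts.append((match.group(1), match.group(2)))
        pos = match.end()
    # Plain text after the last tag.
    tail = text[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


sample = "{speed: 0.5} First sentence. {silence: 1s} Second sentence."
for kind, value in split_control_tags(sample):
    print(kind, "->", value)
```

The sample string shows how tags sit inline with ordinary text. The same text can be pasted into Speech Note as-is, or the tags can be added through the text context menu.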
Sailfish OS
Changes:
- User Interface
- Speech Note has been translated into Norwegian.
- Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
- Option to enable/disable support for subtitles. Subtitle support is a niche functionality. To simplify the user interface, the subtitle options are not visible by default. To enable them, use the Subtitles support option in the settings.
- Speech to Text
- All Whisper models have been renamed to WhisperCpp to better reflect the engine behind them.
- Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
- Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
- Translate to English option for WhisperCpp models. When enabled, speech is automatically translated into English.
- Option for inserting processing statistics. A new settings option inserts processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
- Text to Speech
- Welsh language. The new language is enabled with a Piper voice.
- New Piper voices for Spanish, Italian and English
- New RHVoice voices for Slovak and Croatian
- Translator
- New button for switching languages.
- New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
- Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English