Releases: k2-fsa/sherpa-onnx
source-separation-models
audio_example.wav is converted from https://github.com/deezer/spleeter/blob/v1.4.0/audio_example.mp3 with the following command:
sox audio_example.mp3 audio_example.wav
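To check what the conversion produced, the WAV header can be read with Python's standard `wave` module. This is a minimal sketch, not part of the release. The helper name `describe_wav` is hypothetical, and it assumes the file is uncompressed PCM, which is the only kind the `wave` module reads.

```python
import wave


def describe_wav(path):
    """Return basic header fields of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as f:
        frames = f.getnframes()
        rate = f.getframerate()
        return {
            "channels": f.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": f.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

Calling `describe_wav("audio_example.wav")` after running the sox command reports the channel count, sample rate, sample width, and duration of the converted file.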
v1.12.0
What's Changed
- Fix building wheels for macOS by @csukuangfj in #2192
- Show verbose logs in homophone replacer by @csukuangfj in #2194
- Fix displaying streaming speech recognition results for Python. by @csukuangfj in #2196
- Add real-time speech recognition example for SenseVoice. by @csukuangfj in #2197
- docs: add Open-XiaoAI KWS project by @idootop in #2198
- Add C++ example for streaming ASR with SenseVoice. by @csukuangfj in #2199
- Add C++ example for real-time ASR with nvidia/parakeet-tdt-0.6b-v2. by @csukuangfj in #2201
- Add a link to YouTube video including sherpa-onnx. by @csukuangfj in #2202
- Support sending is_eof for online websocket server. by @csukuangfj in #2204
- Add alsa-based streaming ASR example for sense voice. by @csukuangfj in #2207
- Support homophone replacer in Android asr demo. by @csukuangfj in #2210
- Add a Go implementation of the TTS generation callback by @xiaokuang95 in #2213
- Add Android demo for real-time ASR with non-streaming ASR models. by @csukuangfj in #2214
- Expose dither for JNI by @esavin in #2215
- Add nodejs example for parakeet-tdt-0.6b-v2. by @csukuangfj in #2219
- Add script to build APK for simulated-streaming-asr. by @csukuangfj in #2220
- Release v1.12.0 by @csukuangfj in #2221
Full Changelog: v1.11.5...v1.12.0
v1.11.5
What's Changed
- export parakeet-tdt-0.6b-v2 to sherpa-onnx by @csukuangfj in #2180
- Add C++ runtime for parakeet-tdt-0.6b-v2. by @csukuangfj in #2181
- Avoid NaN in feature normalization. by @csukuangfj in #2186
- Release v1.11.5 by @csukuangfj in #2187
Full Changelog: v1.11.4...v1.11.5
v1.11.4
What's Changed
- Disable strict hotword matching mode for offline transducer by @vsd-vector in #1837
- Comment refinement: Add note about vocoder file for matcha TTS config by @HaoWang0101 in #2106
- Fix a typo in the JNI for Android. by @csukuangfj in #2108
- Generate subtitles with FireRedAsr models by @csukuangfj in #2112
- Use manylinux_2_28_x86_64 to build linux gpu for sherpa-onnx by @csukuangfj in #2123
- Support running sherpa-onnx with RK NPU on Android by @csukuangfj in #2124
- Fix building for HarmonyOS by @csukuangfj in #2125
- cmake build, configurable from env by @KarelVesely84 in #2115
- Expose dither in python API by @nshmyrev in #2127
- Add support for GigaAM-CTC-v2 by @rominf in #2135
- Support Giga AM transducer V2 by @csukuangfj in #2136
- Export kokoro 1.0 int8 models by @csukuangfj in #2137
- Upload more onnx ASR models by @csukuangfj in #2141
- Fix building for open harmonyOS by @csukuangfj in #2142
- online-transducer: reset the encoder together with 2 previous output symbols (non-blank) by @KarelVesely84 in #2129
- Fix punctuations for kokoro tts 1.1-zh. by @csukuangfj in #2146
- Fix setting OnlineModelConfig in Java API by @csukuangfj in #2147
- Support decoding multiple streams in Java API. by @csukuangfj in #2149
- Support replacing homophonic phrases by @csukuangfj in #2153
- Add C and CXX API for homophone replacer by @csukuangfj in #2156
- Add JavaScript API (WASM) for homophone replacer by @csukuangfj in #2157
- Add JavaScript API (node-addon) for homophone replacer by @csukuangfj in #2158
- Fix building without TTS by @csukuangfj in #2159
- Add homophone replacer example for Python API. by @csukuangfj in #2161
- More fix for building without tts by @csukuangfj in #2162
- Add Swift API for homophone replacer. by @csukuangfj in #2164
- Add C# API for homophone replacer by @csukuangfj in #2165
- Add Kotlin and Java API for homophone replacer by @csukuangfj in #2166
- Add Dart API for homophone replacer by @csukuangfj in #2167
- Add Go API for homophone replacer by @csukuangfj in #2168
- Release v1.11.4 by @csukuangfj in #2169
New Contributors
- @HaoWang0101 made their first contribution in #2106
Full Changelog: v1.11.3...v1.11.4
hr-files
replace.fst is generated from
https://colab.research.google.com/drive/1jEaS3s8FbRJIcVQJv2EQx19EM_mnuARi?usp=sharing
If you don't have access to the colab notebook, here is the code for generating replace.fst:
import pynini
from pynini.lib import utf8
from pynini import cdrewrite
# sigma accepts any sequence of valid UTF-8 characters
sigma = utf8.VALID_UTF8_CHAR.star
# Each rule maps a pinyin string (with tone digits) to the phrase it should become
rule1 = pynini.cross("dan1ni2er3bo1wei2", "丹尼尔·波维")
# Named rule1b so that it is not overwritten by rule10 below
rule1b = pynini.cross("dan1ni2er3bo1wei4", "丹尼尔·波维")
rule2 = pynini.cross('dou4dou4', '豆豆')
rule3 = pynini.cross('cheng2cheng2', '橙橙')
rule30 = pynini.cross('chen2chen2', '橙橙')
rule4 = pynini.cross('qiao2qiao2', '峤峤')
rule5 = pynini.cross('qiu2qiu2', '球球')
rule6 = pynini.cross('lin2mei3li4', '林美丽')
rule7 = pynini.cross('guo3guo3', '果果')
rule8 = pynini.cross('miao2miao2', '苗苗')
rule9 = pynini.cross('xuan2jie4', '玄戒')
rule10 = pynini.cross('xuan2jie4xin1pian1', '玄戒芯片')
rule11 = pynini.cross('xuan2jie4xing1pian1', '玄戒芯片')
# Union of all rules; the duplicate definitions of rule10 and rule11 are dropped
rule = (rule1 | rule1b | rule2 | rule3 | rule30 | rule4 | rule5 | rule6 | rule7 | rule8 | rule9 | rule10 | rule11).optimize()
# Apply the rules anywhere in the input, with no left or right context
rule = cdrewrite(rule, "", "", sigma)
rule.write('replace.fst')
Note that you need to use
pip install --only-binary :all: pynini
to install pynini
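The rules above map pinyin strings with tone digits to the phrase that should replace them, wherever they occur in the input. The plain-Python stand-in below illustrates that mapping without pynini. It is only a sketch: the name `rewrite`, the reduced rule table, and the longest-match-first tie-break are assumptions made for illustration, and it does not reproduce the exact semantics of the compiled FST.

```python
# Plain-Python illustration of pinyin -> phrase rewriting (not pynini).
# Only a subset of the rules is listed here.
RULES = {
    "dou4dou4": "豆豆",
    "cheng2cheng2": "橙橙",
    "chen2chen2": "橙橙",
    "xuan2jie4": "玄戒",
    "xuan2jie4xin1pian1": "玄戒芯片",
    "xuan2jie4xing1pian1": "玄戒芯片",
}


def rewrite(text, rules=RULES):
    """Replace every rule key found in text, scanning left to right.

    Assumption: when several keys match at one position, the longest wins.
    """
    keys = sorted(rules, key=len, reverse=True)  # try longest keys first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no rule applies here; copy the character
            i += 1
    return "".join(out)
```

For example, `rewrite("xuan2jie4xin1pian1")` returns `玄戒芯片`, while text that contains no rule key is returned unchanged.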
v1.11.3
What's Changed
- fix vits dict dir config by @amutu in #2036
- fix case by @amutu in #2037
- Fix building wheels for RKNN by @csukuangfj in #2041
- Should the scaling factor be 32767? by @yourengod in #2056
- Fix length scale for kokoro tts by @csukuangfj in #2060
- Allow building repository as CMake subdirectory by @niansa in #2059
- Export silero_vad v4 to RKNN by @csukuangfj in #2067
- Fix DirectML support by @endink in #2066
- Fix building aar to include speech denoiser by @csukuangfj in #2069
- Add CXX API for VAD by @csukuangfj in #2077
- Add C++ runtime for silero_vad with RKNN by @csukuangfj in #2078
- Refactor rknn code by @csukuangfj in #2079
- Fix building for android by @csukuangfj in #2081
- Add C++ and Python API for Dolphin CTC models by @csukuangfj in #2085
- Add Kotlin and Java API for Dolphin CTC models by @csukuangfj in #2086
- Add C and CXX API for Dolphin CTC models by @csukuangfj in #2088
- Preserve more context after endpointing in transducer by @vsd-vector in #2061
- Add C# API for Dolphin CTC models by @csukuangfj in #2089
- Add Go API for Dolphin CTC models by @csukuangfj in #2090
- Add Swift API for Dolphin CTC models by @csukuangfj in #2091
- Add Javascript (WebAssembly) API for Dolphin CTC models by @csukuangfj in #2093
- Add Javascript (node-addon) API for Dolphin CTC models by @csukuangfj in #2094
- Add Dart API for Dolphin CTC models by @csukuangfj in #2095
- Add Pascal API for Dolphin CTC models by @csukuangfj in #2096
- Release v1.11.3 by @csukuangfj in #2097
New Contributors
- @amutu made their first contribution in #2036
- @yourengod made their first contribution in #2056
- @niansa made their first contribution in #2059
Full Changelog: v1.11.2...v1.11.3
v1.11.2
What's Changed
- Fix CI tests. by @csukuangfj in #2016
- Publish jar for more java versions by @csukuangfj in #2017
- add alsa example for vad+offline asr by @csukuangfj in #2020
- Support cuda12 and cudnn8 for Linux aarch64. by @csukuangfj in #2021
- Update README to include more projects using sherpa-onnx by @csukuangfj in #2022
- Fix a bug in vad.reset() by @csukuangfj in #2023
- Fix Matcha + vocos for Android by @csukuangfj in #2024
- Fix crash in Android tts engine demo. by @csukuangfj in #2029
- Fix build script by @sienaiwun in #2033
- fix static linking by @sangeet2020 in #2032
- Release v1.11.2 by @csukuangfj in #2035
New Contributors
- @sienaiwun made their first contribution in #2033
Full Changelog: v1.11.1...v1.11.2
v1.11.1
What's Changed
- Export vocos to sherpa-onnx by @csukuangfj in #2012
- Add C++ runtime for vocos by @csukuangfj in #2014
- Release v1.11.1 by @csukuangfj in #2015
Full Changelog: v1.11.0...v1.11.1
v1.11.0
What's Changed
- Fix building wheels for Python 3.7 by @csukuangfj in #1933
- Add Kotlin and Java API for online punctuation models by @csukuangfj in #1936
- Add Kokoro v1.1-zh by @csukuangfj in #1942
- Support RKNN for Zipformer CTC models. by @csukuangfj in #1948
- Add transducer modified_beam_search for RKNN. by @csukuangfj in #1949
- Update README to include projects that are using sherpa-onnx by @csukuangfj in #1956
- Limit number of tokens per second for whisper. by @csukuangfj in #1958
- Ebranchformer by @KarelVesely84 in #1951
- Test using sherpa-onnx as a cmake subproject by @csukuangfj in #1961
- Add C++ demo for VAD+non-streaming ASR by @csukuangfj in #1964
- Export gtcrn models to sherpa-onnx by @csukuangfj in #1975
- c-api add wave write to buffer. by @cjsdurj in #1962
- add SherpaOnnxOfflineRecognizerSetConfig binding for go by @franck-li in #1976
- Add C++ runtime for speech enhancement GTCRN models by @csukuangfj in #1977
- Add Python API for speech enhancement GTCRN models by @csukuangfj in #1978
- Add C API for speech enhancement GTCRN models by @csukuangfj in #1984
- Add CXX API for speech enhancement GTCRN models by @csukuangfj in #1986
- Add Swift API for speech enhancement GTCRN models by @csukuangfj in #1989
- Add C# API for speech enhancement GTCRN models by @csukuangfj in #1990
- Add Go API for speech enhancement GTCRN models by @csukuangfj in #1991
- Add Pascal API for speech enhancement GTCRN models by @csukuangfj in #1992
- Add Dart API for speech enhancement GTCRN models by @csukuangfj in #1993
- Add JavaScript (node-addon) API for speech enhancement GTCRN models by @csukuangfj in #1996
- Add WebAssembly (WASM) for speech enhancement GTCRN models by @csukuangfj in #2002
- Add JavaScript API (wasm) for speech enhancement GTCRN models by @csukuangfj in #2007
- Add Kotlin API for speech enhancement GTCRN models by @csukuangfj in #2008
- Add Java API for speech enhancement GTCRN models by @csukuangfj in #2009
- Release v1.11.0 by @csukuangfj in #2010
Full Changelog: v1.10.46...v1.11.0
speech-enhancement-models
gtcrn_simple.onnx
is from https://github.com/Xiaobin-Rong/gtcrn
speech_with_noise.wav
is from https://modelscope.cn/models/iic/speech_zipenhancer_ans_multiloss_16k_base/file/view/master?fileName=examples%252Fspeech_with_noise.wav&status=0