Add C++ and Python API for Dolphin CTC models #2085

Merged
csukuangfj merged 3 commits into k2-fsa:master from the support-dolphin branch on Apr 2, 2025

csukuangfj
Collaborator

Support models from https://github.com/DataoceanAI/Dolphin

Note that only the CTC head is supported. We don't support its attention decoder.

Fixes #2083

CC @DataoceanAI @MXuer @hotbaby

Usage

Download a model from
https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models

After building sherpa-onnx, run

build/bin/sherpa-onnx-offline \
  --dolphin-model=./sherpa-onnx-dolphin-base-ctc-multi-lang-int8-2025-04-02/model.int8.onnx \
  --tokens=./sherpa-onnx-dolphin-base-ctc-multi-lang-int8-2025-04-02/tokens.txt \
  ./sherpa-onnx-dolphin-base-ctc-multi-lang-int8-2025-04-02/test_wavs/0.wav
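
For the Python side of this PR, a rough equivalent of the command above might look like the sketch below. It relies on `sherpa_onnx.OfflineRecognizer.from_dolphin_ctc`, the method discussed in this thread. The keyword names `model` and `tokens`, the use of the third-party `soundfile` package for reading the wave file, and the helper names `dolphin_paths` and `transcribe` are assumptions made for illustration, so check them against the installed sherpa-onnx version.

```python
from pathlib import Path


def dolphin_paths(model_dir):
    """Return (model, tokens, test_wav) paths inside an extracted model directory.

    The layout mirrors the paths used in the command-line example above.
    """
    d = Path(model_dir)
    return d / "model.int8.onnx", d / "tokens.txt", d / "test_wavs" / "0.wav"


def transcribe(model_dir):
    """Decode the bundled test wave with the Dolphin CTC model (hypothetical helper)."""
    # Imported lazily so dolphin_paths() works without these packages installed.
    import sherpa_onnx
    import soundfile as sf  # assumption: soundfile is available for reading audio

    model, tokens, wav = dolphin_paths(model_dir)

    # Assumption: from_dolphin_ctc accepts `model` and `tokens` keyword arguments.
    # No language or region argument is passed.
    recognizer = sherpa_onnx.OfflineRecognizer.from_dolphin_ctc(
        model=str(model),
        tokens=str(tokens),
    )

    samples, sample_rate = sf.read(str(wav), dtype="float32", always_2d=True)
    samples = samples[:, 0]  # use only the first channel

    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

Calling `transcribe("./sherpa-onnx-dolphin-base-ctc-multi-lang-int8-2025-04-02")` should then return the recognized text for `test_wavs/0.wav`, provided the model archive has been downloaded and extracted into the current directory.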

@csukuangfj csukuangfj merged commit 0de7e1b into k2-fsa:master Apr 2, 2025
178 of 215 checks passed
@MXuer

MXuer commented Apr 2, 2025

Thank you, Brother Jun.

@csukuangfj csukuangfj deleted the support-dolphin branch April 2, 2025 12:30
@hotbaby

hotbaby commented Apr 2, 2025

Thank you, Brother Jun.

@Geministudents

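sherpa_onnx.OfflineRecognizer.from_dolphin_ctc has no language parameter, and Dolphin's official region_sym parameter is missing as well. Could this affect recognition quality? For example, the official Dolphin API needs both parameters to transcribe Minnan (Hokkien): lang_sym="zh", region_sym="TW"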

@csukuangfj
Collaborator Author

We don't need them to be specified, and specifying them is not supported.

Development

Successfully merging this pull request may close these issues.

Could you support the Dolphin model?
4 participants