Support Giga AM transducer V2 #2136

Merged 3 commits on Apr 20, 2025

csukuangfj
Collaborator

Fixes #2098

CC @sh1man999 @rominf

See also
https://github.com/salute-developers/GigaAM?tab=readme-ov-file#performance-metrics-word-error-rate

Usage

Please first install sherpa-onnx, e.g.,

pip install sherpa-onnx

Transducer (v2)

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
tar xvf sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2
rm sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19.tar.bz2

sherpa-onnx-offline \
  --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx \
  --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx \
  --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx \
  --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt \
  --model-type=nemo_transducer \
  ./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
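The same decode can probably be driven from Python after `pip install sherpa-onnx`. The following is a minimal sketch, not part of this PR: it assumes the `sherpa_onnx.OfflineRecognizer.from_transducer` factory of the Python package, assumes `numpy` is installed, and assumes the test wave is 16-bit mono PCM. The helper names `transducer_files` and `transcribe` are hypothetical.

```python
from pathlib import Path


def transducer_files(model_dir):
    """Map the extracted model directory to the four files the CLI command uses."""
    d = Path(model_dir)
    return {
        "encoder": str(d / "encoder.int8.onnx"),
        "decoder": str(d / "decoder.onnx"),
        "joiner": str(d / "joiner.onnx"),
        "tokens": str(d / "tokens.txt"),
    }


def transcribe(model_dir, wav_path, num_threads=2):
    """Decode one wave file and return the recognized text."""
    # Imported lazily so transducer_files() works without these packages.
    import wave

    import numpy as np
    import sherpa_onnx

    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        **transducer_files(model_dir),
        num_threads=num_threads,
        model_type="nemo_transducer",  # same value as --model-type above
    )

    with wave.open(str(wav_path), "rb") as f:
        # Assumption: 16-bit mono PCM; other formats need a different reader.
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        sample_rate = f.getframerate()
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    # Scale int16 samples to float32 in [-1, 1).
    samples = pcm.astype(np.float32) / 32768.0

    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

With the archive extracted as above, `transcribe("./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19", "./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav")` should return the same text as the command-line run.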

The output is given below:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  164M  100  164M    0     0  39.3M      0  0:00:04  0:00:04 --:--:-- 46.3M
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/LICENSE
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/README.md
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test-onnx-rnnt.py
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/run-rnnt-v2.sh
sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
/project/sherpa-onnx/csrc/parse-options.cc:Read:375 sherpa-onnx-offline --encoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx --decoder=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx --joiner=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx --tokens=./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt --model-type=nemo_transducer ./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/encoder.int8.onnx", decoder_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/decoder.onnx", joiner_filename="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/joiner.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="nemo_transducer", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
{"lang": "", "emotion": "", "event": "", "text": "ничьих не требуя похвал счастлив уж я надеждой сладкой что дева с трепетом любви посмотрит может быть украдкой на песни грешные мои у лукоморья дуб зеленый", "timestamps": [0.04, 0.12, 0.16, 0.24, 0.32, 0.40, 0.44, 0.52, 0.56, 0.60, 0.64, 0.72, 0.76, 0.80, 0.88, 0.96, 1.04, 1.12, 1.16, 1.24, 1.32, 1.36, 1.44, 1.56, 1.76, 1.84, 1.88, 1.96, 2.00, 2.04, 2.08, 2.16, 2.24, 2.28, 2.36, 2.40, 2.48, 2.60, 2.68, 2.72, 2.76, 2.84, 2.92, 2.96, 3.04, 3.08, 3.16, 3.20, 3.24, 3.32, 3.36, 3.44, 3.52, 3.56, 3.64, 3.68, 3.72, 3.76, 3.80, 3.88, 3.92, 4.00, 4.08, 4.16, 4.20, 4.24, 4.28, 4.32, 4.36, 4.44, 4.52, 4.56, 4.64, 4.68, 4.76, 4.80, 4.88, 4.92, 5.00, 5.08, 5.16, 5.36, 5.44, 5.52, 5.60, 5.68, 5.72, 5.76, 5.84, 5.92, 6.00, 6.04, 6.12, 6.16, 6.20, 6.24, 6.28, 6.32, 6.40, 6.44, 6.48, 6.52, 6.56, 6.64, 6.72, 6.76, 6.84, 6.92, 7.00, 7.04, 7.12, 7.16, 7.20, 7.24, 7.32, 7.36, 7.40, 7.48, 7.60, 7.64, 7.72, 7.76, 7.84, 7.88, 8.00, 8.08, 8.16, 8.24, 8.28, 8.32, 8.44, 8.76, 9.24, 9.32, 9.40, 9.44, 9.52, 9.60, 9.68, 9.76, 9.84, 9.92, 10.00, 10.08, 10.12, 10.24, 10.32, 10.44, 10.52, 10.56, 10.60, 10.68, 10.72, 10.84, 10.92], "tokens":["н", "и", "ч", "ь", "и", "х", " ", "н", "е", " ", "т", "р", "е", "б", "у", "я", " ", "п", "о", "х", "в", "а", "л", " ", "с", "ч", "а", "с", "т", "л", "и", "в", " ", "у", "ж", " ", "я", " ", "н", "а", "д", "е", "ж", "д", "о", "й", " ", "с", "л", "а", "д", "к", "о", "й", " ", "ч", "т", "о", " ", "д", "е", "в", "а", " ", "с", " ", "т", "р", "е", "п", "е", "т", "о", "м", " ", "л", "ю", "б", "в", "и", " ", "п", "о", "с", "м", "о", "т", "р", "и", "т", " ", "м", "о", "ж", "е", "т", " ", "б", "ы", "т", "ь", " ", "у", "к", "р", "а", "д", "к", "о", "й", " ", "н", "а", " ", "п", "е", "с", "н", "и", " ", "г", "р", "е", "ш", "н", "ы", "е", " ", "м", "о", "и", " ", "у", " ", "л", "у", "к", "о", "м", "о", "р", "ь", "я", " ", "д", "у", "б", " ", "з", "е", "л", "е", "н", "ы", "й"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 4.140 s
Real time factor (RTF): 4.140 / 11.290 = 0.367
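Note that `"words"` is empty in the result above, while `"tokens"` holds single characters (with `" "` as the word separator) and `"timestamps"` holds one start time per token. If word-level times are needed, they can be derived from these two lists. The sketch below is one way to do it; the function name `tokens_to_words` is hypothetical, and the sample line is an excerpt (the first 16 tokens) of the output above.

```python
import json


def tokens_to_words(result):
    """Group character tokens into (word, start_time) pairs.

    A " " token closes the current word; the start time of a word is the
    timestamp of its first character.
    """
    words = []
    chars, start = [], None
    for token, t in zip(result["tokens"], result["timestamps"]):
        if token == " ":
            if chars:
                words.append(("".join(chars), start))
            chars, start = [], None
        else:
            if start is None:
                start = t
            chars.append(token)
    if chars:  # flush the last word (no trailing separator)
        words.append(("".join(chars), start))
    return words


# Excerpt of the JSON result line printed by sherpa-onnx-offline above.
line = (
    '{"text": "ничьих не требуя", '
    '"timestamps": [0.04, 0.12, 0.16, 0.24, 0.32, 0.40, 0.44, 0.52, 0.56, '
    '0.60, 0.64, 0.72, 0.76, 0.80, 0.88, 0.96], '
    '"tokens": ["н", "и", "ч", "ь", "и", "х", " ", "н", "е", " ", '
    '"т", "р", "е", "б", "у", "я"], "words": []}'
)

words = tokens_to_words(json.loads(line))
print(words)  # [('ничьих', 0.04), ('не', 0.52), ('требуя', 0.64)]
```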

CTC (v2)

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19.tar.bz2
tar xvf sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19.tar.bz2
rm sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19.tar.bz2

sherpa-onnx-offline \
  --nemo-ctc-model=./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/model.int8.onnx \
  --tokens=./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/tokens.txt \
  --debug=1 \
  ./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/test_wavs/example.wav

The output is given below:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  159M  100  159M    0     0  75.1M      0  0:00:02  0:00:02 --:--:--  107M
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/model.int8.onnx
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/LICENSE
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/tokens.txt
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/test-onnx-ctc.py
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/export-onnx-ctc.py
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/test_wavs/
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/run-ctc-v2.sh
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/export-onnx-ctc-v2.py
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
/project/sherpa-onnx/csrc/parse-options.cc:Read:375 sherpa-onnx-offline --nemo-ctc-model=./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/model.int8.onnx --tokens=./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/tokens.txt --debug=1 ./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/test_wavs/example.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model="./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/model.int8.onnx"), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/tokens.txt", num_threads=2, debug=True, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
/project/sherpa-onnx/csrc/offline-ctc-model.cc:GetModelType:65 is_giga_am=1
language=Russian
version=1
license=https://github.com/salute-developers/GigaAM/blob/main/LICENSE
model_author=https://github.com/salute-developers/GigaAM
model_type=EncDecCTCModel
subsampling_factor=4
normalize_type=
vocab_size=34
onnx.infer=onnxruntime.quant


/project/sherpa-onnx/csrc/offline-nemo-enc-dec-ctc-model.cc:Init:103 is_giga_am=1
language=Russian
version=1
license=https://github.com/salute-developers/GigaAM/blob/main/LICENSE
model_author=https://github.com/salute-developers/GigaAM
model_type=EncDecCTCModel
subsampling_factor=4
normalize_type=
vocab_size=34
onnx.infer=onnxruntime.quant


Started
Done!

./sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19/test_wavs/example.wav
{"lang": "", "emotion": "", "event": "", "text": "ничьих не требуя похвал счастлив уж я надеждой сладкой что дева с трепетом любви посмотрит может быть украдкой на песни грешные мои у лукоморья дуб зеленый", "timestamps": [0.08, 0.12, 0.20, 0.24, 0.32, 0.40, 0.44, 0.52, 0.56, 0.60, 0.68, 0.76, 0.80, 0.84, 0.88, 1.00, 1.08, 1.16, 1.20, 1.28, 1.32, 1.40, 1.48, 1.60, 1.76, 1.84, 1.88, 1.92, 2.00, 2.04, 2.12, 2.16, 2.24, 2.32, 2.36, 2.40, 2.52, 2.56, 2.68, 2.72, 2.80, 2.84, 2.92, 3.00, 3.04, 3.08, 3.12, 3.20, 3.28, 3.32, 3.36, 3.44, 3.48, 3.56, 3.60, 3.68, 3.72, 3.76, 3.84, 3.92, 3.96, 4.04, 4.08, 4.12, 4.20, 4.24, 4.28, 4.36, 4.40, 4.48, 4.52, 4.56, 4.64, 4.68, 4.76, 4.84, 4.92, 4.96, 5.04, 5.08, 5.24, 5.40, 5.44, 5.56, 5.64, 5.68, 5.72, 5.80, 5.84, 5.92, 5.96, 6.04, 6.12, 6.16, 6.20, 6.24, 6.32, 6.36, 6.40, 6.48, 6.52, 6.56, 6.64, 6.68, 6.76, 6.80, 6.84, 6.96, 7.00, 7.04, 7.08, 7.16, 7.20, 7.28, 7.32, 7.36, 7.44, 7.52, 7.60, 7.64, 7.72, 7.80, 7.84, 7.92, 8.04, 8.08, 8.16, 8.20, 8.28, 8.32, 8.44, 9.04, 9.28, 9.32, 9.44, 9.48, 9.56, 9.60, 9.76, 9.80, 9.88, 9.92, 10.00, 10.08, 10.20, 10.24, 10.32, 10.40, 10.52, 10.56, 10.64, 10.68, 10.80, 10.84, 10.92], "tokens":["н", "и", "ч", "ь", "и", "х", " ", "н", "е", " ", "т", "р", "е", "б", "у", "я", " ", "п", "о", "х", "в", "а", "л", " ", "с", "ч", "а", "с", "т", "л", "и", "в", " ", "у", "ж", " ", "я", " ", "н", "а", "д", "е", "ж", "д", "о", "й", " ", "с", "л", "а", "д", "к", "о", "й", " ", "ч", "т", "о", " ", "д", "е", "в", "а", " ", "с", " ", "т", "р", "е", "п", "е", "т", "о", "м", " ", "л", "ю", "б", "в", "и", " ", "п", "о", "с", "м", "о", "т", "р", "и", "т", " ", "м", "о", "ж", "е", "т", " ", "б", "ы", "т", "ь", " ", "у", "к", "р", "а", "д", "к", "о", "й", " ", "н", "а", " ", "п", "е", "с", "н", "и", " ", "г", "р", "е", "ш", "н", "ы", "е", " ", "м", "о", "и", " ", "у", " ", "л", "у", "к", "о", "м", "о", "р", "ь", "я", " ", "д", "у", "б", " ", "з", "е", "л", "е", "н", "ы", "й"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.276 s
Real time factor (RTF): 5.276 / 11.290 = 0.467
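The real time factor reported in both logs is the elapsed decoding time divided by the audio duration, so values below 1 mean faster than real time. The short sketch below reproduces the two numbers; the function name `real_time_factor` is hypothetical. Both figures come from single runs (and the CTC run had `--debug=1` enabled), so the comparison is only indicative, not a benchmark.

```python
def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_seconds / audio_seconds


# Values copied from the two logs above; the wave file is 11.290 s long.
rtf_transducer = real_time_factor(4.140, 11.290)
rtf_ctc = real_time_factor(5.276, 11.290)

print(f"transducer: {rtf_transducer:.3f}")  # transducer: 0.367
print(f"ctc:        {rtf_ctc:.3f}")  # ctc:        0.467
```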

Model sizes

ls -lh sherpa-onnx-nemo-*-giga-am-v2-russian-2025-04-19
sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19:
total 226M
-rwxr-xr-x 1 501 staff 3.6K Apr 19 15:04 export-onnx-ctc.py
-rwxr-xr-x 1 501 staff 1.7K Apr 19 14:58 export-onnx-ctc-v2.py
-rw-r--r-- 1 501 staff 219K Apr 19 15:04 LICENSE
-rw-r--r-- 1 501 staff 226M Apr 19 15:03 model.int8.onnx
-rwxr-xr-x 1 501 staff  596 Apr 19 14:58 run-ctc-v2.sh
-rwxr-xr-x 1 501 staff 4.1K Apr 19 15:04 test-onnx-ctc.py
drwxr-xr-x 2 501 staff 4.0K Apr 20 01:52 test_wavs
-rw-r--r-- 1 501 staff  196 Apr 19 15:03 tokens.txt

sherpa-onnx-nemo-transducer-giga-am-v2-russian-2025-04-19:
total 231M
-rw-r--r-- 1 501 staff 3.2M Apr 20 01:49 decoder.onnx
-rw-r--r-- 1 501 staff 226M Apr 20 01:49 encoder.int8.onnx
-rw-r--r-- 1 501 staff 1.4M Apr 20 01:49 joiner.onnx
-rw-r--r-- 1 501 staff 219K Apr 20 01:50 LICENSE
-rw-r--r-- 1 501 staff  302 Apr 20 01:50 README.md
-rwxr-xr-x 1 501 staff  868 Apr 20 01:47 run-rnnt-v2.sh
-rwxr-xr-x 1 501 staff 8.9K Apr 20 01:50 test-onnx-rnnt.py
drwxr-xr-x 2 501 staff 4.0K Apr 20 01:53 test_wavs
-rw-r--r-- 1 501 staff  196 Apr 20 01:49 tokens.txt

You can download the models from
https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
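For scripted setups, the `curl`, `tar`, and `rm` steps used earlier can be approximated with the Python standard library. This is a sketch under the assumption that every model archive follows the `releases/download/asr-models/<name>.tar.bz2` URL pattern seen in the two commands above; the helper names `archive_url` and `download_and_extract` are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the curl commands in this PR description.
RELEASE_URL = "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models"


def archive_url(model_name):
    """Build the download URL for a model archive (assumed URL pattern)."""
    return f"{RELEASE_URL}/{model_name}.tar.bz2"


def download_and_extract(model_name, dest="."):
    """Download <model_name>.tar.bz2, unpack it into dest, delete the archive."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{model_name}.tar.bz2"

    urllib.request.urlretrieve(archive_url(model_name), archive)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(dest)  # only do this for archives you trust
    archive.unlink()  # mirrors the `rm` step

    return dest / model_name
```

For example, `download_and_extract("sherpa-onnx-nemo-ctc-giga-am-v2-russian-2025-04-19")` would fetch the CTC model into the current directory.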

You can also try them in the following Hugging Face Space:
https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition

csukuangfj merged commit be0f382 into k2-fsa:master on Apr 20, 2025
8 of 34 checks passed
csukuangfj deleted the fix-giga-am branch on April 20, 2025 02:15
Successfully merging this pull request may close these issues.

add support GigaAM-RNNT-v2