Replies: 1 comment
-
We do not support all ASR models for diarization with ASR at this point. Each model needs to output correct timestamps to match with the speaker diarization results. We recommend using "stt_en_conformer_ctc_large" for now.
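The recommendation above amounts to a one-line config change in the tutorial notebook. A minimal sketch, assuming `cfg` is the diarization config object that ASR_with_SpeakerDiarization.ipynb has already loaded (this is a config fragment, not a complete script):

```python
# Hedged sketch: point the diarizer at the recommended CTC model,
# which produces the word-level timestamps the diarizer needs.
# `cfg` is assumed to be the tutorial's already-loaded config object.
cfg.diarizer.asr.model_path = "stt_en_conformer_ctc_large"
```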
-
I've been trying to follow the ASR_with_SpeakerDiarization.ipynb tutorial to produce transcripts with speaker diarization. My audio files contain a mix of English and Spanish, so I searched https://catalog.ngc.nvidia.com/models?filters=&orderBy=dateModifiedDESC&query=es and found 5 multilingual models that claim to support transcribing en and es, but 4 of them throw errors.

My code is exactly the same as in the tutorial, except that I changed the model_path.

Error message:

FileNotFoundError: Model stt_enes_conformer_transducer_large was not found. Check cls.list_available_models() for the list of all available models.

In summary: by simply changing cfg.diarizer.asr.model_path, I tried the models stt_enes_conformer_ctc_large_codesw, stt_enes_conformer_transducer_large_codesw, and stt_enes_conformer_transducer_large, and each resulted in "model not found". Another model, stt_enes_contextnet_large, threw an error as well. Some of the other single-language (en or es) models ran successfully in my case. I am not sure whether this is related to each model's base class, and whether this tutorial code is unable to access models in the EncDecRNNTBPEModel class.
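The error message suggests checking cls.list_available_models() to verify a name before use. As a minimal, NeMo-free illustration of that kind of check, here is a hypothetical helper (filter_models is not a NeMo function) applied to model names mentioned in this thread, relying on the "stt_<lang>_<arch>_..." naming convention:

```python
def filter_models(names, lang_code):
    """Return model names whose language field (the second underscore-
    separated token, e.g. 'en' or 'enes') contains lang_code."""
    return [n for n in names if lang_code in n.split("_")[1]]

# Model names taken from this thread (availability varies by NeMo version).
available = [
    "stt_en_conformer_ctc_large",
    "stt_enes_conformer_ctc_large_codesw",
    "stt_enes_contextnet_large",
]

print(filter_models(available, "es"))
```

With an actual NeMo install, the same membership check would be run against the names returned by the model class's list_available_models() before assigning cfg.diarizer.asr.model_path.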