Replies: 1 comment
-
We do not support all ASR models for diarization with ASR at this point. Each model needs to output correct timestamps to match with the speaker diarization results. We recommend using "stt_en_conformer_ctc_large" for now.
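The recommendation above amounts to a one-line config change in the tutorial notebook. A minimal sketch, assuming `cfg` is the diarization config object that ASR_with_SpeakerDiarization.ipynb has already loaded (this is a config fragment, not a complete script):

```python
# Hedged sketch: point the diarizer at the recommended CTC model,
# which produces the word-level timestamps the diarizer needs.
# `cfg` is assumed to be the tutorial's already-loaded config object.
cfg.diarizer.asr.model_path = "stt_en_conformer_ctc_large"
```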
-
I've been trying to follow the ASR_with_SpeakerDiarization.ipynb tutorial to produce transcripts with speaker diarization. My audio files contain a mix of English and Spanish, so I searched https://catalog.ngc.nvidia.com/models?filters=&orderBy=dateModifiedDESC&query=es and found 5 multilingual models that claim to support transcribing en and es, but 4 of them throw errors.

My code is exactly the same as in the tutorial, except that I changed the model_path.

Error message:

FileNotFoundError: Model stt_enes_conformer_transducer_large was not found. Check cls.list_available_models() for the list of all available models.

In summary: by simply changing cfg.diarizer.asr.model_path, I tried the models stt_enes_conformer_ctc_large_codesw, stt_enes_conformer_transducer_large_codesw, and stt_enes_conformer_transducer_large, and each resulted in "model not found". Another model, stt_enes_contextnet_large, threw an error as well. Some of the other single-language (en or es) models ran successfully in my case. I am not sure whether this is related to each model's base class, and whether this tutorial code is unable to access models in the EncDecRNNTBPEModel class.
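The error message suggests checking cls.list_available_models() to verify a name before use. As a minimal, NeMo-free illustration of that kind of check, here is a hypothetical helper (filter_models is not a NeMo function) applied to model names mentioned in this thread, relying on the "stt_<lang>_<arch>_..." naming convention:

```python
def filter_models(names, lang_code):
    """Return model names whose language field (the second underscore-
    separated token, e.g. 'en' or 'enes') contains lang_code."""
    return [n for n in names if lang_code in n.split("_")[1]]

# Model names taken from this thread (availability varies by NeMo version).
available = [
    "stt_en_conformer_ctc_large",
    "stt_enes_conformer_ctc_large_codesw",
    "stt_enes_contextnet_large",
]

print(filter_models(available, "es"))
```

With an actual NeMo install, the same membership check would be run against the names returned by the model class's list_available_models() before assigning cfg.diarizer.asr.model_path.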