[HELP NEEDED]: Evaluating different methods of getting Word and Utterance (sentence) level Timestamps from NeMo models #5170
Unanswered
ishansharma1320
asked this question in
Q&A
Replies: 1 comment
Hi,
Thanks for reading this message.
I have recently been experimenting with NeMo to extract word-level and utterance-level timestamps for transcriptions.
From the available tutorials and discussion/PR threads, I gathered the following methods:
Word Level Timestamps
Utterance (sentence) Level Timestamps
Based on the above, I have the following questions:
Of the 3 methods mentioned above, which gives the most accurate word-level timestamps and can be implemented for both CTC and RNN-T architectures? Are there any other approaches that are more accurate?
For utterance-level timestamps, is the approach correct? Are there any other approaches that are more accurate?
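To make the utterance-level question concrete, here is a minimal, generic sketch of one way to derive utterance timestamps from word timestamps: start a new utterance whenever the pause between consecutive words exceeds a threshold. This is not necessarily the approach referred to above, and it does not use any NeMo API. The `Word` and `Utterance` types, the `group_into_utterances` helper, and the 0.5 s `max_gap` default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class Utterance:
    text: str
    start: float
    end: float


def _merge(words: List[Word]) -> Utterance:
    # An utterance spans from its first word's start to its last word's end.
    return Utterance(
        text=" ".join(w.text for w in words),
        start=words[0].start,
        end=words[-1].end,
    )


def group_into_utterances(words: List[Word], max_gap: float = 0.5) -> List[Utterance]:
    """Split a time-ordered word list wherever the pause exceeds max_gap seconds.

    max_gap is an assumed, tunable threshold, not a value taken from NeMo.
    """
    utterances: List[Utterance] = []
    current: List[Word] = []
    for word in words:
        # A long silence since the previous word closes the current utterance.
        if current and word.start - current[-1].end > max_gap:
            utterances.append(_merge(current))
            current = []
        current.append(word)
    if current:
        utterances.append(_merge(current))
    return utterances
```

The word-level timestamps fed into such a helper would come from whichever word-level method is chosen, so any error there carries over directly to the utterance boundaries.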
Any advice or insights on this would be highly appreciated.
Thanks again,
Ishan