Open
Description
Thank you for the great project!
In v2 you used CTC align, but in v3 you use Whisper timestamps.
As far as I know, the timestamps from CTC_align are more accurate than the Whisper timestamps.
What was the reason to use the Whisper timestamps instead of CTC?
We have more accurate transcripts than the ones Whisper generates. How could I use those directly?
I tried manually building the v3 speaker representations from CTC timestamps and using those, but the results are not good. Would it require retraining or fine-tuning to use the CTC timestamps?