
v2 (ctc_align) versus v3 (whisper timestamps) #93

@joazoa

Thank you for the great project!

In v2 you were using CTC alignment; in v3 you are using Whisper timestamps.
As far as I know, timestamps from CTC alignment are more accurate than Whisper's timestamps.
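For context, "CTC alignment" usually means Viterbi forced alignment: given per-frame log-probabilities from a CTC acoustic model (for example a wav2vec2-style model) and the token ids of a known transcript, find the single most likely frame-by-frame path through the blank-interleaved label sequence. The sketch below is a minimal pure-Python illustration of that standard technique, assuming blank id 0 and a `T x V` list of log-probabilities; it is not necessarily how this project's v2 `ctc_align` was implemented.

```python
import math
from typing import List, Sequence


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     targets: Sequence[int],
                     blank: int = 0) -> List[int]:
    """Viterbi CTC forced alignment of a known token sequence.

    log_probs: T x V per-frame log-probabilities from a CTC acoustic model.
    targets:   token ids of the known transcript (must not contain `blank`).
    Returns the best label (token id or blank) for each of the T frames.
    """
    # Extended label sequence with blanks around every token: b t1 b t2 ... tN b
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S, T = len(ext), len(log_probs)
    if T == 0:
        raise ValueError("no frames to align")

    # score[t][s]: best log-score of a path that is in state s at frame t
    score = [[-math.inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]  # start on the leading blank ...
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]  # ... or on the first token

    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay in s, advance from s-1, or skip the blank at
            # s-1 when s and s-2 hold different tokens.
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            if best > -math.inf:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev

    # A complete path ends on the last token or on the trailing blank.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == -math.inf:
        raise ValueError("too few frames for this transcript")

    # Backtrack from the final state to recover the per-frame labels.
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

Because every frame gets an explicit label, token boundaries fall on frame edges; with a wav2vec2-style model each frame typically covers about 20 ms of 16 kHz audio, which is where the timing precision of CTC alignment comes from.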

What was the reason for switching to Whisper timestamps instead of CTC alignment?

We have transcripts that are more accurate than what Whisper generates. How could I use those directly?
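One common way to use an existing, more accurate transcript (a general technique, not something confirmed for this project) is to force-align it against the audio with a CTC model and derive word timestamps from the resulting per-frame path. A minimal sketch of that last step follows, assuming a path of token ids with blank id 0, a 20 ms frame stride (typical for wav2vec2-style models at 16 kHz), and a hypothetical `word_lengths` list giving how many transcript tokens make up each word.

```python
from typing import List, Sequence, Tuple


def token_spans(path: Sequence[int], blank: int = 0) -> List[Tuple[int, int, int]]:
    """Collapse a per-frame CTC path into (token, start_frame, end_frame) runs.

    end_frame is exclusive. Under CTC, two consecutive identical tokens must be
    separated by a blank, so each run of one non-blank label is one token.
    """
    spans = []
    t = 0
    while t < len(path):
        if path[t] == blank:
            t += 1
            continue
        start = t
        while t < len(path) and path[t] == path[start]:
            t += 1
        spans.append((path[start], start, t))
    return spans


def word_times(path: Sequence[int], word_lengths: Sequence[int],
               frame_sec: float = 0.02,  # assumed stride: 20 ms per frame
               blank: int = 0) -> List[Tuple[float, float]]:
    """Map a forced-alignment path to (start_s, end_s) for each word.

    word_lengths[i] is the number of tokens in word i (hypothetical input:
    how words split into tokens depends on the model's tokenizer).
    """
    spans = token_spans(path, blank)
    if any(n < 1 for n in word_lengths) or sum(word_lengths) != len(spans):
        raise ValueError("word_lengths does not match the aligned tokens")
    times, i = [], 0
    for n in word_lengths:
        first, last = spans[i], spans[i + n - 1]
        times.append((first[1] * frame_sec, last[2] * frame_sec))
        i += n
    return times
```

For example, the path `[0, 1, 1, 2, 0, 0, 3, 0]` with `word_lengths=[2, 1]` and `frame_sec=1.0` yields `[(1.0, 4.0), (6.0, 7.0)]`: the first word spans frames 1 to 4 and the second spans frames 6 to 7.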

I tried manually building the v3 speaker representations from CTC timestamps and using those instead, but the results were not good. Would using CTC timestamps require retraining or fine-tuning?
