v2 (ctc_align) versus v3 (whisper timestamps) #93

Thank you for the great project!

In v2 you were using CTC alignment; in v3 you are using Whisper timestamps.
As far as I know, the timestamps from CTC alignment are more accurate than the Whisper timestamps.

What was the reason for switching from CTC to Whisper timestamps?

We have transcripts that are more accurate than the ones Whisper generates. How could I use those directly?

I tried manually building the v3 speaker representations from CTC timestamps and then using those, but the results were not good. Would using the CTC timestamps require retraining or fine-tuning?


