Hi @Ca-ressemble-a-du-fake, CTC-segmentation handles punctuated text: it processes and normalizes the text so that only symbols supported by the model are used for the alignment, so punctuation shouldn't be an issue. I've tried `fr_citrinet` on [the first audio from here](https://librivox.org/compilation-de-poemes-012-by-various/), and it worked well.
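As a rough illustration of that normalization idea (this is not NeMo's actual code), the text is reduced to characters the model can emit and everything else is dropped. The function `normalize_for_alignment` and the `FRENCH_VOCAB` set are hypothetical; a real model defines its own token set, which may not be character-based.

```python
import re
import unicodedata


def normalize_for_alignment(text: str, vocabulary: set[str]) -> str:
    """Hypothetical helper: keep only characters that are in the model vocabulary.

    Not NeMo's implementation; it only shows why punctuation does not reach
    the aligner.
    """
    # Compose accented characters (e.g. 'e' + combining acute -> 'é') so they
    # match single-codepoint vocabulary entries, then lowercase.
    text = unicodedata.normalize("NFC", text).lower()
    # Replace every character outside the vocabulary with a space.
    kept = [ch if ch in vocabulary else " " for ch in text]
    # Collapse the whitespace runs left behind by removed punctuation.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


# Hypothetical French character set (includes the space character).
FRENCH_VOCAB = set("abcdefghijklmnopqrstuvwxyz àâæçéèêëîïôœùûüÿ'")

print(normalize_for_alignment("Bonjour, le Monde ! Ça va ?", FRENCH_VOCAB))
```

Punctuation such as `,`, `!` and `?` simply disappears, so it cannot break the alignment; only symbols the model knows are left.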

A few questions:

  • How fast is the speech in your audio?
  • Have you checked the processed audio in `/home/caraduf/Tests/Nemo/output/processed/` after resampling and `--cut_prefix=3`? Does it sound OK?
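Besides listening, a quick sanity check on the processed files is possible with Python's standard `wave` module. This sketch assumes the processed files are mono PCM WAV at 16 kHz and that `--cut_prefix=3` trims 3 seconds from the start of each file; `describe_wav`, `check_processed` and the default thresholds are hypothetical, so adjust them to your setup.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def check_processed(processed_path: str, original_duration_s: float,
                    expected_rate: int = 16_000, cut_prefix_s: float = 3.0,
                    tolerance_s: float = 0.05) -> list[str]:
    """Hypothetical check: list problems found in one processed file (empty if none)."""
    info = describe_wav(processed_path)
    problems = []
    if info["sample_rate"] != expected_rate:
        problems.append(f"sample rate {info['sample_rate']} Hz, expected {expected_rate} Hz")
    if info["channels"] != 1:
        problems.append(f"{info['channels']} channels, expected mono")
    # Assumes the cut removes exactly cut_prefix_s seconds from the start.
    expected_duration = original_duration_s - cut_prefix_s
    if abs(info["duration_s"] - expected_duration) > tolerance_s:
        problems.append(
            f"duration {info['duration_s']:.2f} s, expected about {expected_duration:.2f} s"
        )
    return problems
```

An empty list only means the format and length look plausible; it does not replace listening to a few files.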

CTC-segmentation can struggle with segment start/end boundaries if the speaker talks very fast, but the error is usually a few milliseconds, not several seconds.
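To see whether a discrepancy is in the millisecond or the second range, one option is to hand-mark the boundaries of a few segments and compare them with the aligner output. The sketch below assumes segments are `(start_seconds, end_seconds)` tuples; the helper names and the sample numbers are hypothetical.

```python
from statistics import mean, median

# A segment is (start_seconds, end_seconds); this representation is an assumption.
Segment = tuple[float, float]


def boundary_offsets_ms(predicted: list[Segment],
                        reference: list[Segment]) -> list[tuple[float, float]]:
    """Return (start, end) offsets in ms for each segment: predicted minus reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must contain the same number of segments")
    return [((p_start - r_start) * 1000.0, (p_end - r_end) * 1000.0)
            for (p_start, p_end), (r_start, r_end) in zip(predicted, reference)]


def summarize_offsets(offsets: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize boundary errors over all start and end points."""
    signed = [value for pair in offsets for value in pair]
    absolute = [abs(value) for value in signed]
    return {
        "median_abs_ms": median(absolute),
        "max_abs_ms": max(absolute),
        "mean_signed_ms": mean(signed),
    }


predicted = [(0.52, 2.10), (2.31, 4.05)]  # hypothetical aligner output
reference = [(0.50, 2.08), (2.30, 4.10)]  # hypothetical hand-marked boundaries
print(summarize_offsets(boundary_offsets_ms(predicted, reference)))
```

Small absolute offsets with a mean near zero point to ordinary boundary jitter, while a mean signed offset close to a fixed number of seconds suggests the whole timeline is shifted rather than individual boundaries being off.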
If you add

Answer selected by Ca-ressemble-a-du-fake
Category: Q&A