Mismatch between text and speech content in the paper's English dataset #13

@CriDora

Hello, when training on the VoiceAssistant-400K data, do you use the discrete SNAC tokens it provides to synthesize speech? When I decode those tokens with SNAC, I find that they are incomplete: the synthesized speech is shorter than the answer text. Did you apply any speech-text alignment processing during training?
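
To make the mismatch concrete, here is a rough sketch of how one could flag samples whose audio looks truncated. It is only an illustration under stated assumptions: `estimate_truncation` is a hypothetical helper, `frames_per_second` has to be supplied for the codec configuration in use (the value in the usage line is a placeholder, not a confirmed figure for this dataset), and `words_per_second` is a crude guess at speaking rate. It does not call SNAC itself.

```python
def estimate_truncation(num_frames, frames_per_second, answer_text,
                        words_per_second=2.5):
    """Compare the audio length implied by a token sequence with a rough
    estimate of how long the answer text would take to speak.

    All parameters are assumptions supplied by the caller:
      num_frames         number of codec frames in the sample
      frames_per_second  frame rate of the codec's coarsest layer
      words_per_second   assumed speaking rate (crude heuristic)
    """
    if frames_per_second <= 0 or words_per_second <= 0:
        raise ValueError("rates must be positive")

    audio_seconds = num_frames / frames_per_second
    expected_seconds = len(answer_text.split()) / words_per_second

    # Coverage well below 1.0 suggests the audio ends before the text does.
    if expected_seconds > 0:
        coverage = audio_seconds / expected_seconds
    else:
        coverage = float("inf")

    return {
        "audio_seconds": audio_seconds,
        "expected_seconds": expected_seconds,
        "coverage": coverage,
    }


# Placeholder numbers only: 120 frames at an assumed 12 frames per second,
# against a 50-word answer.
report = estimate_truncation(120, 12.0, "word " * 50)
print(report)
```

A coverage ratio far below 1.0 on many samples would support the observation that the provided tokens stop before the answer text ends, though the speaking-rate heuristic is too coarse to judge any single sample.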
