[Question] How to introduce VAD to solve the problem of hallucinations #54

@GuuuWei

Background:
I've noticed that when processing audio files that contain silent or non-speech segments, Whisper tends to hallucinate text. The hallucinations appear not only in the silent or non-speech segments themselves but also seem to degrade the transcription of the normal speech that follows.

Inquiry:
Given that this is an inherent issue with Whisper, would it be feasible to incorporate a voice activity detection (VAD) step, or a similar strategy, into Whisper-turbo? I am aware of projects such as WhisperX that take this kind of approach, and it seems to mitigate the issue effectively.
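
To make the request concrete, here is a minimal sketch of the kind of pre-filtering meant by "VAD": find the spans that contain signal and pass only those spans to the model. It is a plain energy threshold written for illustration, not the method WhisperX or any other project uses. The function name `detect_speech` and all of its parameters and default values are hypothetical, and a real integration would more likely rely on a trained VAD model.

```python
import math


def detect_speech(samples, sample_rate, frame_ms=25, threshold=0.02, min_gap_ms=300):
    """Return (start_s, end_s) spans whose per-frame RMS reaches `threshold`.

    `samples` is mono PCM as floats in [-1.0, 1.0]. All names and default
    values here are illustrative assumptions, not part of any Whisper API.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i  # first loud frame opens a segment
        elif start is not None:
            segments.append((start, i))  # first quiet frame closes it
            start = None
    if start is not None:
        segments.append((start, len(samples)))

    # Merge segments separated by pauses shorter than `min_gap_ms`,
    # so brief gaps between words do not split an utterance.
    min_gap = int(sample_rate * min_gap_ms / 1000)
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return [(s / sample_rate, e / sample_rate) for s, e in merged]


# Synthetic check: 1 s of silence, 1 s of a 220 Hz tone, 1 s of silence.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
signal = [0.0] * rate + tone + [0.0] * rate
print(detect_speech(signal, rate))  # → [(1.0, 2.0)]
```

Only the returned spans would then be transcribed, with their start offsets added back to the resulting timestamps. An energy threshold cannot tell speech from music or noise, so it would not address the non-human-voice case described above; that is where a trained VAD model would be needed.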

Thank you for your time and the incredible work on this project.
