
Investigate pre-processing options to detect long silences in recordings to be transcribed via Whisper #1435

@andrewjbtw

Description

Whisper tends to generate hallucinations (text that was never spoken) over periods of silence in recordings. This ticket is to investigate options for pre-processing audio and whether they improve Whisper output.
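One pre-processing option is to measure short-term loudness and flag long quiet stretches before transcription. The sketch below is a minimal illustration, not a tested pipeline. It assumes the audio has already been decoded to mono float samples in [-1.0, 1.0]. The function name `find_silent_spans` and the defaults (-50 dBFS threshold, 2-second minimum, 0.5-second frames) are hypothetical starting values that would need tuning against real recordings.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_silent_spans(
    samples: Sequence[float],
    sample_rate: int,
    threshold_dbfs: float = -50.0,  # assumed cutoff; tune per collection
    min_silence_s: float = 2.0,     # ignore quiet runs shorter than this
    frame_s: float = 0.5,           # analysis window length in seconds
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS level stays below
    threshold_dbfs for at least min_silence_s seconds.

    `samples` is assumed to be mono audio as floats in [-1.0, 1.0].
    """
    frame_len = max(1, int(frame_s * sample_rate))
    runs: List[Tuple[int, int]] = []  # quiet runs as (start, end) sample indices
    run_start: Optional[int] = None

    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # RMS level in dBFS; digital silence (rms == 0) maps to -inf.
        level = 20.0 * math.log10(rms) if rms > 0 else -math.inf
        if level < threshold_dbfs:
            if run_start is None:
                run_start = start  # a quiet run begins at this frame
        elif run_start is not None:
            runs.append((run_start, start))  # quiet run ended before this frame
            run_start = None

    if run_start is not None:
        runs.append((run_start, len(samples)))  # file ends while still quiet

    # Keep only runs long enough to count as "long" silence, in seconds.
    return [
        (s / sample_rate, e / sample_rate)
        for s, e in runs
        if (e - s) / sample_rate >= min_silence_s
    ]


def make_tone(seconds: float, sr: int) -> List[float]:
    """Synthetic 440 Hz test tone at half amplitude."""
    return [0.5 * math.sin(2 * math.pi * 440 * i / sr) for i in range(int(seconds * sr))]


# Usage: 5 s silence, 3 s tone, 1 s silence, 2 s tone.
sr = 1000
audio = [0.0] * (5 * sr) + make_tone(3, sr) + [0.0] * sr + make_tone(2, sr)
print(find_silent_spans(audio, sr))  # [(0.0, 5.0)]; the 1 s gap is under min_silence_s
```

Frame-level RMS is a deliberately simple choice: it needs no dependencies, but it cannot tell room tone or tape hiss from speech, so a threshold that works for one recording may not transfer to another.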

This recording, which has been accessioned in QA, could be a good sample: it starts with about 17 minutes of silence and has a few other periods of silence.
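For a recording like this one, FFmpeg's `silencedetect` audio filter is another candidate: it logs `silence_start` and `silence_end` timestamps to stderr for quiet stretches below a `noise` level lasting at least `d` seconds. The sketch below assumes `ffmpeg` is on the PATH; the wrapper names `detect_silences` and `parse_silencedetect`, and the -50 dB / 60-second settings, are hypothetical choices for illustration.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# silencedetect log lines look like "[silencedetect @ 0x...] silence_start: 12.5"
START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silencedetect(log: str) -> List[Tuple[float, Optional[float]]]:
    """Pair silence_start/silence_end lines into (start_s, end_s) spans.

    An end of None means the silence was still running when the log ended.
    """
    spans: List[Tuple[float, Optional[float]]] = []
    start: Optional[float] = None
    for line in log.splitlines():
        m = START_RE.search(line)
        if m:
            start = max(0.0, float(m.group(1)))  # clamp tiny negative starts
            continue
        m = END_RE.search(line)
        if m and start is not None:
            spans.append((start, float(m.group(1))))
            start = None
    if start is not None:
        spans.append((start, None))
    return spans


def detect_silences(path: str, noise_db: int = -50, min_silence_s: float = 60.0):
    """Run ffmpeg's silencedetect filter on `path` and return silent spans."""
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence_s}",
        "-f", "null", "-",  # decode only; discard output
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_silencedetect(result.stderr)  # silencedetect logs to stderr
```

Calling `detect_silences("recording.wav")` (a hypothetical path) would list spans long enough to matter; a span starting at 0.0, such as a long opening silence, might then be trimmed or skipped before the audio is sent to Whisper.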

This case is distinct from:

  • recordings that have no audio track whatsoever
  • recordings that have an audio track but are essentially silent (below a certain audible threshold for the whole recording)

Those two other cases are more tractable and higher priority (see #1436).
