Whisper tends to hallucinate text during periods of silence in recordings. This ticket is to investigate options for pre-processing audio and to see whether that improves Whisper's output.
This recording, which has been accessioned in QA, could be a good sample: it starts with about 17 minutes of silence and has a few other periods of silence.
This case is distinct from:
- recordings that have no audio track whatsoever
- recordings that have an audio track but are essentially silent (below a certain audible threshold for the whole recording)
Those two other cases are more tractable and higher priority (see #1436).
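One pre-processing option worth investigating is locating long silent spans within a recording (such as the ~17-minute opening of the QA sample) so they can be trimmed or skipped before transcription. The sketch below is a simple frame-energy (RMS) silence detector, not anything Whisper provides. It assumes mono float samples in [-1.0, 1.0]. `find_silent_spans` and its defaults (30 ms frames, -50 dBFS threshold, 2 s minimum span) are hypothetical and untuned.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_silent_spans(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold_dbfs: float = -50.0,
    min_silence_s: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS stays below threshold_dbfs.

    Hypothetical helper: samples are mono floats in [-1.0, 1.0].
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = math.ceil(len(samples) / frame_len)
    runs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None  # first frame index of the current quiet run

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Convert RMS to dBFS; treat digital silence as -inf.
        dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if dbfs < threshold_dbfs:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, n_frames))

    # Convert frame indices to seconds and drop pauses shorter than min_silence_s.
    spans: List[Tuple[float, float]] = []
    for start, end in runs:
        start_s = start * frame_len / sample_rate
        end_s = min(end * frame_len, len(samples)) / sample_rate
        if end_s - start_s >= min_silence_s:
            spans.append((start_s, end_s))
    return spans


if __name__ == "__main__":
    sr = 1000
    audio = [0.0] * 3000 + [0.5] * 1000 + [0.0] * 500 + [0.5] * 500 + [0.0] * 2500
    print(find_silent_spans(audio, sr, frame_ms=10))
```

The returned spans could then be cut out, or used to split the audio into chunks before passing it to Whisper. The threshold and minimum duration would need tuning against real recordings like the QA sample, since background hiss or room tone can sit above a fixed dBFS cutoff.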