Investigate pre-processing options to detect long silences in recordings to be transcribed via Whisper

Whisper tends to hallucinate text over periods of silence in recordings. This ticket is to investigate options for pre-processing audio to detect long silences, and to see whether that improves Whisper output.

[This recording](https://sul-purl-stage.stanford.edu/fk250vc8974) that's been accessioned in QA could be a good sample, since it starts with about 17 minutes of silence and has a few other periods of silence.

This case is distinct from:

- recordings that have no audio track whatsoever
- recordings that have an audio track but are essentially silent (below a certain audible threshold for the whole recording)

Those two other cases are more tractable and higher priority (see #1436).
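
As a starting point for the long-silence case, here is a minimal sketch of one possible detection approach: compute the RMS level of fixed-size windows and report runs of quiet windows that last longer than some minimum duration. This is not an existing implementation. The function name `find_long_silences` and the default values (`window_s`, `threshold_db`, `min_silence_s`) are all assumptions for illustration, and it expects audio already decoded to mono float samples in the range -1.0 to 1.0.

```python
import math


def find_long_silences(samples, sample_rate, window_s=0.5,
                       threshold_db=-50.0, min_silence_s=10.0):
    """Return (start_s, end_s) for each quiet run of at least min_silence_s.

    All parameter defaults are illustrative guesses, not tuned values.
    """
    window = max(1, int(window_s * sample_rate))
    # Convert the dBFS threshold to a linear amplitude.
    threshold = 10 ** (threshold_db / 20)
    total = len(samples)
    silences = []
    run_start = None  # sample index where the current quiet run began

    for offset in range(0, total, window):
        chunk = samples[offset:offset + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms < threshold:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            # A loud window ends the quiet run; keep it only if long enough.
            if (offset - run_start) / sample_rate >= min_silence_s:
                silences.append((run_start / sample_rate, offset / sample_rate))
            run_start = None

    # Handle a quiet run that extends to the end of the recording.
    if run_start is not None and (total - run_start) / sample_rate >= min_silence_s:
        silences.append((run_start / sample_rate, total / sample_rate))
    return silences
```

The returned intervals could then be used to trim or skip those regions before handing audio to Whisper. ffmpeg's `silencedetect` filter (with its `noise` and `d` options) is another option worth comparing, since it works on the media file directly.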

Investigate pre-processing options to detect long silences in recordings to be transcribed via Whisper #1435
