Interspeech2025-SEP28kWhisperBenchmarking

The Python notebook (SEP28k_processing_inference_analysis.ipynb) contains the code used for the manual transcription of the SEP-28k clips, the inference runs of Whisper 2 and Whisper 3 on this audio, the data analysis, and the graphing of results for this work.
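This README does not say which metric the notebook uses to compare Whisper output with the manual transcriptions. As an assumption, word error rate (WER) is a common choice for ASR benchmarking, so the following is a minimal pure-Python sketch of it. It scores a Whisper transcript (for example, the Whisper2Annotation column) against a manual reference (for example, manual_transcription_semantic). The normalize and word_error_rate helpers are hypothetical, and the notebook's actual metric and text normalization may differ.

```python
import re


def normalize(text):
    """Lowercase and keep only letters, digits and apostrophes (an assumed, simple normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance (substitutions + deletions + insertions) / reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else float("inf")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion of ref_word
                curr[j - 1] + 1,                       # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example strings, not rows from benchmark_dataset.csv
print(word_error_rate("I just want to go", "I j just want to go"))  # → 0.2
```

The choice of reference matters. Scoring against manual_transcription_literal instead of manual_transcription_semantic means that a transcript which drops repeated sounds or words (e.g. "just" for a literal "j-j-j-just") is charged deletions for each omitted repetition.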

There is also an audio folder (curated_audio_clips) containing all of the SEP-28k audio clips used in this research.

The CSV file (benchmark_dataset.csv) contains the results of the project. For each clip, it shows:

stutter_present: Binary label for whether the audio clip contains a stuttering event
manual_prolongation: Binary label for whether the audio clip contains a Prolongation
manual_block: Binary label for whether the audio clip contains a Block
manual_soundRep: Binary label for whether the audio clip contains a Sound Repetition
manual_wordRep: Binary label for whether the audio clip contains a Word Repetition
manual_interject: Binary label for whether the audio clip contains an Interjection
hallucination_binary: Binary label for whether the audio clip contains a Hallucination
manual_transcription_semantic: Transcription of audio clip with stuttering events removed
manual_transcription_literal: Transcription of audio clip with stuttering events included
Whisper2Annotation: Transcription of audio clip created by Whisper 2
Whisper3Annotation: Transcription of audio clip created by Whisper 3
audio_clip_name: (show name)_(episode id)_(clip id)
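The following is a minimal sketch of reading the CSV with Python's standard csv module, splitting audio_clip_name into its three parts, and tabulating hallucination_binary per disfluency type. It assumes the binary labels are stored as 1/0 (or True/False) and that the file is UTF-8. The helper names (load_rows, parse_clip_name, is_positive, hallucination_rate_by_type) are hypothetical and do not come from the notebook.

```python
import csv
from collections import defaultdict

# Binary disfluency columns, as listed above
DISFLUENCY_COLUMNS = [
    "manual_prolongation",
    "manual_block",
    "manual_soundRep",
    "manual_wordRep",
    "manual_interject",
]


def load_rows(path="benchmark_dataset.csv"):
    """Read the benchmark CSV into a list of dicts keyed by column name."""
    # utf-8-sig also handles a leading byte-order mark, if the file has one
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def parse_clip_name(name):
    """Split '(show name)_(episode id)_(clip id)' from the right,
    so an underscore inside a show name would not break the split."""
    show, episode, clip = name.rsplit("_", 2)
    return show, episode, clip


def is_positive(value):
    # Assumption: binary labels are stored as 1/0 or True/False
    return str(value).strip().lower() in {"1", "1.0", "true"}


def hallucination_rate_by_type(rows):
    """For each disfluency type, the fraction of clips containing it
    that also have hallucination_binary set."""
    counts = defaultdict(lambda: [0, 0])  # column -> [hallucinated, total]
    for row in rows:
        hallucinated = is_positive(row["hallucination_binary"])
        for column in DISFLUENCY_COLUMNS:
            if is_positive(row[column]):
                counts[column][0] += int(hallucinated)
                counts[column][1] += 1
    return {column: h / n for column, (h, n) in counts.items()}
```

For example, hallucination_rate_by_type(load_rows()) run from the repository root returns a dict that maps each disfluency column to a rate between 0 and 1. Disfluency types that never occur in the data are omitted rather than reported as a division by zero.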

Please cite the following paper when publishing work related to this dataset:

J-j-j-just Stutter: Benchmarking Whisper’s Performance Disparities on Different Stuttering Patterns. Charan Sridhar, Shaomei Wu. Accepted to appear in Proceedings of the Interspeech Conference. 2025. (Preprint)

Repository contents:

curated_audio_clips: Folder of the SEP-28k audio clips used in this research
LICENSE: License for this repository
README.md: This file
SEP28k_processing_inference_analysis.ipynb: Python notebook for transcription, Whisper inference, data analysis, and graphing
benchmark_dataset.csv: Benchmark results, one row per clip
GitHub repository: aimpowered/stuttered-speech-benchmark