Pitch Accent Detection improves Pretrained Automatic Speech Recognition

This repository contains the code accompanying our INTERSPEECH 2025 paper:

Title: Enhancing Robot Instruction Understanding and Disambiguation via Speech Prosody
Authors: David Sasu and Natalie Schluter
Affiliation: IT University of Copenhagen

Abstract

We show the performance of Automatic Speech Recognition (ASR) systems that use semi-supervised speech representations can be boosted by a complimentary pitch accent detection module, by introducing a joint ASR and pitch accent detection model. The pitch accent detection component of our model achieves a significant improvement on the state-of-the-art for the task, closing the gap in F1-score by 41%. Additionally, the ASR performance in joint training decreases WER by 28.3% on LibriSpeech, under limited resource fine-tuning. With these results, we show the importance of extending pretrained speech models to retain or re-learn important prosodic cues such as pitch accent.

Contributions

Significant enhancement in pitch accent detection performance, surpassing previous state-of-the-art.
A joint ASR and pitch accent detection model that improves ASR performance in limited resource scenarios.
A semi-supervised self-training approach to augment pitch accent annotations, further enhancing ASR performance.

Datasets

The experiments leverage the following datasets:

Boston University Radio News Corpus (BURNC)
LibriSpeech
Libri-light

Model Architecture

Our joint Prosody-ASR model:

Employs pretrained wav2vec 2.0 as the base speech representation model.
Adds an extra linear layer and layer normalization for improved pitch accent classification.

Repository Structure

. 
├── model_script.py/        # Code for joint model
├── train_script.py/        # Code for training and evaluating joint model
├── LICENSE
└── README.md

Citation

If you use this code, please cite our paper:

@inproceedings{sasu2025pitch,
  title={Pitch Accent Detection Improves Pretrained Automatic Speech Recognition},
  author={Sasu, David and Schluter, Natalie},
  booktitle={Proceedings of Interspeech 2025},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
LICENSE.txt		LICENSE.txt
README.md		README.md
model_script.py		model_script.py
train_script.py		train_script.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Pitch Accent Detection improves Pretrained Automatic Speech Recognition

Abstract

Contributions

Datasets

Model Architecture

Repository Structure

Citation

About

Uh oh!

Releases 1

Packages

Languages

License

sasudavid/prosodic_prominence_asr

Folders and files

Latest commit

History

Repository files navigation

Pitch Accent Detection improves Pretrained Automatic Speech Recognition

Abstract

Contributions

Datasets

Model Architecture

Repository Structure

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages