Authors:
- Zaid Sheikh (Carnegie Mellon University, USA)
- Shuichiro Shimizu (Kyoto University, Japan)
- Siddhant Arora (Carnegie Mellon University, USA)
- Jiatong Shi (Carnegie Mellon University, USA)
- Samuele Cornell (Carnegie Mellon University, USA)
- Xinjian Li (Carnegie Mellon University, USA)
- Shinji Watanabe (Carnegie Mellon University, USA)
This paper introduces the Scalable Spontaneous Speech Dataset (SSSD) project, comprising 727 hours of spontaneous English conversations between pairs of randomly matched, anonymous participants on the Amazon Mechanical Turk (MTurk) crowdsourcing platform. Conversations average 25-30 minutes and cover a wide range of everyday topics. A key innovation of this work is our approach to maximizing the number of MTurk workers concurrently participating in our task, which enables more effective randomized matching and live two-person conversations. Data quality is ensured through a two-tiered task structure: a qualification round to select reliable workers, followed by the main recording sessions. We detail our methodology for collecting and recording spontaneous voice conversations, present analyses of the dataset's conversational content and speech quality in comparison to other datasets, and discuss potential uses.[1]
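The randomized matching described above can be sketched as a simple pairing step over the pool of concurrently active workers. This is an illustrative sketch only, not the authors' implementation; the function name and odd-worker-out handling are assumptions.

```python
import random


def match_workers(active_workers, seed=None):
    """Randomly pair concurrently active workers into two-person sessions.

    Hypothetical sketch of randomized matching: shuffle the current pool,
    pair adjacent workers, and leave any odd worker out unmatched until
    the next matching round.
    """
    rng = random.Random(seed)
    pool = list(active_workers)
    rng.shuffle(pool)
    # Pair off shuffled workers two at a time.
    pairs = [(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
    # With an odd pool size, one worker waits for the next round.
    unmatched = pool[len(pool) - len(pool) % 2:]
    return pairs, unmatched
```

Because pairing is done over whichever workers are active at the same moment, keeping many workers concurrently in the task directly increases the chance that any two strangers can be matched live.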
Footnotes
1. This website template was adapted from eliahuhorwitz/Academic-project-page-templat