Note
Accepted at Advanced Concepts for Intelligent Vision Systems (ACIVS 2025)
A post-print version is available here.
This project introduces a novel dataset and method for evaluating Welsh language fluency using multimodal fusion techniques.
Welsh is a linguistically rich yet under-resourced minority language. Despite its cultural significance, automated fluency assessment remains largely unexplored due to limited datasets and tools. Existing models focus on high-resource languages, leaving Welsh without sufficient multi-modal resources. To address this, we introduce CymruFluency, the first 4D dataset for Welsh fluency assessment, capturing both audio and 3D lip movements with expert-annotated fluency scores. Building on this, we propose a multi-modal fluency classification framework that combines audio features (mel spectrograms) and manually annotated 3D lip landmarks. Our fusion approach significantly improves fluency prediction over unimodal models, emphasizing the critical role of 3D lip dynamics in Welsh learning. This research advances minority language processing by integrating articulatory features into fluency evaluation, offering a powerful tool for Welsh language learning, assessment, and preservation.
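As a rough illustration of what combining two modalities can look like, the sketch below averages the class probabilities of an audio model and a lip-landmark model (decision-level, or "late", fusion). This is a generic technique and not necessarily the fusion architecture used in the paper. The function names, the two-class layout (index 0 = fluent, index 1 = non-fluent), and the `audio_weight` parameter are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(audio_logits, lip_logits, audio_weight=0.5):
    """Weighted average of the two unimodal probability vectors.

    audio_weight is a hypothetical tuning knob: 1.0 trusts only the audio
    model, 0.0 trusts only the lip-landmark model.
    Returns (fused_probabilities, predicted_class_index).
    """
    if len(audio_logits) != len(lip_logits):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    p_audio = softmax(audio_logits)
    p_lip = softmax(lip_logits)
    fused = [audio_weight * a + (1.0 - audio_weight) * l
             for a, l in zip(p_audio, p_lip)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


# Made-up scores for one utterance; class 0 = fluent, class 1 = non-fluent (assumed).
fused, label = late_fusion([2.0, 0.0], [0.0, 1.0])
```

Because each input is a probability vector and the weights sum to one, the fused vector is again a valid probability distribution. In practice the weight would be chosen on a validation split.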
Fluent Speaker | Non-Fluent Speaker
Subject uttering Welsh phrase “Gwybodaeth angenrheidiol” (Tr. EN: Necessary information; IPA: /ˈɡʊɨ̯bɔðaɪθ aŋɛnˈhreɪ̯djɔl/)
3D mesh quality and landmarking in progress.
Aligning landmarks to mitigate head movement.
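One simple way to remove rigid head motion from a frame of 3D landmarks is to re-express every point in a coordinate frame attached to three reference landmarks. The sketch below does this with plain Python. It is not the alignment procedure used to build the dataset, which is not specified here; the function name `to_head_frame` and the choice of reference indices are assumptions, and the approach only works if the three reference points are non-collinear and move little during speech.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("reference landmarks must be distinct and non-collinear")
    return (v[0] / n, v[1] / n, v[2] / n)


def to_head_frame(frame, ref=(0, 1, 2)):
    """Express one frame of 3D landmarks relative to three reference points.

    frame: list of (x, y, z) tuples.
    ref:   indices of the three reference landmarks (hypothetical choice).
    The result is unchanged by any rotation or translation of the whole frame.
    """
    origin = frame[ref[0]]
    x_axis = _unit(_sub(frame[ref[1]], origin))
    z_axis = _unit(_cross(x_axis, _sub(frame[ref[2]], origin)))
    y_axis = _cross(z_axis, x_axis)  # already unit length: z and x are orthonormal
    aligned = []
    for p in frame:
        d = _sub(p, origin)
        aligned.append((_dot(d, x_axis), _dot(d, y_axis), _dot(d, z_axis)))
    return aligned


# Toy frame, and the same frame after a simulated head movement
# (90-degree rotation about the z axis followed by a translation).
frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.5, 0.3)]
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in frame]
a = to_head_frame(frame)
b = to_head_frame(moved)
```

After alignment, `a` and `b` agree up to floating-point error, so any remaining frame-to-frame differences come from the landmarks themselves and not from head pose. A least-squares fit over many points (for example Procrustes analysis) is more robust to noise in individual landmarks than this three-point frame.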
The full dataset is available on Zenodo.
The dataset is split into four parts, which can be accessed through the four versions of the repository. For more information on the content and structure of the dataset, please read the dataset description.
- Clone this repo:
  git clone https://github.com/arvinsingh/CymruFluency.git
  cd CymruFluency
- Install dependencies:
  uv sync
- Launch the notebooks:
  jupyter notebook

Notebooks:
- Data Exploration and Analysis.ipynb: Visualize and explore dataset stats
- Experiment Audio Landmarks.ipynb: Train and eval unimodal models
- Experiment Model Training.ipynb: Train and eval multimodal models
- Welsh vs English.ipynb: Comparative study of fluency in Welsh vs English dataset
Architecture Pipeline.
This dataset is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Research purposes only.
Important
If you use our dataset or code, please cite both of the following BibTeX entries:
@inproceedings{bali_2025_cymrufluency,
author = {Arvinder Pal Singh Bali and
Gary K. L. Tam and
Avishek Siris and
Gareth Andrews and
Yukun Lai and
Bernie Tiddeman and
Gwenno Ffrancon},
title = {CymruFluency - A fusion technique and a 4D Welsh dataset for Welsh fluency analysis},
booktitle = {Advanced Concepts for Intelligent Vision Systems (ACIVS)},
year = {2025},
address = {Japan},
publisher = {Springer (Lecture Notes in Computer Science, LNCS)},
doi = {TBD},
url = {TBD}
}
@dataset{bali_2025_dataset,
author = {Bali, Arvinder Pal Singh and
Tam, Gary KL and
Siris, Avishek and
Andrews, Gareth and
Lai, Yukun and
Tiddeman, Bernie and
Ffrancon, Gwenno},
title = {Dataset and code for "CymruFluency - A fusion technique and a 4D Welsh dataset for Welsh fluency analysis"},
month = may,
year = 2025,
publisher = {Zenodo},
doi = {10.5281/zenodo.15397513},
url = {https://doi.org/10.5281/zenodo.15397513},
}
This research was supported by Coleg Cymraeg Cenedlaethol Small Grant 2017, Cherish-DE Escalator Fund 2019, 2021 (1RR, 52E), Swansea University SPIN fund, Wales Network Innovation Small Grant 2023, and EPSRC IAA Fund 2024. We would like to thank all annotators and anonymized participants for their contributions to this project.