This repo contains the code for "Prioritizing Data Acquisition For End-to-End Speech Model Improvement", accepted at ICASSP 2024.
In this repository, you will find the code to replicate our experiments.
We do not include the datasets used in the paper because they are publicly available from their respective authors: FSC for English and ITALIC for Italian. Before running the code, place the data files under the data directory.
Our code was tested on Python 3.11.2. To run it, you will need:
- a working environment with the libraries listed in requirements.txt;
- a functioning torch installation in the same environment.
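The two requirements above can be sanity-checked before launching any script. The snippet below is a minimal sketch and not part of the repository: the helper name check_environment and its parameters are hypothetical. It only checks that the interpreter is recent enough and that the named packages can be found, using the standard library alone. It does not verify the exact versions pinned in requirements.txt.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 11), required=("torch",)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it is not provided by this repository.
    """
    problems = []

    # Compare only (major, minor); the code was tested on Python 3.11.2.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ expected, found {found}")

    # find_spec locates a top-level package without importing it.
    for name in required:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty result only means the interpreter version and package names look right; installing from requirements.txt remains the authoritative setup step.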
Use ft_main.py to fine-tune the required models, inference.py to evaluate them, and divexplorer_analysis.ipynb to explore subgroup divergence.
The first table shows the mean and standard deviation of three different runs for the FSC dataset with the wav2vec 2.0 base model. We compare the results for the original fine-tuning procedure, the two baselines (random and clustering-based) and our divergence-aware strategy. Best results for each number of considered subgroups K are highlighted in bold. Best results overall are underlined.
The second table summarizes the results in terms of mean and standard deviation of three different runs for the ITALIC dataset with the XLSR large model. We again compare the original fine-tuning procedure with two baselines (random and clustering-based) and our divergence-aware strategy. The best results for each number of considered subgroups K are highlighted in bold, while the best results overall are underlined.
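Both tables report each entry as a mean and standard deviation over three runs. As a rough illustration of that aggregation, the sketch below uses made-up scores (not results from the paper) and a hypothetical helper, summarize_runs. It assumes the sample standard deviation; the text here does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores.

    Hypothetical helper; assumes the sample (n - 1) form of the deviation.
    """
    return mean(scores), stdev(scores)


# Made-up scores for three runs, for illustration only.
runs = [90.0, 91.0, 92.0]
avg, std = summarize_runs(runs)
print(f"{avg:.2f} ± {std:.2f}")  # prints "91.00 ± 1.00"
```

Swapping stdev for statistics.pstdev gives the population form instead, which yields a smaller value for the same three runs.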
If you use this code in your research, please cite our paper:
@INPROCEEDINGS{koudounas2024prioritizing,
author={Koudounas, Alkis and Pastor, Eliana and Attanasio, Giuseppe and de Alfaro, Luca and Baralis, Elena},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Prioritizing Data Acquisition for end-to-end Speech Model Improvement},
year={2024},
volume={},
number={},
pages={7000-7004},
keywords={Training;Costs;Intent recognition;Data acquisition;Signal processing;Data models;Object recognition;spoken language understanding;data acquisition;data markets;divergence},
doi={10.1109/ICASSP48485.2024.10446326}}
This code is released under the Apache 2.0 license. See the LICENSE file for more details.
For any questions, please contact Alkis Koudounas.