Cross-Domain Few-Shot Open-Set Keyword Spotting Using Keyword Adaptation and Prototype Reprojection
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
⭐ Official code of Adapt-KWS.
We provide the scripts for:
- Source Domain Pretraining: pretrain the DSCNN on a large-scale source-domain dataset (MSWC) to obtain a robust feature encoder.
- Target Domain Adapting: combine the pretrained DSCNN with the CKAs; the DSCNN is frozen and the CKAs are optimized from scratch on the support set.
- Target Domain Querying: perform open-set classification by comparing the Euclidean distances between the reprojected prototypes and the query feature.
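As a rough illustration of the querying step (a minimal sketch, not the repository's implementation; the function name and the fixed rejection threshold are assumptions made here), open-set classification by nearest prototype looks like:

```python
import numpy as np

def openset_classify(query, prototypes, threshold):
    """Return the index of the nearest prototype under Euclidean distance,
    or -1 ("unknown") if even the nearest prototype is beyond the threshold."""
    dists = np.linalg.norm(prototypes - query, axis=1)  # distance to each class prototype
    nearest = int(np.argmin(dists))
    return nearest if dists[nearest] <= threshold else -1
```

A query far from every prototype is rejected as an unknown keyword; how the threshold is chosen follows the paper, not this sketch.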
```shell
git clone https://github.com/Raynaming/CD-FSOS-KWS.git
cd CD-FSOS-KWS
conda create -n adapt_kws python=3.7.12
conda activate adapt_kws
pip install -r requirements.txt
```
Create a new folder termed `<dataset_path>`, then download and process the source and target datasets by following the instructions below. Note that additive noise from the DEMAND dataset is used at training time.
- Simply download and unpack the English partition inside `<dataset_path>`. Audio files will be in `<dataset_path>/MSWC/en/clips/`.
- Convert the audio files from .opus to .wav and store the outputs in `<dataset_path>/MSWC/en/clips_wav/`. This speeds up file loading at runtime (no decompression is needed) at the cost of higher disk usage. If this step is skipped, modify the folder name at line 390 of the `MSWCData.py` file.
- Put the split csv files (`en_{train,test,dev}.csv`) in the `<dataset_path>/MSWC/en/` folder.
- Add the noise folder from which noise recordings are sampled: `<dataset_path>/MSWC/noise/`. We used samples from the DEMAND dataset, copying only the wav file with ID=01 of every noise type to the destination folder (the file names in the destination folder can be arbitrary).
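The .opus-to-.wav conversion step above can be sketched as follows (an illustrative helper, not part of the repo; it assumes the `ffmpeg` CLI is installed, and the 16 kHz output rate is an assumption):

```python
import subprocess
from pathlib import Path

def wav_target(opus_path: Path, clips_dir: Path, wav_dir: Path) -> Path:
    """Map <clips_dir>/<word>/<id>.opus to <wav_dir>/<word>/<id>.wav."""
    rel = opus_path.relative_to(clips_dir)
    return (wav_dir / rel).with_suffix(".wav")

def convert_all(clips_dir: Path, wav_dir: Path, sr: int = 16000) -> None:
    """Convert every .opus file under clips_dir, mirroring the folder layout."""
    for opus in clips_dir.rglob("*.opus"):
        out = wav_target(opus, clips_dir, wav_dir)
        out.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(opus), "-ar", str(sr), str(out)],
            check=True,
        )
```

Any equivalent converter works, as long as the resulting folder layout matches `clips_wav/` as expected by `MSWCData.py`.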
- The Google Speech Commands dataset v2 is unpacked to `<dataset_path>/GSC/`.
- Any download link can be used (e.g., torchaudio).
- Send an e-mail to uaspeech-requests to obtain a copy of the UA-Speech dataset.
- Unpack the noisereduce version of the UA-Speech dataset to `<dataset_path>/UASpeech/`; the individual speakers' audio folders will be located in `<dataset_path>/UASpeech/audio/noisereduce/`.
- Download the Mandarin Dysarthria Speech Corpus and unpack it to `<dataset_path>/MDSC/`.
- The audio files for each speaker will be located in `<dataset_path>/MDSC/wav/`, and the corresponding transcript information can be found in `<dataset_path>/MDSC/transcript/`.
Finally, `<dataset_path>` should have the following structure:

```
dataset_path
├── MSWC
│   ├── en
│   │   ├── en_dev.csv
│   │   ├── en_test.csv
│   │   ├── en_train.csv
│   │   └── clips
│   └── noise
│       └── ch01.wav
├── GSC
│   └── speech_commands_v0.02
├── UASpeech
│   ├── audio
│   ├── doc
│   ├── mlf
│   ├── video
│   └── ...
└── MDSC
    ├── transcript
    └── wav
```
To pretrain the backbone on the MSWC dataset, you can run:
```shell
python source_pretraining.py --data.cuda \
    --speech.default_datadir <dataset_path>/MSWC/en/ \
    --train.epochs 40 \
    --train.n_way 80 \
    --train.n_query 20 \
    --train.n_episodes 400 \
    --log.exp_dir <output_dir>/<EXP_NAME>
```
Make sure to set `<dataset_path>` and `<output_dir>/<EXP_NAME>`.

You can use the command above to generate the trained model provided as an example in `results/Pretrain_DSCNN_MSWC`.
Main Training Options:
- `train.n_way`: number of classes per training episode.
- `train.n_query`: number of query samples per training episode.
- `train.n_episodes`: number of episodes per epoch.
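For intuition, one training episode under these options can be sketched as a hypothetical sampler (the function name and data layout are illustrative, not the repo's actual loader):

```python
import random

def sample_episode(class_to_indices, n_way, n_query):
    """Draw one episode: pick n_way classes, then n_query sample
    indices from each picked class."""
    classes = random.sample(sorted(class_to_indices), n_way)
    return {c: random.sample(class_to_indices[c], n_query) for c in classes}

# One epoch then consists of train.n_episodes such draws.
```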
```shell
python target_adapting_querying.py --data.cuda --choose_cuda 0 \
    --model.model_path results/Pretrain_DSCNN_MSWC/best_model.pt \
    --speech.dataset googlespeechcommand --speech.task GSC12,GSC22 \
    --speech.default_datadir <dataset_path>/GSC/speech_commands_v0.02/ \
    --speech.include_unknown \
    --fsl.test.batch_size 264 \
    --fsl.test.n_support 10 \
    --fsl.test.n_way 11 \
    --fsl.test.n_episodes 100 \
    --querying.prototype_reprojection
```
Main Adapting and Querying Options:
- `model.model_path`: the pretrained model's path. You can replace it with the path to your own pretrained model.
- `speech.dataset`: the target dataset used for adapting and querying; one of `googlespeechcommand`, `UASpeech`, and `MDSC`.
- `speech.task`: set according to `speech.dataset` as follows: `GSC12,GSC22`, `UASpeech12,UASpeech22`, or `MDSC12,MDSC22`.
- `speech.default_datadir`: set according to `speech.dataset` as follows: `<dataset_path>/GSC/speech_commands_v0.02/`, `<dataset_path>/UASpeech/`, or `<dataset_path>/MDSC/`.
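For reference, the standard prototypical-network step of building prototypes from the `n_support` examples per class is a per-class mean of support embeddings (the paper's reprojection applied on top of these prototypes is not reproduced here; the function name is an assumption):

```python
import numpy as np

def compute_prototypes(support_feats, support_labels, n_way):
    """support_feats: (N, D) support-set embeddings.
    Prototype of class c = mean embedding over class-c support samples."""
    return np.stack(
        [support_feats[support_labels == c].mean(axis=0) for c in range(n_way)]
    )
```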
The test results are saved by default to the path specified in `model.model_path`.
```bibtex
@inproceedings{yang2025cross,
  title={Cross-Domain Few-Shot Open-Set Keyword Spotting Using Keyword Adaptation and Prototype Reprojection},
  author={Yang, Mingru and He, Qianhua and Huang, Jinxin and Chen, Yongqiang and Liu, Zunxian and Li, Yanxiong},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
```
This work was partly supported by the National Natural Science Foundation of China (62371195) and the Guangdong Science and Technology Foundation (2023A0505050116, 2022A1515011687).
We also acknowledge the following code repositories:
- https://github.com/mrusci/ondevice-fewshot-kws
- https://github.com/ArchitParnami/Few-Shot-KWS
- https://github.com/roman-vygon/triplet_loss_kws
- https://github.com/clovaai/voxceleb_trainer
- https://github.com/BoLiu-SVCL/meta-open/
- https://github.com/tyler-hayes/Embedded-CL
- https://github.com/MrtnMndt/OpenVAE_ContinualLearning
- https://github.com/Codelegant92/STC-ProtoNet
Mingru Yang: eemryang@mail.scut.edu.cn