This is the official code for the paper "Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection".
Paper Link: https://arxiv.org/abs/2411.01174
- Please first install the dependencies: `pip install -r requirements.txt`
Please follow the instructions in the DCASE 2024 Challenge Task 9 baseline to pre-train LASS models: https://github.com/Audio-AGI/dcase2024_task9_baseline

Alternatively, download our pre-trained AudioSep-DP model, released at: https://zenodo.org/records/14208090
In our paper, we used the DESED, AudioSet-Strong, and WildDESED datasets. Please download them from https://project.inria.fr/desed/, https://research.google.com/audioset/, and https://zenodo.org/records/14013803 respectively.
In our work, we used the pre-trained LASS model to extract the sound tracks of different events. Please run `separate_audio.py` to perform this step.
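As an illustration of this step, the sketch below plans one text-queried separation job per event class for a given mixture: it phrases a text query from each class name and derives a per-class output path. The class list is DESED's ten event classes; the query phrasing and the `model.separate(...)` call mentioned in the comment are assumptions for illustration, not the actual interface of `separate_audio.py`.

```python
from pathlib import Path

# DESED's ten sound event classes, assumed here as the query vocabulary.
DESED_CLASSES = [
    "Alarm_bell_ringing", "Blender", "Cat", "Dishes", "Dog",
    "Electric_shaver_toothbrush", "Frying", "Running_water",
    "Speech", "Vacuum_cleaner",
]

def build_separation_jobs(mixture_path, out_dir):
    """For one mixture, plan a (text query, output path) pair per event class.

    The query is simply the class name with underscores replaced by spaces,
    lowercased -- one plausible way to phrase queries for a text-queried
    separation model (an assumption, not the paper's exact prompt format).
    """
    mixture = Path(mixture_path)
    jobs = []
    for cls in DESED_CLASSES:
        query = cls.replace("_", " ").lower()
        out_path = Path(out_dir) / cls / mixture.name
        jobs.append((query, out_path))
    return jobs

jobs = build_separation_jobs("mixtures/clip_001.wav", "separated")
for query, out_path in jobs:
    # A real run would pass the mixture and `query` to the LASS model here,
    # e.g. model.separate(mixture, text=query), and write the estimated
    # track to out_path -- that call is hypothetical.
    print(query, "->", out_path)
```

This yields one separated track per event class for each clip, which the training scripts can then consume alongside the original mixtures.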
Without curriculum learning: run `python train_pretrained.py`

With curriculum learning: run `python train_pretrained_cl.py`
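To sketch the curriculum idea behind the second script: training can start on clean audio and gradually mix in environmental noise at lower and lower SNR, so the model faces harder examples over time. The stage boundaries and SNR values below are illustrative assumptions, not the settings used in `train_pretrained_cl.py`.

```python
def noise_snr_for_epoch(epoch, stages=((0, None), (20, 10), (40, 5), (60, 0))):
    """Pick the noise SNR (dB) to mix at for the current epoch.

    `stages` is a sequence of (start_epoch, snr_db) pairs in ascending
    order of start_epoch; None means no added noise. The schedule moves
    from clean audio to 0 dB noise, a simple stage-based curriculum.
    These particular stages are assumptions for illustration.
    """
    snr = stages[0][1]
    for start_epoch, stage_snr in stages:
        if epoch >= start_epoch:
            snr = stage_snr
    return snr

# Example: clean for the first 20 epochs, then 10 dB, 5 dB, and 0 dB noise.
schedule = [noise_snr_for_epoch(e) for e in (0, 25, 45, 100)]
```

A training loop would call this once per epoch and pass the chosen SNR to its noise-mixing augmentation.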
PS: Please follow the instructions in DCASE 2024 Challenge Task 4 to extract embeddings with the pre-trained BEATs model.
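One practical detail when using such embeddings: BEATs produces frame-level vectors at a different temporal resolution than the SED model's output frames, so the embedding sequence is typically resampled along time before fusion. The sketch below does this with per-dimension linear interpolation; the frame counts (496 in, 156 out) and the interpolation choice are assumptions for illustration, not the DCASE baseline's exact recipe.

```python
import numpy as np

def align_embeddings(emb, target_frames):
    """Resample frame-level embeddings of shape (T, D) to (target_frames, D).

    Each embedding dimension is linearly interpolated along the time axis
    so the sequence matches the SED model's frame rate. Linear
    interpolation is an assumed choice; other alignments are possible.
    """
    T, D = emb.shape
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, target_frames)
    out = np.empty((target_frames, D), dtype=emb.dtype)
    for d in range(D):
        out[:, d] = np.interp(dst, src, emb[:, d])
    return out

# Example: shrink an assumed (496, 768) BEATs output to 156 SED frames.
aligned = align_embeddings(np.random.rand(496, 768).astype(np.float32), 156)
```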