Based on EnvSDD, we are launching the Environmental Sound Deepfake Detection (ESDD) Challenge.
The challenge will be held at ICASSP 2026. This repository also serves as the challenge baseline; for instructions on running the baseline, please refer to README_ESDD_2026.md.
Official code for EnvSDD (Environmental Sound Deepfake Detection)
Arxiv: https://arxiv.org/abs/2505.19203
Abstract: Audio generation systems now create very realistic soundscapes that can enhance media production, but also pose potential risks. Several studies have examined deepfakes in speech or singing voice. However, environmental sounds have different characteristics, which may make methods for detecting speech and singing deepfakes less effective for real-world sounds. In addition, existing datasets for environmental sound deepfake detection are limited in scale and audio types. To address this gap, we introduce EnvSDD, the first large-scale curated dataset designed for this task, consisting of 45.25 hours of real and 316.74 hours of fake audio. The test set includes diverse conditions to evaluate generalizability, such as unseen generation models and unseen datasets. We also propose an audio deepfake detection system based on a pre-trained audio foundation model. Results on EnvSDD show that our proposed system outperforms the state-of-the-art systems from speech and singing domains.
For more information, please refer to our demo page: https://envsdd.github.io/
Detailed structure of the dataset is shown in the following figure:
- EnvSDD-Development: you can download it from https://zenodo.org/records/15220951
- EnvSDD-Test: you can download it from https://zenodo.org/records/15241138
- EnvSDD-Remain: available soon
- If you download the EnvSDD-Development data, you will get 178765 clips in total, i.e., 35753 (real) + 35753 * 4 (fake).
- If you download the EnvSDD-Test data, you will get 39768 clips in total, i.e., 4971 (real) + 4971 * 7 (fake).
- To better explain where these numbers come from, please refer to the picture below. Clips in red boxes belong to EnvSDD-Development, while those in green boxes belong to EnvSDD-Test.
- In Table 3, identical counts for real data refer to the same data, so we count them only once.
- datasplit_dev.csv provides the detailed metadata for the real data of the development set, including filename, scene label, event label, caption, etc. (see the sketch below for a quick way to inspect it).
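The hedged sketch below loads datasplit_dev.csv with pandas and cross-checks the clip counts quoted above; the exact column headers are not documented here, so read them from the file rather than relying on the illustrative print-out.

```python
import pandas as pd

# Load the development-set metadata (real data only); column names are taken
# from the file itself, since the exact headers are not documented above.
meta = pd.read_csv("datasplit_dev.csv")
print(meta.columns.tolist())
print(meta.head())

# Cross-check the advertised counts: 35753 real development clips, and
# 35753 + 35753 * 4 = 178765 clips once the four fake versions are included.
n_real = len(meta)
print(n_real, n_real + n_real * 4)
```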
Some parts of the dataset are temporarily not publicly available because we plan to host a challenge. We aim to ensure fairness and prevent data leakage prior to the event. The dataset will be made publicly available after the competition concludes. If you are interested in early access for research purposes or have any questions, please feel free to contact us at yinhan@mail.nwpu.edu.cn. Thank you for your understanding!
- Step 1: prepare the environment by running: pip install -r requirements.txt
- Step 2: prepare the .json file for the development set by running: python generate_json_dev.py
- Step 3: train your deepfake detection models by running: python main.py --exp_id 0 --model model_name
Three models are currently supported: aasist, w2v2_aasist, and beats_aasist. A minimal driver that launches training for all three is sketched below.
PS: main.py exposes many arguments (e.g., batch size, eval, ...) that you can set directly from the terminal. It is fine if you do not have test.json during training; test.json is only used when you activate `--eval`.
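As a convenience, the sketch below launches one training run per supported backbone. It assumes only the --exp_id and --model flags shown above; any additional arguments (batch size, etc.) should be checked against main.py before use.

```python
import subprocess

# Train each supported backbone in turn. Only --exp_id and --model are taken
# from the instructions above; verify any extra flags with `python main.py --help`.
for exp_id, model in enumerate(["aasist", "w2v2_aasist", "beats_aasist"]):
    cmd = ["python", "main.py", "--exp_id", str(exp_id), "--model", model]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```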
Pretrained models: we use two pre-trained models in the paper. We sincerely appreciate the tremendous efforts behind these outstanding works.
- wav2vec XLS-R 300M: https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/xlsr/README.md
- BEATs: https://huggingface.co/nsivaku/nithin_checkpoints/tree/main
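For reference, here is a hedged sketch of how the XLS-R 300M checkpoint is typically loaded with fairseq to extract frame-level features, as commonly done in SSL anti-spoofing front ends; the checkpoint path is a placeholder, and the call signature should be checked against your installed fairseq version.

```python
import torch
import fairseq

# Placeholder path to the XLS-R 300M checkpoint downloaded from the fairseq page above.
ckpt_path = "xlsr2_300m.pt"

# Load the pre-trained model; this is the usual fairseq entry point for SSL checkpoints.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
ssl_model = models[0].eval()

# Dummy 4-second waveform at 16 kHz; replace with real audio.
wav = torch.randn(1, 64000)
with torch.no_grad():
    out = ssl_model(wav, mask=False, features_only=True)
features = out["x"]  # (batch, frames, 1024) contextual representations
print(features.shape)
```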
- Step 1: prepare the .json file for the test set by running: python generate_json_test.py
- Step 2: test your deepfake detection models by running: python main.py --exp_id 0 --eval --eval_output ./eval_output/predictions.txt --test_meta_json tta/test/test01.json --model aasist --model_path /home/yinhan/codes/audio_deepfake/exps/TTA/aasist.pth
PS: we release our checkpoints at https://zenodo.org/records/15480032. At the moment we do not release the metadata of the test sets, so we report the models' performance on the validation set for reference. (The performance on the validation set is good because this is in-domain inference; more results on out-of-domain tests can be found in our paper.)
| System | TTA | ATA |
| --- | --- | --- |
| AASIST | 0.80 | 0.19 |
| W2V2-AASIST | 0.27 | 0.25 |
| BEATs-AASIST | 0.09 | 0.06 |
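If the numbers above are equal error rates, they can be recomputed from a score file once labels are available. The sketch below assumes predictions have already been split into real and fake score arrays (higher score meaning more likely real); the actual format of predictions.txt should be checked before parsing.

```python
import numpy as np

def compute_eer(real_scores, fake_scores):
    """Equal error rate, assuming higher scores indicate real (bona fide) audio."""
    scores = np.concatenate([real_scores, fake_scores])
    labels = np.concatenate([np.ones(len(real_scores)), np.zeros(len(fake_scores))])
    labels = labels[np.argsort(scores)]
    # Sweep the threshold over the sorted scores: clips at or below it are called fake.
    frr = np.cumsum(labels == 1) / max((labels == 1).sum(), 1)        # reals wrongly rejected
    far = 1.0 - np.cumsum(labels == 0) / max((labels == 0).sum(), 1)  # fakes wrongly accepted
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy usage with synthetic scores; replace with scores parsed from predictions.txt.
rng = np.random.default_rng(0)
real = rng.normal(2.0, 1.0, 1000)
fake = rng.normal(-2.0, 1.0, 7000)
print(f"EER: {100 * compute_eer(real, fake):.2f}%")
```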
Our implementations use the source code from the following repositories and users:
- https://github.com/swagshaw/XLSR-Mamba
- https://github.com/ductuantruong/tcm_add
- https://github.com/TakHemlata/SSL_Anti-spoofing
- https://github.com/facebookresearch/fairseq
If you find our repository valuable for your work, please consider giving a star to this repo and citing our paper:
@inproceedings{envsdd,
  title={{EnvSDD}: Benchmarking Environmental Sound Deepfake Detection},
  author={Yin, Han and Xiao, Yang and Das, Rohan Kumar and Bai, Jisheng and Liu, Haohe and Wang, Wenwu and Plumbley, Mark D.},
  booktitle={Interspeech},
  year={2025}
}