We are the first to identify neural vocoders as a source of features to expose synthetic human voices.
The artifacts introduced by each of the six vocoders, compared to the original audio, are shown below:
We provide LibriSeVoC as a dataset of self-vocoding samples created with six state-of-the-art vocoders to highlight and exploit the vocoder artifacts.
The composition of the dataset is shown in the following table.

The ground-truth audio in our dataset comes from LibriTTS, so we follow the naming convention of LibriTTS.
For example, in `27_123349_000006_000000.wav`:
- `27` is the reader's ID
- `123349` is the ID of the chapter
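The naming convention above can be parsed mechanically. The following is a minimal sketch (the helper name `parse_librisevoc_name` is our own, not part of the repository) that extracts the two fields described above from a sample filename:

```python
def parse_librisevoc_name(filename: str) -> dict:
    """Split a LibriSeVoc/LibriTTS-style filename into its ID fields.

    e.g. "27_123349_000006_000000.wav" -> reader ID "27", chapter ID "123349".
    """
    stem = filename.rsplit(".", 1)[0]          # drop the ".wav" extension
    parts = stem.split("_")                    # fields are underscore-separated
    return {"reader_id": parts[0], "chapter_id": parts[1]}


print(parse_librisevoc_name("27_123349_000006_000000.wav"))
# {'reader_id': '27', 'chapter_id': '123349'}
```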
We propose a new approach to detecting synthetic human voices by:
- Exposing signal artifacts left by neural vocoders
- Modifying and improving the RawNet2 baseline by adding a multi-loss training objective
✅ This lowers the error rate from 6.10% to 4.54% on the ASVspoof Dataset.
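To make the multi-loss idea concrete, here is a small NumPy sketch of one plausible formulation: a binary real/fake cross-entropy combined with an auxiliary vocoder-identification cross-entropy over the six vocoder classes. The function names, shapes, and the weighting factor `alpha` are illustrative assumptions, not the exact training code of this repository:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def multi_loss(binary_logits, binary_labels, voc_logits, voc_labels, alpha=0.5):
    # Binary real/fake loss plus a weighted vocoder-identification loss
    # (alpha is a hypothetical balancing weight).
    return (cross_entropy(binary_logits, binary_labels)
            + alpha * cross_entropy(voc_logits, voc_labels))

# Toy example: one sample, correct vs. incorrect predictions.
low = multi_loss(np.array([[5.0, 0.0]]), np.array([0]),
                 np.array([[5.0, 0.0, 0.0]]), np.array([0]))
high = multi_loss(np.array([[0.0, 5.0]]), np.array([0]),
                  np.array([[0.0, 5.0, 0.0]]), np.array([0]))
```

The auxiliary vocoder term pushes the network to encode which vocoder produced a sample, which is the kind of signal-level artifact the detector is meant to exploit.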
Here is the framework of the proposed synthesized voice detection method:

📘 Paper:
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts – CVPRW 2023
📦 Dataset:
Download LibriSeVoc
Train a model:

```shell
python main.py --data_path /your/path/to/LibriSeVoc/ --model_save_path /your/path/to/models/
```

Evaluate a single audio sample:

```shell
python eval.py --input_path /your/path/to/sample.wav --model_path /your/path/to/your_model.pth
```
Download the trained model weights from the link below:
https://drive.google.com/file/d/15qOi26czvZddIbKP_SOR8SLQFZK8cf8E/view?usp=sharing
You can test audio samples live on our lab's Deepfake O Meter platform:
https://zinc.cse.buffalo.edu/ubmdfl/deep-o-meter/landing_page
This repository is licensed under the MIT License.
You are free to use, modify, and distribute the code with proper attribution.