This repository contains the implementation of an Audio Deepfake Detection model using XGBoost Classifier on extracted audio features. The project aims to detect AI-generated human speech by leveraging spectral and temporal audio characteristics.
1️⃣ Convert FLAC to WAV
- The dataset is in FLAC format, so we first convert it to WAV for easier processing.
2️⃣ Preprocess the Audio
- Extract important features like , spectrograms to understand the sound patterns.
3️⃣ Train the Machine Learning Model
- Use models like Transformers, XGBoost to learn differences between real and fake audio.
4️⃣ Evaluate the Model
- Test the trained model with new audio files and measure accuracy using confusion matrix, precision, and recall.
5️⃣ Predict and Detect Deepfakes
- The model analyzes a new audio file and predicts whether it is real or AI-generated (fake).
The dataset used in this project is DEEP-VOICE, which consists of AI-generated and real human speech samples. Additional datasets were referenced from:
-
flac_D of 6.6gb
1️⃣ Handling Unlabeled Data – Since flac_D lacked labels, we analyzed metadata, extracted features, and used clustering techniques to separate real and fake audio samples.
2️⃣ Preprocessing & Labeling – Some files were manually inspected, while feature-based classification helped in labeling data for effective model training.
Audio_Deepfake_Detection/
│── data/ # Dataset storage
│ ├── flac_D/ # Raw FLAC files
│ ├── wav_D_random10k/ # Converted WAV files
│ ├── features.csv # Extracted audio features (from Excel file)
│ ├── mel_spectrograms.npy # Precomputed spectrograms
│
│
├── xgboost_deepfake_detector.pkl # Trained XGBoost model
│
├── Audio Deepfake Detection Take.ipynb # Main notebook for research & implementation
│
│── requirements.txt # Dependencies list
First, download the project from GitHub:
git clone https://github.com/A-A-D-I-C-O-D-E/Audio-Deepfake-Detection.git
cd Audio-Deepfake-Detection
To avoid conflicts, create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On macOS/Linux
venv\Scripts\activate # On Windows
Make sure you have all necessary libraries installed:
pip install -r requirements.txt
Start Jupyter Notebook to train and test the deepfake detection model:
jupyter notebook
Then, open the notebook file and follow the steps inside.