Efficient-Detection is a deepfake video detection system built on transfer learning with EfficientNet models, making it suitable for deployment on systems with limited computational resources. By leveraging pre-trained weights, it achieves moderate accuracy in detecting manipulated video content while keeping computational requirements low.
Features:
- Diverse Dataset: Utilizes a diverse dataset with balanced classes to ensure robust model performance across various deepfake techniques.
- Random Frame Selection: Employs a strategy of randomly selecting frames from videos to enhance model generalization and capture inconsistencies in manipulated content.
- Pre-Trained Models: Leverages pre-trained EfficientNet models to reduce training time and computational requirements.
- Efficient Data Types: Uses memory-efficient NumPy data types during preprocessing and training to keep memory usage low (see the sketch below).
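As a rough illustration of the savings from compact dtypes (the shapes and counts below are hypothetical):

```python
import numpy as np

# Keeping raw frames as uint8 instead of float64 cuts memory by 8x;
# normalization to float can be deferred until batches are built.
frames_u8 = np.zeros((100, 299, 299, 3), dtype=np.uint8)
frames_f64 = frames_u8.astype(np.float64)
print(frames_u8.nbytes / 1e6)   # ~26.8 MB
print(frames_f64.nbytes / 1e6)  # ~214.6 MB
```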
The following sections walk through the pipeline, from uploading a video to training and inference.
Ensure you have Python installed and set up a virtual environment (recommended):
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
pip install -r requirements.txt
Ensure you have Node.js installed, then navigate to the frontend directory and install dependencies:
cd frontend
npm install
`FrameExtractor` is a Python class designed to extract frames from video files, which can be useful for deepfake detection using transfer learning.
This script processes videos from a specified directory, extracts frames at each timestep, and saves them in an organized manner for further analysis.
- Supports multiple video formats: `.mp4`, `.mov`, `.wmv`, `.mkv`, `.avi`
- Extracts frames from single or multiple videos in a directory
- Organizes frames into separate folders for each video
- Allows automatic execution upon initialization
- Ensures proper directory structure for frame storage
from frame_extractor import FrameExtractor
# Initialize with default paths
frame_exec = FrameExtractor(video_path='./videos', frame_path='./frames', create_frame_folder=True, auto_start=True)
- `video_path`: Path to the video file or directory containing videos.
- `frame_path`: Path where extracted frames will be saved.
- `create_frame_folder`: If `True`, clears and recreates the frame folder.
- `auto_start`: If `True`, starts extracting frames upon initialization.
/videos              # Input video files
/frames              # Extracted frames
  /video1            # Frames from video1.mp4
    frame_1.jpg
    frame_2.jpg
  /video2            # Frames from video2.mov
    frame_1.jpg
    frame_2.jpg
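For intuition, here is a minimal sketch of how per-frame extraction with OpenCV could be implemented (the actual FrameExtractor internals may differ; `extract_frames` is a hypothetical helper):

```python
import os
import cv2

def extract_frames(video_file: str, out_dir: str) -> int:
    """Save roughly one frame per second of video_file into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_file)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS is unknown
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % int(fps) == 0:  # keep one frame per second of video
            saved += 1
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved}.jpg"), frame)
        index += 1
    cap.release()
    return saved
```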
`FaceProcessor` is a Python-based tool designed to process video frames, detect and extract faces, and optionally apply data augmentation. It utilizes OpenCV and Dlib for face detection and landmark recognition, making it suitable for tasks such as deepfake detection, facial recognition, and machine learning preprocessing.
- Extracts frames from videos and detects faces.
- Uses OpenCV's Haar Cascade for face detection.
- Utilizes Dlib's 68-point facial landmark predictor for precise face localization.
- Supports data augmentation for training deep learning models.
- Allows frame shuffling and sequential ordering.
- Configurable face detection parameters.
from face_processor import FaceProcessor
face_processor = FaceProcessor(frames_folder="./frames", count_of_frames=50, img_size=(299, 299), shuffle=True, is_sequential_frames=False)
- `frames_folder`: Path to the frames directory.
- `count_of_frames`: Number of frames to process (-1 for all).
- `img_size`: Target size of extracted faces.
- `shuffle`: Shuffle frames before processing.
- `is_sequential_frames`: Maintain sequential frame order.
- If no face is detected in a frame, it will be skipped with a warning.
- If the landmark model file is missing, an error will be raised.
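As a rough illustration (not the exact FaceProcessor internals), face localization with OpenCV's Haar Cascade might look like the following; `crop_face` is a hypothetical helper:

```python
import cv2

# Haar Cascade model shipped with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame_path: str, img_size=(299, 299)):
    """Return the first detected face resized to img_size, or None."""
    image = cv2.imread(frame_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # the frame is skipped with a warning
    x, y, w, h = faces[0]
    return cv2.resize(image[y:y + h, x:x + w], img_size)
```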
This module classifies faces using an EfficientNet-based model with optional LSTM layers for sequential learning.
from classifier import Classifier
classifier = Classifier(input_shape=(299, 299, 3), selected_model=3)
classifier.train(training_data, validation_data, epochs=5)
loss, accuracy = classifier.evaluate(X_test, y_test)
- `input_shape`: Input image shape.
- `learning_rate`: Learning rate for training.
- `retrain`: Load a pre-trained model if `True`.
- `save_model`: Save the trained model if `True`.
- `trained_model`: Path to a pre-trained model.
- `selected_model`: Choose EfficientNetB1 (1) or EfficientNetB3 (3).
- `is_sequential_training`: Enable sequential training with LSTM layers.
- `build_model()`: Builds and compiles the model.
- `train(training_data, validation_data, batch_size, epochs)`: Trains the model.
- `evaluate(X_test, y_test)`: Evaluates model performance.
- `save()`: Saves the trained model.
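Under the hood, a transfer-learning model of this kind is typically a frozen EfficientNet backbone with a small classification head. Here is a minimal sketch, assuming Keras and a binary sigmoid output (`build_transfer_model` is a hypothetical helper; the actual `build_model()` may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB3

def build_transfer_model(input_shape=(299, 299, 3), learning_rate=0.001):
    """EfficientNetB3 backbone with a small binary-classification head."""
    base = EfficientNetB3(include_top=False, weights="imagenet",
                          input_shape=input_shape)
    base.trainable = False  # freeze pre-trained weights for transfer learning
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # real vs. fake
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```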
To train the classifier:
classifier.train(training_data, validation_data, epochs=10)
To evaluate:
test_loss, test_acc = classifier.evaluate(X_test, y_test)
To save a model after training:
classifier.save()
To load a pre-trained model:
classifier = Classifier(retrain=True, trained_model="./models/model.keras")  # replace with the path to your model
- Ensure `shape_predictor_68_face_landmarks.dat` is available in the correct path for Dlib.
- EfficientNet weights are downloaded automatically from TensorFlow.
- Sequential training with LSTM layers is available for temporal modeling (see the sketch after this list).
- Unrar `models.rar` inside a folder named `models`. Download `models.rar` from here.
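For intuition, here is a minimal sketch of what sequential training could look like, assuming per-frame EfficientNet features pooled by an LSTM (the actual Classifier wiring may differ; `build_sequential_model`, `num_frames`, and `frame_shape` are hypothetical names):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB1

def build_sequential_model(num_frames=50, frame_shape=(240, 240, 3)):
    """Per-frame EfficientNet features fed into an LSTM for temporal modeling."""
    base = EfficientNetB1(include_top=False, weights="imagenet",
                          pooling="avg", input_shape=frame_shape)
    base.trainable = False  # keep the pre-trained backbone frozen
    model = models.Sequential([
        layers.Input(shape=(num_frames, *frame_shape)),
        layers.TimeDistributed(base),   # one feature vector per frame
        layers.LSTM(64),                # aggregate features across time
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```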
To extract frames, process faces, and train the classifier using command-line arguments:
python main.py --real_video_path "./videos/real" \
--fake_video_path "./videos/fake" \
--frame_path "./frames" \
--count_of_frames -1 \
--create_frame_folder 1 \
--shuffle_frames 0 \
--is_sequential_frames 0 \
--implement_augmentation 0 \
--learning_rate 0.001 \
--batch_size 1 \
--epochs 50 \
--save_model 1 \
--img_width 299 \
--img_height 299
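For reference, a hypothetical sketch of how main.py might parse a few of these flags with argparse (the actual script may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="Deepfake detection pipeline")
parser.add_argument("--real_video_path", default="./videos/real")
parser.add_argument("--fake_video_path", default="./videos/fake")
parser.add_argument("--count_of_frames", type=int, default=-1)  # -1 = all frames
parser.add_argument("--learning_rate", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=50)
parser.add_argument("--save_model", type=int, choices=[0, 1], default=1)
args = parser.parse_args()
```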
The data used for training comes from the Celeb-DF dataset. Since we are performing transfer learning with limited resources, I used a minimal amount of data for experimentation. Despite this, I achieved moderate results using approximately 8-9K frames extracted from random videos and shuffled.
| Threshold | Accuracy | Precision | Recall | F1 Score |
|-----------|----------|-----------|--------|----------|
| 0.40 | 0.8200 | 0.7759 | 0.9000 | 0.8333 |
| 0.45 | 0.8200 | 0.8200 | 0.8200 | 0.8200 |
| 0.45 | 0.8500 | 0.8571 | 0.8400 | 0.8485 |
| 0.40 | 0.8500 | 0.8182 | 0.9000 | 0.8571 |
| 0.40 | 0.8000 | 0.7843 | 0.8000 | 0.7921 |
| 0.45 | 0.7800 | 0.7917 | 0.7600 | 0.7755 |
| 0.45 | 0.8100 | 0.8039 | 0.8200 | 0.8119 |
| 0.40 | 0.8000 | 0.7778 | 0.8400 | 0.8077 |
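These figures come from binarizing the model's sigmoid outputs at the listed threshold. A minimal sketch of that computation, assuming scikit-learn (the project itself may compute them differently):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def metrics_at_threshold(y_true, y_prob, threshold=0.45):
    """Binarize sigmoid outputs at `threshold` and report the four metrics."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```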
I have also created a simple web app using Express (Node.js), Flask (Python), and HTML/CSS/JS to make the deepfake detection system easier to use.
To run the client side:
cd ./frontend
node server.js
To run the server side:
python -m backend.app
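As a purely hypothetical sketch of what the Flask backend's detection endpoint could look like (the actual routes in backend.app may differ; the `/detect` endpoint name and payload are assumptions):

```python
import os
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/detect", methods=["POST"])  # hypothetical endpoint name
def detect():
    """Accept an uploaded video and return a fake-probability score."""
    video = request.files.get("video")
    if video is None:
        return jsonify(error="no video uploaded"), 400
    os.makedirs("./videos", exist_ok=True)
    video.save("./videos/upload.mp4")
    # ... extract frames, crop faces, run the classifier ...
    score = 0.5  # placeholder for the model's prediction
    return jsonify(fake_probability=score)

if __name__ == "__main__":
    app.run(port=5000)
```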
- Allow the user to choose the number of frames used for detection.
- Allow the user to choose between different models for detection.
- Build a better UI with a front-end framework.
- Remove videos from the `videos` folder after detection to save space.
- Remove frames from the `frames` folder after detection to save space.
- Implement continuous learning to improve model performance and incorporate new datasets covering new deepfake techniques.