Paper / Report
This research presents a generalized deepfake detection approach that combines CLIP, a vision-language model, for semantic feature extraction with ResNet101, a CNN, for hierarchical image analysis. A domain-independent dataset and a feature fusion methodology improve robustness against diverse deepfake techniques. The model achieves 91.24% accuracy, a Detection Cost Function (DCF) of 0.0391, and an Equal Error Rate (EER) of 8.73%, which indicates that it adapts well to an evolving media landscape.
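For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate can be approximated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the label convention (1 = fake, 0 = real), and the threshold sweep over observed scores are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for a binary detector.

    Assumed convention (not from the paper): label 1 = fake, label 0 = real,
    and a sample is predicted fake when its score is >= the threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one sample of each class")

    best_gap, eer = None, None
    # Sweep every observed score as a candidate threshold.
    for t in sorted(set(scores)):
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        false_neg = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr = false_pos / negatives   # real images flagged as fake
        fnr = false_neg / positives   # fake images missed
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy scores for illustration only; these are not results from the paper.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

The EER is the operating point where the false positive rate equals the false negative rate, so a lower value means the detector separates real and fake samples more cleanly. With finite data the two rates rarely coincide exactly, which is why this sketch averages them at the closest threshold.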