This project focuses on improving the performance and robustness of malware classification using Genetic Algorithms (GA). The aim is to simulate adversarial attacks against a machine-learning-based malware classifier and then retrain the model on the generated adversarial samples so that it resists such attacks more effectively.
Modern malware detection systems are increasingly vulnerable to adversarial manipulations. In this project:
- A dataset of static malware features is used for binary classification (malware vs. benign).
- A classifier is trained on the extracted features.
- A Genetic Algorithm is used to craft adversarial examples: modified input samples designed to mislead the classifier (a minimal sketch follows this list).
- The model is retrained with successful adversarial samples to improve robustness.
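The adversarial-generation step can be pictured with the minimal sketch below. It is not the project's actual implementation: it assumes a trained scikit-learn-style classifier exposing `predict_proba` (class 0 taken as benign, class 1 as malware), a 1-D NumPy feature vector for the malware sample, and a caller-supplied set of feature indices (`mutable_idx`) that may legally be perturbed; the function name, GA operators, and hyperparameters are illustrative choices.

```python
import numpy as np

def ga_adversarial(classifier, x_malware, mutable_idx, pop_size=50,
                   generations=30, mutation_rate=0.1, perturb_scale=0.05,
                   rng=None):
    """Search for a perturbed copy of `x_malware` that the classifier scores
    as benign, changing only the features listed in `mutable_idx`."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n_features = x_malware.shape[0]
    mutable_idx = np.asarray(mutable_idx)
    immutable_idx = np.setdiff1d(np.arange(n_features), mutable_idx)

    # Initial population: copies of the original sample with small random
    # perturbations applied to the mutable features only.
    population = np.tile(x_malware.astype(float), (pop_size, 1))
    population[:, mutable_idx] += rng.normal(
        0.0, perturb_scale, size=(pop_size, mutable_idx.size))

    for _ in range(generations):
        # Fitness: probability of the benign class for each candidate.
        benign_prob = classifier.predict_proba(population)[:, 0]

        # Early exit once any candidate flips the decision.
        best = int(np.argmax(benign_prob))
        if benign_prob[best] > 0.5:
            return population[best], True

        # Elitist selection: keep the top half as parents.
        parents = population[np.argsort(benign_prob)[::-1][: pop_size // 2]]

        # Uniform crossover between randomly paired parents.
        idx_a = rng.integers(0, len(parents), size=pop_size)
        idx_b = rng.integers(0, len(parents), size=pop_size)
        mask = rng.random((pop_size, n_features)) < 0.5
        children = np.where(mask, parents[idx_a], parents[idx_b])

        # Mutation on the mutable features only.
        mut_mask = rng.random((pop_size, mutable_idx.size)) < mutation_rate
        children[:, mutable_idx] += mut_mask * rng.normal(
            0.0, perturb_scale, size=mut_mask.shape)

        # Immutable features are restored to their original values.
        children[:, immutable_idx] = x_malware[immutable_idx]
        population = children

    best = int(np.argmax(classifier.predict_proba(population)[:, 0]))
    return population[best], False
```

The fitness here is simply the classifier's benign probability, so the search pressure directly rewards candidates that move toward the decision boundary; a typical call passes a trained classifier, one malware feature vector from the test set, and the indices of features that could plausibly be modified without breaking the underlying binary.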
Key features:

- ✅ Feature-based malware classification
- ✅ Adversarial sample generation using Genetic Algorithms
- ✅ Evaluation using Adversarial Success Rate (ASR)
- ✅ Robustness improvement via adversarial training (an ASR and retraining sketch follows this list)
- ✅ Modular and reusable implementation for experimentation
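As a rough illustration of the evaluation and retraining steps, the sketch below computes ASR as the fraction of crafted samples the classifier labels as benign, then refits the model on the training set augmented with the successful adversarial samples relabelled as malware. It assumes the same scikit-learn-style classifier and the benign = 0 / malware = 1 label convention used above; the function names are illustrative, not the project's API.

```python
import numpy as np

def adversarial_success_rate(classifier, adversarial_samples):
    """ASR: share of crafted samples predicted as benign (class 0)
    even though they were derived from known malware."""
    preds = classifier.predict(np.asarray(adversarial_samples))
    return float(np.mean(preds == 0))

def adversarial_retrain(classifier, X_train, y_train, successful_adv):
    """Refit the classifier on the original data plus the successful
    adversarial samples, labelled as malware (class 1)."""
    X_aug = np.vstack([X_train, successful_adv])
    y_aug = np.concatenate([y_train, np.ones(len(successful_adv), dtype=int)])
    classifier.fit(X_aug, y_aug)
    return classifier
```

Measuring ASR before and after retraining gives a direct, if simple, indicator of how much robustness the adversarial training has added.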
This project uses the EMBER 2018 dataset, in its tabular (CSV) form available on Kaggle.
- 📌 Original Dataset: EMBER 2018 by Elastic
- 📊 Tabular CSV Version (used in this project): Tabular EMBER Dataset on Kaggle, provided by Edir Garcia Lazo
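For orientation only, a minimal loading sketch is shown below. The CSV file name and the `label` column (with 0 = benign and 1 = malware) are placeholders; the actual file and column names should be taken from the Kaggle dataset page.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# "ember2018_tabular.csv" and "label" are hypothetical names; substitute the
# real file and label column from the Kaggle dataset.
df = pd.read_csv("ember2018_tabular.csv")

X = df.drop(columns=["label"]).to_numpy()
y = df["label"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```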
**Important:** 📄 For detailed information about the project, please refer to the conference paper associated with this work.