Code and datasets to reproduce the results of the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification", accepted for publication at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI) 2025.

MalwareReplayGAN/MalCL

MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification


This repository contains the code for the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification".

Dataset

The datasets for the experiments can be downloaded from here and from the Zenodo repository.


  • EMBER 2018 dataset
    We use the 2018 EMBER dataset, known for its challenging classification tasks, focusing on a subset of 337,035 malicious Windows PE files labeled by the top 100 malware families, each with over 400 samples. Features include file size, PE and COFF header details, DLL characteristics, imported and exported functions, and properties like size and entropy, all computed using the feature hashing trick.
  • AZ-Class
    The AZ-Class dataset contains 285,582 samples from 100 Android malware families, each with at least 200 samples. We extracted Drebin features (Arp et al. 2014) from the apps, covering eight categories such as hardware access, permissions, API calls, and network addresses.
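
The EMBER description above mentions the feature hashing trick. As a rough illustration of the idea only, the sketch below maps a variable-length list of string tokens to a fixed-length vector using the standard library. It is not EMBER's actual feature extractor; the function name `hash_features`, the bucket count, and the example tokens are all assumptions made for this example.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map string tokens to a fixed-length vector (signed feature hashing).

    Hypothetical helper for illustration; not taken from EMBER or MalCL.
    """
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # The first four bytes choose the bucket.
        idx = int.from_bytes(digest[:4], "little") % n_buckets
        # One bit of a different byte chooses the sign, which reduces
        # the bias introduced when two tokens collide in a bucket.
        sign = 1.0 if digest[4] & 1 == 0 else -1.0
        vec[idx] += sign
    return vec


if __name__ == "__main__":
    # Illustrative tokens, e.g. imported functions of a PE file.
    imports = ["kernel32.dll:CreateFileW", "user32.dll:MessageBoxA"]
    print(hash_features(imports, n_buckets=8))
```

The output length depends only on `n_buckets`, so files with very different numbers of imported or exported functions all yield vectors of the same size.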

Environment


  • PyTorch version 2.0.1
  • conda version 4.7.12
  • Python version 3.8.13
  • NVIDIA RTX A6000
  • CUDA version 11.4
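
To compare a local setup against the versions listed above, a small check like the following may help. It is a sketch, not part of the repository: the function name `report_environment` is an assumption, and the script falls back gracefully if PyTorch is not installed.

```python
import importlib
import platform


def report_environment():
    """Collect Python, PyTorch, and CUDA details for the current machine."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is missing; report that instead of failing.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```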

Experiment

  1. Run the experiment with the following command:

CUDA_VISIBLE_DEVICES=0 python main.py


  2. You can change the hyperparameters and other options on the command line:

The available options are listed in MalCL_torch/arguments.py

For example, to set the sample selection scheme to L1_B_Mean, run:

CUDA_VISIBLE_DEVICES=0 python main.py --sample_select L1_B_Mean
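
For readers unfamiliar with how such a flag is typically wired up, here is a minimal `argparse` sketch. It is hypothetical and does not reproduce MalCL_torch/arguments.py: only the `--sample_select` flag name comes from this README, while the default, the help text, and the `build_parser` function are assumptions.

```python
import argparse


def build_parser():
    """Build a toy parser with one option (hypothetical, not the real arguments.py)."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of a MalCL-style command line"
    )
    parser.add_argument(
        "--sample_select",
        type=str,
        default=None,  # assumed default; check arguments.py for the real one
        help="name of the replay sample selection scheme",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for the demo.
    args = build_parser().parse_args(["--sample_select", "L1_B_Mean"])
    print(args.sample_select)
```

The valid values and defaults of the real options should be read from MalCL_torch/arguments.py rather than from this sketch.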

Pipeline


[Figure: pipeline]

Architecture


  • Generator
    [Figure: Generator]
  • Discriminator
    [Figure: Discriminator]
  • Classifier
    [Figure: Classifier]

Results


[Table: results]

  • Comparison with baseline and prior replay models on the EMBER dataset. We report the mean (Mean) and minimum (Min) accuracy scores computed over all 11 tasks.
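
The Mean and Min columns described above can be computed from a list of per-task accuracies as sketched below. The function name `summarize` is an assumption, and the 11 values in the demo are synthetic placeholders, not results from the paper.

```python
from statistics import mean


def summarize(task_accuracies):
    """Return the mean and minimum accuracy over a sequence of tasks."""
    if not task_accuracies:
        raise ValueError("need at least one task accuracy")
    return {"mean": mean(task_accuracies), "min": min(task_accuracies)}


if __name__ == "__main__":
    # Synthetic placeholder values for 11 tasks; NOT measured results.
    placeholder = [0.90 - 0.02 * i for i in range(11)]
    stats = summarize(placeholder)
    print(f"Mean: {stats['mean']:.3f}  Min: {stats['min']:.3f}")
```

Reporting the minimum alongside the mean shows the worst task, which an average alone can hide when forgetting affects only some tasks.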
