GitHub - MalwareReplayGAN/MalCL: Code and datasets to reproduce the results of the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification", accepted for publication at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025).

MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification

This repository contains the code for the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification".

Dataset

The datasets for the experiments can be downloaded from here and from the Zenodo repository.

EMBER 2018 dataset
We use the 2018 EMBER dataset, known for its challenging classification tasks, focusing on a subset of 337,035 malicious Windows PE files labeled by the top 100 malware families, each with over 400 samples. Features include file size, PE and COFF header details, DLL characteristics, imported and exported functions, and properties like size and entropy, all computed using the feature hashing trick.
AZ-Class
The AZ-Class dataset contains 285,582 samples from 100 Android malware families, each with at least 200 samples. We extracted Drebin features (Arp et al. 2014) from the apps, covering eight categories such as hardware access, permissions, API calls, and network addresses.
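
The EMBER description above mentions the feature hashing trick. As a rough illustration of the idea only (this is not the EMBER extractor's code; the function name `hash_features`, the MD5 hash, and the bin count are assumptions made for this sketch), a variable-length list of strings such as imported function names can be mapped to a fixed-length vector like this:

```python
import hashlib


def hash_features(tokens, n_bins=16):
    """Map a variable-length list of string tokens to a fixed-length count vector.

    Hypothetical helper for illustration; the hash function and n_bins are
    arbitrary choices, not values taken from EMBER.
    """
    vec = [0] * n_bins
    for tok in tokens:
        # A stable hash keeps the mapping identical across runs and machines.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "little") % n_bins
        vec[index] += 1  # colliding tokens share a bin
    return vec


# Illustrative input: names of imported functions from one file.
imports = ["kernel32.dll:CreateFileA", "kernel32.dll:WriteFile", "user32.dll:MessageBoxA"]
vector = hash_features(imports)
```

The output length depends only on `n_bins`, so files with very different numbers of imports still produce vectors of the same shape, at the cost of occasional collisions.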

Environment

pytorch version 2.0.1
conda version 4.7.12
python version 3.8.13
NVIDIA RTX A6000
CUDA version 11.4
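
To compare a local setup against the versions listed above, a small check along these lines may help. It is a sketch, not part of the repository: `environment_report` is a hypothetical helper, and it only reports what it finds without enforcing any version.

```python
import importlib.util
import platform


def environment_report():
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    report = {"python": platform.python_version()}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report
    import torch

    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return report


report = environment_report()
for name, value in report.items():
    print(f"{name}: {value}")
```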

Experiment

The command line is as follows:

CUDA_VISIBLE_DEVICES=0 python main.py

You can change the hyperparameters and other options on the command line. The available options are listed in MalCL_torch/arguments.py.

For example, to set the sample selection scheme to L1_B_Mean, run:

CUDA_VISIBLE_DEVICES=0 python main.py --sample_select L1_B_Mean
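
When running several configurations, it can be convenient to build the command from Python instead of typing it by hand. The sketch below assumes only what is shown above (`main.py`, the `CUDA_VISIBLE_DEVICES` variable, and the `--sample_select` option with the value `L1_B_Mean`); `build_run` is a hypothetical helper, and any other option name would need to be checked against MalCL_torch/arguments.py.

```python
import os


def build_run(gpu="0", **options):
    """Return (command, environment) for one run of main.py.

    Hypothetical helper: each keyword argument becomes a `--name value` pair.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    cmd = ["python", "main.py"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd, env


cmd, env = build_run(gpu="0", sample_select="L1_B_Mean")
print(" ".join(cmd))
```

The pair can then be passed to `subprocess.run(cmd, env=env, check=True)` from the directory that contains `main.py`.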

Pipeline

Architecture

Generator

Discriminator

Classifier

Results

Comparison to baseline and prior replay models on the EMBER dataset. We report the mean (Mean) and minimum (Min) accuracy computed over all 11 tasks.
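
The two summary statistics named in the caption can be computed as sketched below. The accuracy values are made-up placeholders chosen for easy arithmetic, not results from the paper, and `summarize` is a hypothetical helper.

```python
from statistics import mean


def summarize(task_accuracies):
    """Return (Mean, Min) of the accuracies measured after each task."""
    return mean(task_accuracies), min(task_accuracies)


# Placeholder values for 11 tasks (NOT results from the paper).
placeholder = [0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
mean_acc, min_acc = summarize(placeholder)
print(f"Mean={mean_acc:.4f} Min={min_acc:.4f}")
```

Mean summarizes overall performance across the task sequence, while Min shows the worst point reached, which is where forgetting is most visible.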
