Explainable Malware Detection using Graph Neural Networks

This project detects and classifies malware using Graph Neural Networks (GNNs). The model analyzes the API function call graph of a program to classify it as either malicious or benign. The approach applies to both Android applications (APKs) and Windows Portable Executable (PE) files and is designed to be resilient against common obfuscation techniques.

Overview

Traditional signature-based malware detection can often be bypassed by slightly modifying the malware's code, because even a small change produces a different signature. This project addresses that limitation by focusing on the underlying behavior of the software. Each application is represented as a graph of its API calls, and the GNN model learns to identify the call patterns and relationships that are characteristic of malicious activity.

Key Features

Graph-Based Detection: Each application is converted into a function call graph, allowing the GNN to analyze which functions call which and to identify malicious patterns in that structure.

Obfuscation Resilience: By focusing on core API call patterns rather than specific code signatures, the system is designed to stay robust against evasive malware that relies on common obfuscation techniques.

Model Explainability: The project implements edge pruning on the call graph to identify and rank the most critical API calls that contribute to a malware classification. This provides valuable insights into the model's decision-making process, making it more transparent and trustworthy.

Cross-Platform: The methodology is applicable to both Android (APKs) and Windows Portable Executable (PE) files.
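
The edge-pruning idea behind the Model Explainability feature can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: toy_score is a hypothetical stand-in for the trained GNN's malware score, and the function names and weights in the example are made up. The sketch removes one edge at a time, re-scores the graph, and ranks edges by how much the score drops.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]  # (caller, callee)

# Hypothetical weights used only by the toy scorer below; a real system
# would query the trained GNN instead.
SUSPICIOUS_WEIGHTS = {
    ("main", "getDeviceId"): 0.25,
    ("getDeviceId", "sendTextMessage"): 0.5,
}


def toy_score(edges: FrozenSet[Edge]) -> float:
    """Stand-in for the model: sum the weights of the edges that are present."""
    return sum(SUSPICIOUS_WEIGHTS.get(edge, 0.0) for edge in edges)


def rank_edges_by_pruning(
    edges: Sequence[Edge],
    score: Callable[[FrozenSet[Edge]], float],
) -> List[Tuple[Edge, float]]:
    """Rank edges by the score drop observed when each one is removed."""
    base = score(frozenset(edges))
    drops = []
    for edge in edges:
        pruned = frozenset(e for e in edges if e != edge)
        drops.append((edge, base - score(pruned)))
    # Largest drop first: these edges matter most to the prediction.
    return sorted(drops, key=lambda item: item[1], reverse=True)


call_edges = [
    ("main", "onCreate"),
    ("main", "getDeviceId"),
    ("getDeviceId", "sendTextMessage"),
]
ranked = rank_edges_by_pruning(call_edges, toy_score)
for edge, drop in ranked:
    print(edge, drop)
```

Removing edges one at a time costs one model evaluation per edge, which is simple to reason about but may be slow on large call graphs.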

Technologies Used

Core Framework: PyTorch

Model Architecture: Graph Neural Networks (GNNs)

Embeddings: Skip-gram for learning representations of API calls.
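
As an illustration of the Skip-gram step, the sketch below generates the (center, context) training pairs that a Skip-gram model learns from. The README does not say what corpus the embeddings are trained on, so the input here is an assumption: a sequence of API names, for example one obtained by walking the call graph. The function name skipgram_pairs and the window size are hypothetical.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(sequence: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every item within `window` positions."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, sequence[j]))
    return pairs


# Assumed example sequence of API calls (illustrative names only).
api_sequence = ["getDeviceId", "openConnection", "sendTextMessage"]
for center, context in skipgram_pairs(api_sequence, window=1):
    print(center, "->", context)
```

In a PyTorch setup, pairs like these would typically be mapped to integer ids and used to train an embedding table such as torch.nn.Embedding, so that API calls appearing in similar contexts end up with similar vectors.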

How It Works

Graph Extraction: The first step is to statically analyze the executable (APK or PE file) and extract its function call graph. Nodes in the graph represent functions, and edges represent calls between them.

Graph Representation: The extracted graph is then processed and converted into a format suitable for the GNN.

Model Training: The GNN model is trained on a labeled dataset of benign and malicious software samples to learn the patterns associated with malware.

Classification and Explanation: Once trained, the model can classify new, unseen software and use techniques like edge pruning to explain its predictions.
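
The graph representation and message-passing steps above can be sketched in plain Python. This is a simplified illustration, not the project's model: it assumes the call graph is already available as (caller, callee) pairs, uses made-up one-dimensional node features, and has no learned weights. A real implementation would use PyTorch tensors and trainable layers. All function names here are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def build_index(edges: Sequence[Edge]) -> Dict[str, int]:
    """Assign each function name an integer node id in first-seen order."""
    index: Dict[str, int] = {}
    for src, dst in edges:
        for name in (src, dst):
            if name not in index:
                index[name] = len(index)
    return index


def to_edge_index(edges: Sequence[Edge], index: Dict[str, int]) -> List[List[int]]:
    """Encode edges as two parallel lists: source ids and target ids."""
    return [[index[s] for s, _ in edges], [index[t] for _, t in edges]]


def mean_aggregate(features: List[List[float]], edge_index: List[List[int]]) -> List[List[float]]:
    """One message-passing round: each node averages its own features
    with the features of every node that calls it."""
    dim = len(features[0])
    sums = [list(f) for f in features]  # start from each node's own features
    counts = [1] * len(features)
    for src, dst in zip(*edge_index):
        for k in range(dim):
            sums[dst][k] += features[src][k]
        counts[dst] += 1
    return [[v / counts[i] for v in row] for i, row in enumerate(sums)]


def readout(features: List[List[float]]) -> List[float]:
    """Graph-level vector: the mean of all node vectors."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


edges = [("main", "a"), ("main", "b"), ("a", "b")]
index = build_index(edges)
edge_index = to_edge_index(edges, index)
node_features = [[3.0], [0.0], [0.0]]  # made-up features, one row per node id
hidden = mean_aggregate(node_features, edge_index)
graph_vector = readout(hidden)
print(edge_index, hidden, graph_vector)
```

A classifier would take the graph-level vector as input and output a malicious or benign label. Stacking several aggregation rounds lets information travel along longer call chains.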
