hselihkin/Malware-Project

Explainable Malware Detection using Graph Neural Networks

This project detects and classifies malware with Graph Neural Networks (GNNs). The model analyzes a program's API function call graph and classifies the program as either malicious or benign. The technique works for both Android applications (APKs) and Windows Portable Executable (PE) files, and it is designed to be resilient against common obfuscation techniques.

Overview

Traditional signature-based malware detection can be bypassed easily by slightly modifying the malware's code. This project addresses that limitation by focusing on the underlying behavior of the software. Each application is represented as a graph of its API calls, and the GNN model learns from that graph to identify suspicious patterns and relationships that are characteristic of malicious activity.

Key Features

Graph-Based Detection: Each application is converted into a function call graph, allowing the GNN to analyze its structure and control flow to identify malicious patterns.

Obfuscation Resilience: Because the system relies on core API call patterns rather than specific code signatures, it is designed to withstand evasive malware and common obfuscation techniques.

Model Explainability: The project implements edge pruning on the call graph to identify and rank the most critical API calls that contribute to a malware classification. This provides valuable insights into the model's decision-making process, making it more transparent and trustworthy.

Cross-Platform: The methodology is applicable to both Android (APKs) and Windows Portable Executable (PE) files.
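The edge-pruning idea described under Model Explainability can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's implementation: `rank_edges_by_pruning`, `toy_score`, the `SUSPICIOUS` set, and the example call graph are all hypothetical, and `toy_score` stands in for a trained GNN's malware score. The sketch removes one edge at a time and ranks edges by how much the score drops.

```python
from typing import Callable, List, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def rank_edges_by_pruning(
    edges: List[Edge],
    score_fn: Callable[[List[Edge]], float],
) -> List[Tuple[Edge, float]]:
    """Rank edges by how much the score drops when each one is removed."""
    base = score_fn(edges)
    drops = []
    for i, edge in enumerate(edges):
        pruned = edges[:i] + edges[i + 1:]  # graph without this one edge
        drops.append((edge, base - score_fn(pruned)))
    # Largest drop first: these edges contribute most to the prediction.
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for a trained GNN (assumption, not the real model):
# the fraction of edges whose callee is on a made-up "suspicious" list.
SUSPICIOUS = {"sendTextMessage", "getDeviceId"}


def toy_score(edges: List[Edge]) -> float:
    if not edges:
        return 0.0
    hits = sum(1 for _, callee in edges if callee in SUSPICIOUS)
    return hits / len(edges)


# Hypothetical call graph given as a list of (caller, callee) edges.
call_graph = [
    ("onCreate", "loadConfig"),
    ("onCreate", "getDeviceId"),
    ("loadConfig", "readFile"),
    ("uploadData", "sendTextMessage"),
]

ranking = rank_edges_by_pruning(call_graph, toy_score)
for edge, drop in ranking:
    print(edge, round(drop, 3))
```

With a real model, `score_fn` would run a forward pass on the pruned graph. Removing one edge at a time costs one forward pass per edge, so larger graphs may need batching or a cheaper approximation.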

Technologies Used

Core Framework: PyTorch

Model Architecture: Graph Neural Networks (GNNs)

Embeddings: Skip-gram for learning representations of API calls.
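As a rough illustration of the Skip-gram step, the sketch below generates the (center, context) training pairs that a skip-gram model learns from. How the project obtains API call sequences is not stated here, so the example sequence, the `skipgram_pairs` helper, and the window size are assumptions made for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, sequence[j]))
    return pairs


# Hypothetical API call sequence (assumption: e.g. one path through a call graph).
api_sequence = ["openFile", "readFile", "connect", "send"]
pairs = skipgram_pairs(api_sequence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

In a full pipeline, pairs like these would typically train an embedding table (for example `torch.nn.Embedding` in PyTorch) so that API calls appearing in similar contexts receive similar vectors.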

How It Works

Graph Extraction: The first step is to statically analyze the executable (APK or PE file) and extract its function call graph. Nodes in the graph represent functions, and edges represent calls between them.

Graph Representation: The extracted graph is then processed and converted into a format suitable for the GNN.

Model Training: The GNN model is trained on a labeled dataset of benign and malicious software samples to learn the patterns associated with malware.

Classification and Explanation: Once trained, the model can classify new, unseen software and use techniques like edge pruning to explain its predictions.
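The Graph Representation step above can be illustrated with a small conversion from named call edges to integer node ids and a two-row edge list (source row, target row), a layout many GNN libraries accept. The project's actual input format is not documented here, so `to_edge_index` and the example edges are hypothetical.

```python
from typing import Dict, List, Tuple


def to_edge_index(
    edges: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], List[List[int]]]:
    """Map function names to integer ids and return edges as [sources, targets]."""
    node_ids: Dict[str, int] = {}
    sources: List[int] = []
    targets: List[int] = []
    for caller, callee in edges:
        for name in (caller, callee):
            if name not in node_ids:
                node_ids[name] = len(node_ids)  # ids assigned in first-seen order
        sources.append(node_ids[caller])
        targets.append(node_ids[callee])
    return node_ids, [sources, targets]


# Hypothetical call graph extracted from an executable.
call_edges = [("main", "parseArgs"), ("main", "connect"), ("connect", "send")]
node_ids, edge_index = to_edge_index(call_edges)
print(node_ids)
print(edge_index)
```

The integer ids can then index into a node feature matrix, for instance one embedding vector per API call, before the graph is passed to the model.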
