
Radowan98/Vulnerability-Dataset


Vulnerability Dataset

This repository contains a dataset for vulnerability detection used in our research study:
A Zero-Shot Framework for Cross-Project Vulnerability Detection in Source Code.

📂 Dataset Overview

The dataset is derived from publicly available datasets used in prior research on vulnerability detection:

For ease of use, we preprocessed and reformatted these datasets into a single file, combined.pkl, which contains labeled source code functions.

🔹 File

| File Name | Description |
|---|---|
| combined.pkl | A dictionary containing vulnerability-labeled functions from four projects (FFmpeg, Chrome, Debian, Qemu). |

🔹 Structure

combined.pkl contains a dictionary keyed by project name:

  • combined_data["FFmpeg"]
  • combined_data["Chrome"]
  • combined_data["Debian"]
  • combined_data["Qemu"]

Each project entry contains source code functions together with binary labels indicating whether each function is vulnerable (1) or non-vulnerable (0).
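The README does not specify the container type used for each project entry, so the sketch below assumes, hypothetically, that each entry is an iterable of `(function_source, label)` pairs. Under that assumption, the hypothetical helper `summarize` reports how many functions each project has and what share is labeled vulnerable (1). If the actual entries use a different layout (for example, a table with named columns), the unpacking in the generator expression would need to change.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical layout (an assumption, not confirmed by the README):
# each project maps to an iterable of (function_source, label) pairs.
Sample = Tuple[str, int]


def summarize(data: Dict[str, Iterable[Sample]]) -> Dict[str, Dict[str, float]]:
    """Count functions and label values per project."""
    summary = {}
    for project, samples in data.items():
        # Tally labels only; the source text is ignored here.
        counts = Counter(label for _source, label in samples)
        total = sum(counts.values())
        summary[project] = {
            "total": total,
            "vulnerable": counts[1],
            "non_vulnerable": counts[0],
            # Guard against empty projects to avoid division by zero.
            "vulnerable_ratio": counts[1] / total if total else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Tiny synthetic example; these are NOT samples from combined.pkl.
    toy = {
        "FFmpeg": [("int f(void) { return 0; }", 0), ("void g(char *p) { strcpy(p, s); }", 1)],
        "Qemu": [("int h(int x) { return x + 1; }", 0)],
    }
    for project, stats in summarize(toy).items():
        print(project, stats)
```

Reporting the vulnerable ratio per project is useful because vulnerability datasets are often imbalanced, and the balance may differ between projects.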

📖 Usage

To load the dataset in Python:

```python
import pickle

# Note: unpickling can execute arbitrary code, so only load files from a trusted source.
with open("combined.pkl", "rb") as f:
    combined_data = pickle.load(f)

# Access the per-project datasets by key
ffmpeg_data = combined_data["FFmpeg"]
chrome_data = combined_data["Chrome"]
debian_data = combined_data["Debian"]
qemu_data = combined_data["Qemu"]

print(ffmpeg_data)
```
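Because the dictionary is keyed by project, it lends itself to cross-project experiments such as a leave-one-project-out split: pool three projects for training and hold out the fourth as the unseen target. The sketch below illustrates that idea only; it is not necessarily the evaluation protocol used in the paper. The function name `leave_one_project_out` is hypothetical, and it assumes each project entry is a list-like sequence of samples that can be concatenated (if the entries turn out to be pandas DataFrames, `pd.concat` would be the analogous operation).

```python
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_project_out(
    data: Dict[str, Sequence[T]],
) -> Iterator[Tuple[str, List[T], List[T]]]:
    """Yield (held_out_project, train_samples, test_samples) for each project.

    Assumes each value is a list-like sequence of samples (hypothetical layout).
    """
    projects = sorted(data)  # deterministic order of held-out projects
    for held_out in projects:
        train: List[T] = []
        for project in projects:
            if project != held_out:
                train.extend(data[project])  # pool every other project
        test = list(data[held_out])  # the project unseen during training
        yield held_out, train, test


if __name__ == "__main__":
    # Synthetic placeholders standing in for the real per-project entries.
    toy = {"FFmpeg": ["a", "b"], "Chrome": ["c"], "Debian": ["d"], "Qemu": ["e", "f"]}
    for target, train, test in leave_one_project_out(toy):
        print(f"target={target}: {len(train)} train / {len(test)} test")
```

With the real data, `leave_one_project_out(combined_data)` would yield four splits, one per held-out project, provided the entries match the assumed list-like layout.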
