Commit 4393fe1

Merge pull request #1 from m-zakeri/r0.3.0
R0.3.0
2 parents 0f9d112 + 911af4a commit 4393fe1

90 files changed

+2474
-15
lines changed

README.md

Lines changed: 32 additions & 7 deletions
@@ -11,6 +11,38 @@ Fuzz testing (Fuzzing) is a dynamic software testing technique. In this techniqu
1111
In this thesis, we propose an automated method for hybrid test data generation. To this end, we apply neural language models (NLMs) built from recurrent neural networks (RNNs). Using deep learning techniques, the proposed models learn the statistical structure of complex files and then generate new textual test data, based on the grammar, and binary data, based on mutations. The generated data is fuzzed by two newly introduced algorithms, called neural fuzz algorithms, that use these models. We use the proposed method to generate test data and then fuzz test MuPDF, a complex piece of software that takes portable document format (PDF) files as input. To train our generative models, we gathered a large corpus of PDF files. Our experiments demonstrate that the data generated by this method increases code coverage by more than 7% compared to state-of-the-art file format fuzzers such as American fuzzy lop (AFL). Experiments also indicate better learning accuracy for the simpler NLMs than for the more complicated encoder-decoder model and confirm that our proposed models can outperform the encoder-decoder model in code coverage when fuzzing the SUT.
1212

1313

14+
## Getting Started
15+
In the current release (0.3.0), you can use IUST-DeepFuzz to generate test data and then fuzz any application.
16+
17+
### Install
18+
You need Python 3.6.x and up-to-date TensorFlow and Keras frameworks on your computer.
19+
* Install [Python 3.6.x](https://www.python.org/)
20+
* Install [TensorFlow](https://www.tensorflow.org/)
21+
* Install [Keras](https://keras.io/)
22+
* Clone the IUST-DeepFuzz repository: `git clone https://github.com/m-zakeri/iust_deep_fuzz.git` or download the latest version from https://github.com/m-zakeri/iust_deep_fuzz.git
23+
* IUST-DeepFuzz is almost ready for test data generation!
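
The prerequisites listed above can be checked before running anything. The following is a minimal sketch and is not part of the repository: the function name `check_environment` is hypothetical, and treating any Python version from 3.6 upward as acceptable is an assumption (the README only names 3.6.x).

```python
import importlib.util
import sys

# Assumption: any interpreter >= 3.6 is acceptable; the README names 3.6.x.
REQUIRED_PYTHON = (3, 6)
REQUIRED_PACKAGES = ("tensorflow", "keras")


def check_environment(min_python=REQUIRED_PYTHON, packages=REQUIRED_PACKAGES):
    """Return a dict mapping each requirement name to True or False."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```

The check only confirms that the packages can be located; it does not import them or verify their versions.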
24+
25+
### Running
26+
* Configure `config.py` to work with your dataset and to set the other path settings.
27+
* Find the script for the specific algorithm that you need.
28+
* Run the script from the command line: `python script_name.py`
29+
* Wait until the file format is learned and your test data is generated!
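
The run step above can also be driven from Python, for example to launch several scripts in turn. This is a minimal sketch under stated assumptions: `run_script` is a hypothetical helper, not part of the repository, and it simply executes a script with the current interpreter, which is what `python script_name.py` does on the command line.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path, cwd=None):
    """Run a Python script with the current interpreter and return its exit code.

    Raises FileNotFoundError if the script does not exist and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    base = Path(cwd) if cwd else Path.cwd()
    script = base / script_path  # an absolute script_path is kept as-is
    if not script.is_file():
        raise FileNotFoundError(f"script not found: {script}")
    completed = subprocess.run(
        [sys.executable, str(script)],
        cwd=str(base),
        check=True,
    )
    return completed.returncode


if __name__ == "__main__":
    # Pass the script to run as the first argument, e.g. one of the fuzzing
    # scripts from a local clone of the repository.
    if len(sys.argv) > 1:
        print(run_script(sys.argv[1]))
```

Running with `sys.executable` keeps the script in the same environment (and therefore the same TensorFlow and Keras installation) as the caller.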
30+
31+
#### Available Pre-trained Models
32+
A pre-trained model is a model that was trained on a large benchmark dataset to solve a problem similar to the one that we want to solve. For the time being, we provide some pre-trained models for the PDF file format. Our best trained model is available at [model_checkpoint/best_models](model_checkpoint/best_models).
33+
34+
#### Available Fuzzing Scripts
35+
IUST-DeepFuzz implements four new deep models and two new fuzz algorithms, DataNeuralFuzz and MetadataNeuralFuzz, which are our contributions in the thesis mentioned above. The following scripts for generating and fuzzing test data are available in the current release (r0.3.0):
36+
* `data_neural_fuzz.py`: Implements the DataNeuralFuzz algorithm for fuzzing data in the files.
37+
* `metadata_neural_fuzz.py`: Implements the MetadataNeuralFuzz algorithm for fuzzing metadata in the files.
38+
* `learn_and_fuzz_3_sample_fuzz.py`: Implements the SampleFuzz algorithm introduced in https://arxiv.org/abs/1701.07232.
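
To give a feel for the SampleFuzz idea, here is a minimal sketch based on the description in the cited paper, not on the code in `learn_and_fuzz_3_sample_fuzz.py`. It samples the next character from a learned distribution and, when the model is confident (probability above `p_t`) and a coin toss exceeds `t_fuzz`, replaces that character with the least likely one. The `toy_model` function and `TEMPLATE` string are hypothetical stand-ins for a trained neural language model.

```python
import random

# Hypothetical stand-in for a trained model: it strongly prefers the next
# character of a fixed PDF-like object and gives "#" a small probability.
TEMPLATE = "obj << /Type /Page >> endobj"


def toy_model(prefix):
    """Return a next-character distribution {char: probability} for prefix."""
    expected = TEMPLATE[len(prefix) % len(TEMPLATE)]
    return {expected: 0.9, "#": 0.1}


def sample_fuzz(model, seed, t_fuzz, p_t, end_token="endobj", max_len=120, rng=None):
    """Generate a sequence, injecting unlikely characters at confident positions."""
    rng = rng or random.Random()
    sequence = seed
    while not sequence.endswith(end_token) and len(sequence) < max_len:
        dist = model(sequence)
        chars = list(dist)
        # Sample the next character according to the learned distribution.
        char = rng.choices(chars, weights=[dist[c] for c in chars], k=1)[0]
        p_fuzz = rng.random()
        # Fuzz only when the coin toss passes and the model is confident.
        if p_fuzz > t_fuzz and dist[char] > p_t:
            char = min(dist, key=dist.get)  # least likely character
        sequence += char
    return sequence


if __name__ == "__main__":
    print(sample_fuzz(toy_model, "obj", t_fuzz=0.9, p_t=0.8, rng=random.Random(0)))
```

A higher `t_fuzz` means fewer positions are fuzzed, so the output stays closer to well-formed input; a lower value injects more anomalies.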
39+
40+
#### Available Dataset
41+
Various file formats for learning with IUST-DeepFuzz and then fuzz testing are available in the [dataset directory](dataset).
42+
43+
44+
## How It Works
45+
1446
### The PDF File Generation Process
1547
![amazing_test_data_generation_process](docs/figs/amazing_test_data_generation_process.gif)
1648

@@ -20,13 +52,6 @@ In this thesis, we proposed an automated method for hybrid test data generation.
2052

2153

2254

23-
## About
24-
### Version 0.1
25-
The main purpose of this version is to implement a free version of learn and fuzz paper and improve the **learn\&fuzz algorithm**.
26-
27-
### Version 0.2
28-
This version implements four new deep models and two new fuzz algorithms: DataNeuralFuzz and MetadataNeuralFuzz as our contribution in mentioned thesis.
29-
3055
### FAQs
3156
This repository is under *active development* and is not yet well documented. If you have downloaded or forked the source code and have any questions, feel free to email me (*m-zakeri@live.com*) for more information. You may also see the main [references](REFERENCES.md) or look at our large [test corpus](dataset).
3257

dataset/README.md

Lines changed: 11 additions & 8 deletions
@@ -1,24 +1,27 @@
1-
# IUST Neural Software Testing (NST) Dataset
1+
# IUST Neural Software Testing (IUST-NST) Dataset
22

33
Neural software testing (NST) is about applying machine learning techniques, especially deep learning and neural networks, in the field of software testing. We began with fuzz testing, but the approach can extend to other types of software testing. Data is an unavoidable part of every machine learning task. The goal of this section is to provide suitable public datasets that can be used by other researchers.
44

5-
65
For now, we are gathering large corpora for different file formats such as portable document format (PDF), extensible markup language (XML), and hypertext markup language (HTML) to fuzz test real-world applications that take these formats as their main inputs.
76
At this time, IUST PDF Corpus is ready to view and download.
87

9-
## IUST PDF Corpus
108

11-
![IUSTPDFCorpusDemo Image](pdfs/IUSTPDFCorpusDemo.PNG)
9+
### News
10+
**2019-10-13:** IUST-PDFCorpus version 1.0.0 is publicly available at [https://zenodo.org/record/3484013](https://zenodo.org/record/3484013) with DOI **10.5281/zenodo.3484013**.
11+
12+
13+
## IUST-PDFCorpus
14+
**Download:** [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3484013.svg)](https://doi.org/10.5281/zenodo.3484013)
15+
16+
![IUSTPDFCorpusDemo Image](pdfs/IUST-PDFCorpusDemo.PNG)
1217

1318
We are happy to introduce **IUST PDF Corpus**, a large set of various PDF files, aimed at manipulating,
1419
testing, and improving the quality of real-world PDF readers such as [MuPDF](https://mupdf.com/).
15-
IUST PDF Corpus (version 1.0) contains about **6,000** PDF files. We extracted more than **500,000** PDF data objects from this corpus to evaluate IUST DeepFuzz, our new file format fuzzer.
20+
IUST PDF Corpus (version 1.0) contains **6,141** PDF files. We extracted more than **500,000** PDF data objects from this corpus to evaluate IUST DeepFuzz, our new file format fuzzer.
1621

1722
The extracted objects are placed under the _pdfs_ directory. We divide the object dataset into two sub-datasets: _large-size_ and _small-size_. The small-size dataset is used to develop and test the generative models and has about 120,000 PDF objects. The large-size dataset is used to train deep models and to fuzz test PDF viewers and has 500,000 PDF objects.
1823
We are extending this corpus and want to add more PDF files, as soon as possible.
19-
We also extracted 1,000 binary streams from the data objects. These streams are placed under the small-size subdirectory. All extracted objects are available to [view and download](./pdfs/) from the current GitHub repository. The complete set of PDF files will be available to view and download as soon as our relevant paper on IUST DeepFuzz is published.
20-
21-
* [View and download IUST PDF Corpus (version 1.0)](https://www.dropbox.com/sh/0gr8qscxdoawwtw/AAD_0Za_bFbrfCoSBTzoeE1Oa?dl=0) [Not available yet!]
24+
We also extracted 1,000 binary streams from the data objects. These streams are placed under the small-size subdirectory. All extracted objects are available to [view and download](./pdfs/) from the current GitHub repository. The complete set of PDF files will be available to view and download as soon as our relevant paper on IUST DeepFuzz is published.
2225

2326

2427
## IUST XML Corpus
