Evaluation and Comparison of Boosted ML Models in Behavior-Based Malware Detection

Choa, de Veyra, Escalona, Fortiz

This repository accompanies the thesis "Evaluation and Comparison of Boosted ML Models in Behavior-Based Malware Detection".

It contains the Jupyter notebook files and datasets used for the development of the study.

Tested on:

Windows (Recommended)
Linux (Debian-based)

Pre-requisite Software:

Kindly install these before proceeding to the next step.

Dependency/Library Installation:

Install Python and Anaconda/Conda.
Once both are installed, open Anaconda Prompt AS ADMINISTRATOR and navigate to your local copy of the repository.
- Install Graphviz to enable tree visualization in CatBoost.
- Follow the Graphviz installation instructions in .\Graphiz\README.md.
From the repository folder, run install.bat on Windows or install.sh on Linux. The script installs the necessary dependencies/libraries into your Conda environment.
Once it completes, you can begin exploring the thesis project files.
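After the install script finishes, a quick check like the one below can confirm that the environment is usable. This is only a sketch: the package names in `ASSUMED_PACKAGES` are guesses based on the libraries named in this README (CatBoost, LightGBM), and the function name is hypothetical. The authoritative dependency list is whatever install.bat / install.sh actually installs.

```python
import importlib.util
import shutil

# Assumed import names, not taken from the install scripts.
ASSUMED_PACKAGES = ["catboost", "lightgbm"]


def check_environment(packages=ASSUMED_PACKAGES):
    """Return a dict mapping each checked item to True (found) or False (missing)."""
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    # Graphviz provides a `dot` executable; tree rendering generally needs it on PATH.
    report["dot (Graphviz executable)"] = shutil.which("dot") is not None
    return report


if __name__ == "__main__":
    for item, found in check_environment().items():
        print(f"{item}: {'found' if found else 'MISSING'}")
```

Run it from the same Conda environment that the notebooks will use, since a different environment may report different results.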

How to run:

Open Anaconda Prompt
Navigate to the location of the GitHub repository on your computer.
Type jupyter notebook
To terminate Jupyter, press Ctrl+C in the Anaconda Prompt.

Tips for running on Linux:

Install Anaconda as shown here.
Once completed, activate Anaconda in a terminal (assuming conda config --set auto_activate_base False has been set) by typing source <PATH_TO_ANACONDA>/bin/activate

CUDA Toolkit

Make sure the CUDA Toolkit is installed on your machine so that GPU (CUDA) training is supported. Note that installing it may replace (downgrade) your GPU driver.
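One way to confirm that the toolkit is visible from your environment is to look for the `nvcc` compiler that ships with it. The helper below is a minimal sketch with a hypothetical function name. It only reports whether `nvcc` is on PATH and what version it prints; it does not verify that any particular library can actually train on the GPU.

```python
import shutil
import subprocess


def cuda_toolkit_version():
    """Return the text printed by `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    # check=False: a failing nvcc should yield None rather than raise.
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = cuda_toolkit_version()
    print(version if version else "nvcc not found; the CUDA Toolkit may not be installed")
```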

LightGBM GPU Support

Download the latest GCC
Download the latest CMake
Download Boost v1.56.0
Follow the guide accordingly

Note that the installation of LightGBM with CUDA support has a steep learning curve.

Disclaimer

Because the models and functions used in this study are non-deterministic, results from your own runs are not guaranteed to match the study's reported results 1:1. However, the overall trends should remain the same. To keep results as consistent as possible from run to run, the proponents of this study used the same seed value across all notebooks.
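The effect of a fixed seed can be illustrated with the standard library alone. In the sketch below, the seed value `42` and the function `seeded_split` are hypothetical and are not taken from the notebooks; the point is only that the same seed reproduces the same shuffle, and therefore the same train/test split.

```python
import random

SEED = 42  # hypothetical value; use whatever seed the notebooks define


def seeded_split(items, test_fraction=0.2, seed=SEED):
    """Shuffle a copy of `items` with a private RNG and split it into train/test."""
    rng = random.Random(seed)  # local generator, independent of global random state
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    first = seeded_split(range(10))
    second = seeded_split(range(10))
    print(first == second)  # same seed, same split
```

A fixed seed removes only the randomness that the seed controls. Other sources of variation, such as GPU execution or different library versions, can still cause small differences between runs.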


jm55/Evaluation-and-Comparison-of-Boosted-ML-Models-in-Behavior-Based-Malware-Detection

Folders and files

Latest commit

History

Repository files navigation

Evaluation and Comparison of Boosted ML Models in Behavior-Based Malware Detection

Tested on:

Pre-requisite Software:

Dependency/Library Installation:

How to run:

Tips for running on Linux:

CUDA Toolkit

LightGBM GPU Support

Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors 3

Uh oh!

Languages