Backend

Ramkumar edited this page Sep 16, 2018 · 2 revisions

Dataset Preprocessing

This is a public phishing website dataset taken from the UCI Machine Learning Repository.

Download the dataset and save it as dataset.arff. preprocess.py loads the ARFF file and converts it to a NumPy array, prints the dataset metadata, and then splits the dataset into training and testing sets, with 30% of the samples held out for testing.
Change the working directory to /backend/dataset and run the preprocessor with

python3 preprocess.py

The training and testing data are saved as *.npy files in the working directory.
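
The preprocessing steps described above can be sketched roughly as follows. This is not the actual preprocess.py: the tiny ARFF reader, the function names, the output file names (X_train.npy and so on), the fixed shuffle seed, and the assumption that the class label is the last column are all illustrative choices.

```python
import os
import numpy as np

def load_arff(text):
    """Parse the @data section of an ARFF file whose values are all integers."""
    rows, in_data = [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):   # skip blanks and ARFF comments
            continue
        if in_data:
            rows.append([int(v) for v in line.split(",")])
        elif line.lower().startswith("@data"):
            in_data = True                     # everything after this is data
    return np.array(rows)

def split_dataset(data, test_fraction=0.3, seed=0):
    """Shuffle the rows, then hold out test_fraction of them for testing."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]
    n_test = int(round(len(data) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Assumption: the class label is the last column of each row.
    return train[:, :-1], test[:, :-1], train[:, -1], test[:, -1]

def preprocess(arff_path, out_dir="."):
    """Load the ARFF file, print basic metadata, split, and save *.npy files."""
    with open(arff_path) as f:
        data = load_arff(f.read())
    print("samples:", data.shape[0], "columns:", data.shape[1])
    names = ("X_train", "X_test", "y_train", "y_test")  # hypothetical file names
    for name, arr in zip(names, split_dataset(data)):
        np.save(os.path.join(out_dir, name + ".npy"), arr)
```

Calling preprocess("dataset.arff") from /backend/dataset would then write the four arrays next to the dataset. A library loader such as scipy.io.arff.loadarff could replace the hand-written reader if SciPy is available.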

Training RandomForestClassifier

The RandomForestClassifier (an ensemble learner) is fitted on the training set, and then the accuracy and cross-validation scores are printed.
The parameters of the learned model, such as the number of estimators and the per-tree parameters (for example, the split thresholds of each estimator), are dumped to a file named classifier.json.
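
As an illustration of the training and export step, here is a minimal sketch using scikit-learn. It is not the actual training.py: the JSON layout, the helper names, and the hyperparameters (10 trees, 5-fold cross-validation) are assumptions made for the example.

```python
import json
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def tree_to_dict(estimator):
    """Export the arrays that describe one fitted decision tree."""
    tree = estimator.tree_
    return {
        "children_left": tree.children_left.tolist(),    # -1 marks a leaf
        "children_right": tree.children_right.tolist(),
        "feature": tree.feature.tolist(),                # feature index per node
        "threshold": tree.threshold.tolist(),            # split threshold per node
        "value": tree.value[:, 0, :].tolist(),           # per-class weights per node
    }

def forest_to_dict(clf):
    """Collect forest-level metadata plus every tree (hypothetical layout)."""
    return {
        "n_features": int(clf.n_features_in_),
        "classes": clf.classes_.tolist(),
        "n_estimators": len(clf.estimators_),
        "estimators": [tree_to_dict(e) for e in clf.estimators_],
    }

def train_and_dump(X_train, y_train, X_test, y_test, path="classifier.json"):
    """Fit the forest, print its scores, and dump its parameters as JSON."""
    clf = RandomForestClassifier(n_estimators=10, random_state=0)
    clf.fit(X_train, y_train)
    print("accuracy:", clf.score(X_test, y_test))
    print("cross-validation:", cross_val_score(clf, X_train, y_train, cv=5))
    with open(path, "w") as f:
        json.dump(forest_to_dict(clf), f)
    return clf
```

Exporting plain arrays rather than pickling the model keeps the file readable from non-Python clients, which only need to walk each tree from node 0 and compare feature values against the thresholds.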

Change the working directory to /backend/classifier and run

python3 training.py

classifier.json is created in the working directory.
Serve this classifier.json over HTTP and update its URL in the plugin settings.
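
One possible way to serve the file written by training.py during development is Python's built-in HTTP server. The sketch below is an assumption about the setup, not part of the project: the handler name, the default port, and the permissive CORS header (added in case the plugin fetches the file from a browser context) are all illustrative.

```python
import functools
import http.server

class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that also allows cross-origin requests."""

    def end_headers(self):
        # Assumption: the plugin runs in a browser and needs CORS to be allowed.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

def make_server(directory, port=8000):
    """Create (but do not start) a server for the files in `directory`."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Running make_server("backend/classifier").serve_forever() would expose the model at http://127.0.0.1:8000/classifier.json, which is the kind of URL to enter in the plugin settings. For anything beyond local testing, a regular web server or static hosting is the safer choice.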
