thechamppc/Detection-Of-Phishing-Websites-From-Urls-Using-IBM-Watson-Studio

Detection of Phishing Websites from URLs using IBM Watson Studio

Introduction

This project focuses on applying machine-learning algorithms to detect phishing websites. To detect and predict phishing websites, especially those targeting e-banking, we propose an intelligent, flexible and effective system based on classification algorithms. We implemented classification algorithms and techniques that extract the criteria in phishing datasets and use them to classify a website's legitimacy. Phishing websites can be detected from characteristics such as the URL and domain identity. When a user wants to make a payment through an e-banking website, our system uses a data mining algorithm to determine whether that website is a phishing website.
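As a rough illustration of what "characteristics like URL and domain identity" can mean in practice, the sketch below derives a few lexical features from a URL using only the Python standard library. The function name `extract_url_features` and the specific features are assumptions chosen for illustration; they are not necessarily the features used in this project's dataset.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few simple lexical features of a URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip,
        # "@" in a URL can hide the real destination from the user.
        "has_at_symbol": "@" in url,
        # Naive count: dots in the host minus one. It does not handle
        # multi-part suffixes such as "co.uk".
        "subdomain_count": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "has_hyphen_in_host": "-" in host,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    print(extract_url_features("https://secure-login.example.com/account"))
```

Each feature is a cheap string check, so this kind of extraction can run before any page content is fetched.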

Objective

The purpose of the project is to detect fake or phishing websites that try to obtain sensitive data such as users' personal credentials. Using machine learning techniques, we aim to safeguard this data by identifying websites that attempt to gain unauthorized access to it.

Proposed Solution

We propose a solution that uses ML techniques to detect phishing websites from their URLs. We built a web-based phishing detection interface with the Flask framework. We trained a logistic regression model on a predefined dataset with a large set of features and deployed it using IBM Watson Studio to identify potential phishing websites.
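To show the idea behind a logistic regression classifier, here is a minimal pure-Python sketch trained with batch gradient descent. It is not the project's training code: the three binary features, the toy rows, the learning rate and the epoch count are all made up for illustration, and a real pipeline would more likely rely on a library implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_phishing_probability(weights, bias, x):
    """Probability that feature vector x belongs to the phishing class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias by minimising log loss with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            error = predict_phishing_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical toy data: [host_is_ip, has_at_symbol, uses_https]; label 1 = phishing.
TOY_ROWS = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(TOY_ROWS, TOY_LABELS)

if __name__ == "__main__":
    for row in TOY_ROWS:
        print(row, round(predict_phishing_probability(weights, bias, row), 3))
```

A URL would be flagged as phishing when the predicted probability exceeds a chosen threshold, commonly 0.5.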

We achieved an accuracy rate of 91.6% for our ML model.
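For reference, accuracy is the fraction of URLs whose predicted label matches the true label. The sketch below computes it, together with the confusion counts that show where the errors fall. The labels are toy values invented for illustration and do not reproduce the project's evaluation.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count true/false positives and negatives, with 1 = phishing."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(true_labels, predicted_labels):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Toy labels for illustration only: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    print(accuracy(y_true, y_pred))          # 0.75
    print(confusion_counts(y_true, y_pred))  # {'tp': 2, 'fp': 1, 'tn': 4, 'fn': 1}
```

Accuracy alone can hide missed phishing sites (false negatives) when legitimate URLs dominate the dataset, so the confusion counts are worth reporting alongside it.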

Hardware Requirements :

-A generic PC with at least 8 GB of RAM
-A Tensor Processing Unit (TPU) for faster model training
-A good processor to handle all the operations

Software Requirements :

-A Windows / Linux / Mac Operating System
-Anaconda installed, including all the dependencies
-Spyder, Jupyter Notebook and Flask for the entire application development

Deployment Requirements :

-IBM Cloud
-IBM Watson Studio

Advantages and Disadvantages

Advantages

-Easily identifies trends and patterns in phishing sites
-Continuous Development
-Completely Automated
-Quickly Determines Malicious Sites
-Can be Integrated anywhere with online deployment
-Efficient
-Online payment processes become secure

Disadvantages

-Data Acquisition
-Takes a lot of time to train models
-Efficiency is dependent upon the data acquired
-Model checks the URLs for only a fixed set of features. With technological advances, phishing URLs are also changing appearances.
-Consumes a lot of CPU, GPU and TPU power to train models

Applications

The system has a wide range of applications, some of which are:

-It can be used as a generic checker if someone is not sure about a website while entering personal information.
-It can be converted into an extension that web browsers can use to quickly detect a phishing website and alert the user.
-It can be integrated with antivirus software and firewalls to block such websites from accessing information.
-The solution can be applied in the e-banking and payment sector.
-The solution can detect fraudulent URLs imitating official government websites.
-The solution can also be applied to create a more extensive database for identifying phishing URLs.

Future scope

The current model checks URLs against only a small set of features. With continued effort, we can train the model on more features. The model is also not perfectly accurate, so improving its accuracy is a further goal.

About

The project was developed under the externship program of SmartInternz
