This project applies machine-learning classification to detect phishing websites. To detect and predict phishing websites, especially those targeting e-banking, we propose an intelligent, flexible, and effective system based on classification algorithms. We implemented classification algorithms and techniques that extract criteria from phishing datasets and use them to classify a website's legitimacy. Phishing websites can be detected from characteristics such as the URL and domain identity. When a user wants to make a payment through an e-banking website, our system uses a data mining algorithm to determine whether that website is a phishing website.
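As a minimal sketch of how URL and domain characteristics can be turned into numeric inputs for a classifier, the example below derives a few simple features from a URL using only the Python standard library. The function name `extract_url_features` and the specific features are assumptions chosen for illustration; they are not the project's actual feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical feature set for illustration; the project's real features may differ.
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url):
    """Map a URL to a dictionary of simple numeric features."""
    parsed = urlparse(url)
    # hostname drops any "user@" prefix and port, and is lowercased.
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        # An "@" can hide the real host behind a familiar-looking name.
        "has_at_symbol": int("@" in url),
        # A raw IPv4 address in place of a domain name.
        "host_is_ip": int(bool(IPV4_PATTERN.match(host))),
        "num_host_dots": host.count("."),
        "has_hyphen_in_host": int("-" in host),
    }
```

For instance, `extract_url_features("http://paypal.com@evil-site.example/")` reports the host as containing a hyphen and flags the `@` symbol, because the actual host in that URL is `evil-site.example`, not `paypal.com`.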
The purpose of the project is to detect fake or phishing websites that try to obtain sensitive data such as users’ personal credentials. Using machine learning techniques, we aim to safeguard this data by identifying the websites that attempt to gain unauthorized access to it.
We propose a solution that uses ML techniques to detect phishing websites from their URLs. We created a web-based phishing detection interface using the Flask framework. We trained a logistic regression model on a predefined dataset with numerous features and deployed it using IBM Watson Studio to identify potential phishing websites.
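To illustrate the technique behind the model, the sketch below fits a logistic regression with batch gradient descent in pure Python. It is not the project's training code, which would more likely rely on a library and on IBM Watson Studio; the toy dataset, feature columns, function names, and hyperparameters are all made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_phishing_probability(weights, bias, x):
    """Return the modeled probability that feature vector x is phishing."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Gradient of the log loss for one example is (prediction - label) * x.
            error = predict_phishing_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Made-up toy data: [has_at_symbol, host_is_ip, uses_https]; label 1 = phishing.
TOY_ROWS = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(TOY_ROWS, TOY_LABELS)
```

On this toy data, `predict_phishing_probability(weights, bias, [1, 1, 0])` should come out above 0.5 and the same call with `[0, 0, 1]` below 0.5. A real dataset would need far more examples and a held-out test split.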
-A generic PC with at least 8 GB of RAM
-A Tensor Processing Unit (TPU) for faster calculations during model training
-A processor capable of handling all the operations
-A Windows, Linux, or macOS operating system
-Anaconda installed with all the dependencies
-Spyder, Jupyter Notebook, and Flask for the entire application development
-IBM Cloud
-IBM Watson Studio
-Easily identifies trends and patterns in phishing sites
-Continuous Development
-Completely Automated
-Quickly Determines Malicious Sites
-Can be Integrated anywhere with online deployment
-Efficient
-Online payment processes become more secure
-Data Acquisition
-Takes a lot of time to train models
-Efficiency depends on the data acquired
-The model checks URLs against only a fixed set of features, while phishing URLs keep changing their appearance as technology advances.
-Consumes a lot of CPU, GPU and TPU power to train models
The solution has a wide range of applications, including:
-It can be used as a generic checker if someone is not sure about a website while entering personal information.
-It can be converted into a web browser extension that quickly detects a phishing website and alerts the user.
-It can be integrated with antivirus software and firewalls to block such websites from accessing information.
-The solution can be applied in the e-banking and payment sector.
-The solution can detect fraudulent URLs imitating official government websites.
-The solution can also be applied to create a more extensive database for identifying phishing URLs.
The current model checks URLs against only a small set of features. With continued effort, we can train the model on more features. The model is also not perfectly accurate, so increasing its accuracy would improve the solution further.
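Tracking any accuracy improvement requires a consistent way to measure it. The following is a minimal sketch, assuming binary labels where 1 means phishing; the function name `classification_metrics` is hypothetical. It also reports precision and recall, since accuracy alone can mislead when phishing examples are rare in the data.

```python
def classification_metrics(true_labels, predicted_labels):
    """Return accuracy, precision, and recall for binary labels (1 = phishing)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Comparing these numbers on the same held-out set before and after adding features shows whether a change actually helps.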