Malicious Link Detection

This project uses a Random Forest classifier to label URLs as malicious or benign. The model relies only on features derived from the URL string itself. These are structural attributes (e.g., URL length, number of special characters, or the presence of an IP address) and lexical patterns.
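Below is a minimal sketch of this kind of feature extraction, using only the Python standard library. The feature names, the set of "special" characters, and the list of suspicious words are illustrative assumptions. They are not the notebook's actual feature functions.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Characters counted as "special" (an illustrative assumption, not the notebook's list).
SPECIAL_CHARS = "@-_=?&%.~/#"
# Words often seen in phishing URLs (also an assumed, illustrative list).
SUSPICIOUS_WORDS = re.compile(r"login|verify|account|update|secure", re.IGNORECASE)
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def has_ip_host(url: str) -> bool:
    """Return True if the URL's host is a literal IPv4 or IPv6 address."""
    # urlparse only fills in the host when a scheme is present,
    # so add one to bare URLs such as "192.168.0.1/login".
    if not SCHEME.match(url):
        url = "http://" + url
    try:
        host = urlparse(url).hostname or ""
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not an IP literal (or a malformed URL).
        return False


def extract_features(url: str) -> dict[str, int]:
    """Compute a few structural and lexical features from the raw URL string."""
    return {
        "url_length": len(url),
        "num_special_chars": sum(url.count(c) for c in SPECIAL_CHARS),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_ip": int(has_ip_host(url)),
        "has_https": int(url.lower().startswith("https://")),
        "has_suspicious_word": int(bool(SUSPICIOUS_WORDS.search(url))),
    }
```

For example, `extract_features("http://192.168.0.1/login")` flags both the IP-address host and the word "login".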

Using these features, the Random Forest algorithm trains an ensemble of decision trees and aggregates their predictions. This makes it more robust than a single tree and lets it flag potentially harmful links efficiently.
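In scikit-learn's `RandomForestClassifier`, each tree produces class-probability estimates. The forest then predicts the class with the highest mean probability across trees, which is soft voting rather than a strict majority vote. The toy sketch below reproduces only that aggregation step. The per-tree probabilities are made-up inputs, not outputs of a trained model.

```python
def forest_predict(tree_probas: list[list[float]], classes: list[str]) -> tuple[str, list[float]]:
    """Average per-tree class probabilities; return (predicted class, mean probabilities)."""
    n_trees = len(tree_probas)
    # Mean probability of each class across all trees.
    mean = [sum(p[i] for p in tree_probas) / n_trees for i in range(len(classes))]
    # Pick the class with the highest averaged probability.
    best = max(range(len(classes)), key=lambda i: mean[i])
    return classes[best], mean


# Three hypothetical trees scoring one URL; columns are [benign, malicious].
label, mean = forest_predict([[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]], ["benign", "malicious"])
```

Averaging can disagree with a head count of votes. Consider per-tree probabilities `[[0.51, 0.49], [0.51, 0.49], [0.0, 1.0]]`: two of three trees lean benign, but the mean probabilities (0.34 vs 0.66) favor malicious.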

Kaggle Dataset

Malicious links

A dataset of 651,191 URLs: 428,103 benign (safe), 96,457 defacement, 94,111 phishing, and 32,520 malware URLs.
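The per-class counts add up to the stated total. Merging the three attack classes gives the binary split used for training. Below is a quick arithmetic check; the label strings are assumptions.

```python
# Per-class counts as stated above; the label strings are assumed.
counts = {"benign": 428_103, "defacement": 96_457, "phishing": 94_111, "malware": 32_520}

total = sum(counts.values())
malicious = total - counts["benign"]  # defacement + phishing + malware

print(f"total={total:,}")
for label, n in counts.items():
    print(f"{label:>10}: {n:>7,} ({n / total:.1%})")
print(f"binary split: benign={counts['benign']:,}, malicious={malicious:,}")
```

After merging, roughly 66% of the URLs are benign and 34% are malicious.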

Approach

Combined the defacement, phishing, and malware classes into a single "malicious" class
Wrote several functions that extract features from the URL string alone
Used scikit-learn's BaseEstimator and TransformerMixin to build custom transformers for the preprocessing pipeline
After preprocessing, trained a RandomForestClassifier to detect malicious URLs
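Below is a minimal sketch of how these steps can fit together in scikit-learn. `URLFeatures`, its three features, the label strings, the hyperparameters, and the four toy URLs are all hypothetical placeholders. They are not the notebook's code, settings, or data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Map the four original labels to a binary target (label strings are assumed).
LABEL_MAP = {"benign": 0, "defacement": 1, "phishing": 1, "malware": 1}


class URLFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: turns raw URL strings into a numeric feature matrix."""

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X):
        # Three simple structural features per URL: length, digit count, dot count.
        rows = [[len(u), sum(c.isdigit() for c in u), u.count(".")] for u in X]
        return np.asarray(rows, dtype=float)


pipeline = Pipeline([
    ("features", URLFeatures()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Toy data for illustration only.
urls = [
    "https://www.google.com",
    "http://192.168.0.1/paypal/login.php?id=8823",
    "https://en.wikipedia.org/wiki/URL",
    "http://free-prizes.example.com/win.exe",
]
labels = ["benign", "phishing", "benign", "malware"]
y = np.array([LABEL_MAP[label] for label in labels])

pipeline.fit(urls, y)
preds = pipeline.predict(urls)
```

Wrapping feature extraction in a transformer means the same code runs at training and prediction time. Raw URL strings can then be passed straight to `pipeline.predict`.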

Challenges

The dataset is heavily imbalanced: benign URLs far outnumber each individual attack class, and malware URLs make up only about 5% of the data. Adjusting class weights alone did not give satisfactory results. I therefore merged the defacement, phishing, and malware categories into a single "malicious" class. This leaves 223,088 malicious URLs against 428,103 benign ones, roughly 2:1.
Neural networks reached at most 81% accuracy, even after extensive hyperparameter tuning. Switching to a Random Forest classifier raised accuracy to 95%.
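The class-weight adjustment mentioned above can be illustrated with scikit-learn's `class_weight="balanced"` rule. That rule weights each class by n_samples / (n_classes * n_c), where n_c is the class's count. The standard-library sketch below applies this formula to the dataset counts. It does not say which weighting scheme the notebook actually tried.

```python
def balanced_weights(counts: dict[str, int]) -> dict[str, float]:
    """Same formula as scikit-learn's class_weight='balanced': n_samples / (n_classes * n_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


four_class = {"benign": 428_103, "defacement": 96_457, "phishing": 94_111, "malware": 32_520}
binary = {"benign": 428_103, "malicious": 96_457 + 94_111 + 32_520}

print(balanced_weights(four_class))
print(balanced_weights(binary))
```

In the four-class setting, malware receives about 13 times the weight of benign. After merging, the malicious class is weighted only about 1.9 times benign.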

Performance

F1 score: 95%
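It is not stated whether this F1 score refers to the malicious class alone or is averaged across classes. In a binary task with malicious as the positive class, F1 is the harmonic mean of precision and recall. The standard-library sketch below computes F1 alongside accuracy from true and predicted labels. The toy labels are hypothetical and show why the two metrics can diverge on imbalanced data.

```python
def accuracy_and_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[float, float]:
    """Return (accuracy, F1 for the positive class) from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy example: 1 = malicious, 0 = benign (hypothetical labels).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

Here accuracy is 0.7, but F1 for the malicious class is only 0.4 because three of the four malicious URLs are missed.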

Links 🖇️:

Kaggle Dataset: Malicious links
Model: Malicious Link Detection Model

Tarunrao0/malicious_link_detection