
A machine learning model using a Random Forest Classifier to classify URLs as malicious or benign. It analyzes structural URL features and achieves a 95% F1 score.


Tarunrao0/malicious_link_detection



Malicious Link Detection

The proposed machine learning model uses a Random Forest classifier to classify URLs as malicious or benign. The model focuses on analyzing a variety of features derived from the URL itself, such as structural attributes (e.g., URL length, number of special characters, or the presence of IP addresses) and lexical patterns.

Using these features, the Random Forest algorithm builds an ensemble of decision trees and combines their predictions, which helps identify potentially harmful links efficiently.

Kaggle Dataset

Malicious links

The dataset contains 651,191 URLs: 428,103 benign (safe) URLs, 96,457 defacement URLs, 94,111 phishing URLs, and 32,520 malware URLs.
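
To make the class balance concrete, the shares can be computed directly from the counts above. This is a small sketch; the variable names are illustrative and not taken from the repository.

```python
# Class counts as reported for the Kaggle dataset above.
counts = {
    "benign": 428_103,
    "defacement": 96_457,
    "phishing": 94_111,
    "malware": 32_520,
}

total = sum(counts.values())

# Share of each class as a fraction of all URLs.
shares = {label: n / total for label, n in counts.items()}

# Merging the three harmful classes gives the size of a binary "malicious" class.
malicious = total - counts["benign"]

for label, share in shares.items():
    print(f"{label:<11}{share:6.1%}")
print(f"malicious (merged): {malicious} ({malicious / total:.1%})")
```

The merged malicious class is roughly a third of the data, which is far less skewed than any of the three original harmful classes on its own.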

Approach

  • Combined the defacement, phishing, and malware classes into a single "malicious" class
  • Wrote several functions that extract features from the URL string alone
  • Used sklearn's BaseEstimator and TransformerMixin to build custom transformers and assemble them into a pipeline
  • After preprocessing, used RandomForestClassifier to detect the malicious URLs
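
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the feature set in `url_features`, the `UrlFeatureExtractor` class, the `to_binary` helper, the hyperparameters, and the toy URLs are all assumptions made for the example.

```python
import re

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Matches a dotted-quad IPv4 address anywhere in the URL (illustrative check).
IPV4_RE = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def url_features(url: str) -> list[float]:
    """Hypothetical feature set: length, digit count, special-character
    count, dot count, and an IP-address flag."""
    return [
        float(len(url)),
        float(sum(ch.isdigit() for ch in url)),
        float(sum(not ch.isalnum() for ch in url)),
        float(url.count(".")),
        float(bool(IPV4_RE.search(url))),
    ]


class UrlFeatureExtractor(BaseEstimator, TransformerMixin):
    """Turns an iterable of URL strings into a numeric feature matrix."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        return np.array([url_features(u) for u in X])


def to_binary(label: str) -> int:
    """Merge defacement, phishing, and malware into one malicious class (1)."""
    return 0 if label == "benign" else 1


pipeline = Pipeline([
    ("features", UrlFeatureExtractor()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Toy data for illustration only; not taken from the dataset.
urls = [
    "https://example.com/index.html",
    "http://192.168.0.1/login.php?user=admin",
    "https://example.org/about",
    "http://free-prizes.example.net/win.exe?id=12345&ref=@@",
]
labels = ["benign", "phishing", "benign", "malware"]

pipeline.fit(urls, [to_binary(label) for label in labels])
predictions = pipeline.predict(["https://example.com/contact"])
```

Keeping feature extraction inside a transformer means the same pipeline object accepts raw URL strings at both training and prediction time.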

Challenges

  • The dataset is significantly imbalanced: defacement, phishing, and malware URLs are each far less common than benign ones. Adjusting class weights alone did not yield satisfactory results, so I consolidated these three categories into a single "malicious" category.
  • Neural networks, even after extensive hyperparameter tuning, reached a maximum accuracy of 81%. Switching to a Random Forest classifier raised accuracy to 95%.
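
The write-up does not say how the class weights were set. As an illustration only, one common choice is scikit-learn's `class_weight="balanced"` heuristic, which weights each class by `n_samples / (n_classes * class_count)`. The sketch below applies that formula by hand to the four original class counts.

```python
# Class counts from the dataset description.
counts = {
    "benign": 428_103,
    "defacement": 96_457,
    "phishing": 94_111,
    "malware": 32_520,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
balanced_weights = {
    label: n_samples / (n_classes * n) for label, n in counts.items()
}

for label, weight in balanced_weights.items():
    print(f"{label:<11}{weight:.3f}")
```

Under this scheme the malware class is weighted far more heavily than the benign class, which shows how skewed the four-class problem is before the harmful classes are merged.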

Performance

  • F1 score: 95%
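
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name and the counts are made up for illustration and are not the model's actual results.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)  # share of flagged URLs that are truly malicious
    recall = tp / (tp + fn)     # share of malicious URLs that were flagged
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only.
example_f1 = f1_from_counts(tp=90, fp=10, fn=10)
print(f"F1 = {example_f1:.2f}")
```

Unlike plain accuracy, F1 ignores true negatives, so a large benign class cannot inflate it.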

Links 🖇️:

Kaggle Dataset : Malicious links
Model : Malicious Link Detection Model
