URL Classifier API

A REST API that uses a trained machine learning model to classify URLs as benign or malicious. It extracts 38 features from each URL, predicts the probability that the URL is malicious, and maps that probability to a risk level.

Features

URL Classification: Classifies URLs as benign or malicious using a trained machine learning model
Risk Assessment: Provides risk levels (low, medium, high) based on malicious probability
Feature Extraction: Extracts 38 lexical and host-based features from URLs
Flexible Input: Supports single URL or multiple URLs in one request
RESTful API: Easy-to-use REST endpoints with JSON responses

Installation

Prerequisites

Python 3.13 or higher
pip package manager

Setup

Clone the repository

git clone https://github.com/Lee-yah/URL-Classifier-API.git
cd URL-Classifier-API

Install dependencies
```
pip install -r requirements.txt
```
Run the application
```
python app.py
```

The API will be available at http://localhost:5000

API Usage

Base URL

Production API: https://url-classifier-api.onrender.com
Local Development: http://localhost:5000

Endpoint

GET or POST /predict/

Request Methods

Method 1: GET Request (Single URL)

Postman Setup:

Method: GET
URL: http://localhost:5000/predict/?url=https://example.com
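Outside Postman, the target URL should be percent-encoded before it is placed in the query string, otherwise any `?` or `&` inside it is read as part of the API request. A minimal sketch using only the standard library; the helper name `build_predict_url` is hypothetical and not part of this repository:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # local development server from this README


def build_predict_url(target_url: str, base_url: str = BASE_URL) -> str:
    """Return the GET URL for /predict/ with the target URL percent-encoded."""
    return f"{base_url}/predict/?{urlencode({'url': target_url})}"


if __name__ == "__main__":
    # Prints http://localhost:5000/predict/?url=https%3A%2F%2Fexample.com
    print(build_predict_url("https://example.com"))
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`.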

Method 2: POST Request (Single URL - JSON)

Postman Setup:

Method: POST
URL: http://localhost:5000/predict/
Headers: Content-Type: application/json
Body (raw JSON):

{
  "url": "https://example.com"
}

Method 3: POST Request (Multiple URLs - JSON)

Postman Setup:

Method: POST
URL: http://localhost:5000/predict/
Headers: Content-Type: application/json
Body (raw JSON):

{
  "urls": [
    "https://example.com",
    "https://test.com", 
    "http://suspicious-site.com"
  ]
}
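The two JSON request shapes above can be produced from Python as well. The sketch below uses only the standard library; `build_json_request` and `classify` are hypothetical helper names, and `classify` needs a running server to return anything:

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:5000/predict/"  # local endpoint from this README


def build_json_request(urls, endpoint=PREDICT_URL):
    """Build a POST request: one URL uses the "url" key, several use "urls"."""
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    payload = {"url": urls[0]} if len(urls) == 1 else {"urls": urls}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(urls, endpoint=PREDICT_URL, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_json_request(urls, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify(["https://example.com", "https://test.com"])` would send the batch form of the request.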

Method 4: POST Request (Form Data)

Postman Setup:

Method: POST
URL: http://localhost:5000/predict/
Body: x-www-form-urlencoded
Key-Value:
- Key: url
- Value: https://example.com
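The same form-encoded request can be sketched in Python with the standard library. The helper name `build_form_request` is hypothetical; the `url` key and the endpoint come from the setup above:

```python
import urllib.request
from urllib.parse import urlencode


def build_form_request(target_url, endpoint="http://localhost:5000/predict/"):
    """Build a POST request with an x-www-form-urlencoded body."""
    body = urlencode({"url": target_url}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to a running server.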

Response Format

Successful Response

[
  {
    "url": "https://example.com",
    "prediction": 0,
    "malicious_probability": 0.1234,
    "prediction_label": "benign",
    "risk_level": "low"
  },
  {
    "url": "http://suspicious-site.com",
    "prediction": 1,
    "malicious_probability": 0.8765,
    "prediction_label": "malicious",
    "risk_level": "high"
  }
]

Error Response

{
  "error": "Cannot extract data. Use correct syntax or keyword"
}
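A client has to tell the two response shapes apart. The following is a minimal sketch, assuming that a successful response is a JSON array as in the example above and that an error is a JSON object with an `error` key; the function names are hypothetical:

```python
import json


def parse_response(body: str) -> list:
    """Return the list of result objects, or raise ValueError on an error payload."""
    data = json.loads(body)
    if isinstance(data, dict):
        if "error" in data:
            raise ValueError(data["error"])
        return [data]  # assumption: tolerate a single bare result object
    return data


def malicious_urls(results: list) -> list:
    """Return the URLs whose binary prediction is 1 (malicious)."""
    return [item["url"] for item in results if item["prediction"] == 1]
```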

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `url` | string | The analyzed URL |
| `prediction` | integer | Binary prediction (0 = benign, 1 = malicious) |
| `malicious_probability` | float | Probability of the URL being malicious (0.0-1.0) |
| `prediction_label` | string | Human-readable prediction ("benign" or "malicious") |
| `risk_level` | string | Risk assessment ("low", "medium", "high") |

Risk Level Classification

Low Risk: malicious_probability < 0.5
Medium Risk: 0.5 ≤ malicious_probability < 0.85
High Risk: malicious_probability ≥ 0.85
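The thresholds above translate directly into code. This is a sketch of the mapping as documented here, not necessarily how the server implements it; `risk_level` is a hypothetical function name:

```python
def risk_level(malicious_probability: float) -> str:
    """Map a malicious probability in [0.0, 1.0] to "low", "medium", or "high"."""
    if not 0.0 <= malicious_probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    if malicious_probability < 0.5:
        return "low"
    if malicious_probability < 0.85:
        return "medium"
    return "high"
```

Both boundaries belong to the higher level: 0.5 is medium and 0.85 is high.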

Model Information

The API uses a trained machine learning model that analyzes 38 features extracted from URLs.

Feature Categories:

36 Lexical Features: URL structure, character analysis, and content patterns
2 Host-based Features: Domain registration and expiration information
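To give a sense of what a lexical feature is, the sketch below computes a few values from the URL string alone. These five features and their names are illustrative assumptions, not the model's actual feature set, which is documented in FEATURES.md:

```python
from urllib.parse import urlsplit


def sketch_lexical_features(url: str) -> dict:
    """Compute a few illustrative string-based features (not the real 36)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "digit_count": sum(ch.isdigit() for ch in url),
        "dot_count_in_host": host.count("."),
        "uses_https": int(parts.scheme == "https"),
        "has_at_symbol": int("@" in url),
    }
```

Host-based features differ in that they need an external lookup, such as a WHOIS query, rather than only the URL text.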

Performance:

Current Model: 3rd generation with 98.83% training accuracy
Test Performance: 98.9% malicious URL detection rate
Algorithm: XGBoost Classifier

For detailed model performance metrics, dataset information, and training history, see MODEL.md. For detailed documentation of all features, see FEATURES.md.

Development

Environment Configuration

Local Development

The application runs in debug mode by default for easy testing and development.

Production Deployment

Set the environment variable DEVELOPMENT_ENVIRONMENT=production to disable debug mode in production. If it is not set, the application keeps its default debug mode.
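One way such a switch could be read is sketched below. How app.py actually checks the variable is not shown in this README, so the function name `debug_enabled` and the case-insensitive comparison are assumptions:

```python
import os


def debug_enabled(environ=os.environ) -> bool:
    """Return False only when DEVELOPMENT_ENVIRONMENT is set to "production"."""
    value = environ.get("DEVELOPMENT_ENVIRONMENT", "")
    # Assumption: surrounding whitespace and letter case are ignored.
    return value.strip().lower() != "production"
```

A Flask entry point could then pass the result as `app.run(debug=debug_enabled())`.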

Note: The deployed API on Render is configured for automatic deployment. Any change pushed to the branch connected to Render triggers a rebuild and redeployment.

Dependencies

Key dependencies include:

Flask: Web framework
pandas: Data manipulation
scikit-learn: Machine learning model loading
python-whois: Domain information retrieval

See requirements.txt for the complete list.

Support

For issues and questions, please open an issue on the GitHub repository.

Note: This API is designed for educational and research purposes. For production use, consider implementing additional security measures, rate limiting, and input validation.
