A REST API that uses machine learning to classify URLs as benign or malicious. The API extracts 38 features from each URL, predicts whether the URL is malicious, and reports a risk level based on the predicted probability.
- URL Classification: Classifies URLs as benign or malicious using a trained machine learning model
- Risk Assessment: Provides risk levels (low, medium, high) based on malicious probability
- Feature Extraction: Extracts 38 lexical and host-based features from URLs
- Flexible Input: Supports single URL or multiple URLs in one request
- RESTful API: Easy-to-use REST endpoints with JSON responses
- Python 3.13 or higher
- pip package manager
- Clone the repository:
    git clone https://github.com/Lee-yah/URL-Classifier-API.git
    cd URL-Classifier-API
- Install dependencies:
    pip install -r requirements.txt
- Run the application:
    python app.py
The API will be available at http://localhost:5000
- Production API: https://url-classifier-api.onrender.com
- Local Development: http://localhost:5000
POST/GET /predict/
Postman Setup (GET, single URL):
- Method: GET
- URL: http://localhost:5000/predict/?url=https://example.com
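Outside Postman, the same GET request can be built in code. The sketch below is one possible way to do it, not part of the repository: the helper name `build_predict_url` is hypothetical. It percent-encodes the target URL, which is an assumption about what the server accepts but avoids ambiguity when the target URL carries its own `?` or `&` characters.

```python
from urllib.parse import urlencode


def build_predict_url(base_url: str, target_url: str) -> str:
    """Return a GET URL for /predict/ with the target URL percent-encoded.

    `build_predict_url` is a hypothetical helper used for illustration only.
    """
    query = urlencode({"url": target_url})
    return f"{base_url.rstrip('/')}/predict/?{query}"


# A target URL with its own query string stays a single `url` parameter.
print(build_predict_url("http://localhost:5000", "https://example.com/?q=1&page=2"))
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, once the API is running locally.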
Postman Setup (POST, single URL as JSON):
- Method: POST
- URL: http://localhost:5000/predict/
- Headers: Content-Type: application/json
- Body (raw JSON):
{
  "url": "https://example.com"
}
Postman Setup (POST, multiple URLs as JSON):
- Method: POST
- URL: http://localhost:5000/predict/
- Headers: Content-Type: application/json
- Body (raw JSON):
{
  "urls": [
    "https://example.com",
    "https://test.com",
    "http://suspicious-site.com"
  ]
}
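For scripted use, the multi-URL POST request can be assembled with the Python standard library. This is a minimal sketch rather than an official client: `build_batch_request` is a hypothetical helper, and the request is only constructed here, not sent, so the snippet runs without a server.

```python
import json
from urllib.request import Request


def build_batch_request(base_url: str, urls: list[str]) -> Request:
    """Build (but do not send) a JSON POST request for several URLs.

    Hypothetical helper; the body shape {"urls": [...]} follows the
    Postman example in this README.
    """
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/predict/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_batch_request(
    "http://localhost:5000",
    ["https://example.com", "https://test.com", "http://suspicious-site.com"],
)

# To send it against a running server:
#   from urllib.request import urlopen
#   with urlopen(req, timeout=10) as resp:
#       results = json.load(resp)
```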
Postman Setup (POST, form data):
- Method: POST
- URL: http://localhost:5000/predict/
- Body: x-www-form-urlencoded
- Key-Value:
  - Key: url
  - Value: https://example.com
- Key:
[
  {
    "url": "https://example.com",
    "prediction": 0,
    "malicious_probability": 0.1234,
    "prediction_label": "benign",
    "risk_level": "low"
  },
  {
    "url": "http://suspicious-site.com",
    "prediction": 1,
    "malicious_probability": 0.8765,
    "prediction_label": "malicious",
    "risk_level": "high"
  }
]
{
  "error": "Cannot extract data. Use correct syntax or keyword"
}
Field | Type | Description |
---|---|---|
url | string | The analyzed URL |
prediction | integer | Binary prediction (0 = benign, 1 = malicious) |
malicious_probability | float | Probability of the URL being malicious (0.0-1.0) |
prediction_label | string | Human-readable prediction ("benign" or "malicious") |
risk_level | string | Risk assessment ("low", "medium", "high") |
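On the client side, the response fields can be mapped onto a small typed record. The sketch below assumes the response body is a JSON list of objects, as in the success example in this README, and that an error arrives as an object with an `error` key. The names `Prediction` and `parse_predictions` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One result entry; field names mirror the response fields table."""
    url: str
    prediction: int
    malicious_probability: float
    prediction_label: str
    risk_level: str


def parse_predictions(payload: str) -> list[Prediction]:
    """Parse a response body, raising ValueError on an error response.

    Assumes a successful response is a JSON list of result objects.
    """
    data = json.loads(payload)
    if isinstance(data, dict) and "error" in data:
        raise ValueError(data["error"])
    return [Prediction(**item) for item in data]


sample = """[
  {"url": "https://example.com", "prediction": 0,
   "malicious_probability": 0.1234, "prediction_label": "benign",
   "risk_level": "low"},
  {"url": "http://suspicious-site.com", "prediction": 1,
   "malicious_probability": 0.8765, "prediction_label": "malicious",
   "risk_level": "high"}
]"""

results = parse_predictions(sample)
flagged = [r.url for r in results if r.prediction == 1]
```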
- Low Risk: malicious_probability < 0.5
- Medium Risk: 0.5 ≤ malicious_probability < 0.85
- High Risk: malicious_probability ≥ 0.85
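The thresholds above translate directly into a small mapping function. This is a sketch of the stated rule, not the API's actual source code; the function name `risk_level` and the range check are assumptions added for illustration.

```python
def risk_level(malicious_probability: float) -> str:
    """Map a malicious probability to "low", "medium", or "high".

    Thresholds follow the README: < 0.5 low, 0.5 to < 0.85 medium, >= 0.85 high.
    """
    if not 0.0 <= malicious_probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    if malicious_probability >= 0.85:
        return "high"
    if malicious_probability >= 0.5:
        return "medium"
    return "low"
```

Note that both boundaries are inclusive on the upper band: exactly 0.5 is medium and exactly 0.85 is high.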
The API uses a trained machine learning model that analyzes 38 features extracted from URLs.
- 36 Lexical Features: URL structure, character analysis, and content patterns
- 2 Host-based Features: Domain registration and expiration information
- Current Model: 3rd generation with 98.83% training accuracy
- Test Performance: 98.9% malicious URL detection rate
- Algorithm: XGBoost Classifier
For detailed model performance metrics, dataset information, and training history, see MODEL.md. For detailed documentation of all features, see FEATURES.md.
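To give a rough idea of what a lexical feature is, the sketch below computes a handful of simple measurements from a URL string. These five features are illustrative guesses only; they are not taken from FEATURES.md and may not match any of the 38 features the model actually uses. The function name `sketch_lexical_features` is hypothetical.

```python
from urllib.parse import urlsplit


def sketch_lexical_features(url: str) -> dict[str, int]:
    """Compute a few example lexical features from a URL string.

    Illustrative only: not the project's real feature set.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "dot_count": url.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        "uses_https": int(parts.scheme == "https"),
    }


features = sketch_lexical_features("https://example.com/a1")
```

Lexical features like these need only the URL text, whereas the host-based features require a lookup of domain registration data.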
The application runs in debug mode by default for easy testing and development.
To disable debug mode in production, set the environment variable DEVELOPMENT_ENVIRONMENT=production.
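A switch like this is usually a single check at startup. The sketch below shows one plausible way an application could read the variable; it is an assumption about the pattern, not a copy of app.py. The helper name `debug_enabled` and the case-insensitive comparison are hypothetical.

```python
import os
from collections.abc import Mapping


def debug_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return False only when DEVELOPMENT_ENVIRONMENT is set to "production".

    Hypothetical helper: any other value, or no value, keeps debug mode on,
    matching the README's statement that debug is the default.
    """
    value = environ.get("DEVELOPMENT_ENVIRONMENT", "")
    return value.strip().lower() != "production"


# A Flask app could then start with something like:
#   app.run(debug=debug_enabled())
```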
Note: The deployed API on Render is configured for automatic deployment. Any change pushed to the branch connected to Render automatically triggers a rebuild and deployment.
Key dependencies include:
- Flask: Web framework
- pandas: Data manipulation
- scikit-learn: Machine learning model loading
- python-whois: Domain information retrieval
See requirements.txt for the complete list.
For issues and questions, please open an issue on the GitHub repository.
Note: This API is designed for educational and research purposes. For production use, consider implementing additional security measures, rate limiting, and input validation.