"What if we could predict where and when crimes are most likely to occur—before they happen?"
This project explores that very question using Machine Learning on spatial and temporal crime data. Built from scratch and deployed as a fully interactive Streamlit app, Crime Forecasting AI helps visualize crime patterns and predict weekly incident counts by area using data from Los Angeles (2020–Present).
Whether you're a policymaker, data scientist, or civic technologist, this platform demonstrates how AI can support smarter urban safety strategies.
- Goal: Predict weekly crime counts by area using machine learning.
- Data: Public dataset from Kaggle: Crime Data from 2020 to Present, cleaned and engineered into weekly area-based crime trends.
- Models Used:
  - XGBoost Regressor
  - Random Forest Regressor
  - K-Nearest Neighbors (KNN)
📍 Features include: `AREA`, `LAT`, `LON`, `iso_year`, `iso_week`

🎯 Target: `crime_count`, the number of incidents reported that week in a given area.
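To make the target concrete, here is a minimal, dependency-free sketch of how raw incident records could be rolled up into weekly per-area counts. The record layout and the `weekly_counts` helper are assumptions for illustration, not the code in `crime_detection_ml.py`.

```python
from collections import Counter
from datetime import date

# Hypothetical raw records: (area id, date the incident occurred).
incidents = [
    (1, date(2020, 1, 1)),
    (1, date(2020, 1, 3)),
    (1, date(2020, 1, 6)),
    (2, date(2021, 1, 3)),
]


def weekly_counts(records):
    """Count incidents per (area, ISO year, ISO week)."""
    counts = Counter()
    for area, day in records:
        # isocalendar() yields (ISO year, ISO week, ISO weekday).
        iso_year, iso_week, _ = day.isocalendar()
        counts[(area, iso_year, iso_week)] += 1
    return counts


crime_count = weekly_counts(incidents)
for (area, iso_year, iso_week), n in sorted(crime_count.items()):
    print(area, iso_year, iso_week, n)
```

Note that 3 January 2021 falls in ISO week 53 of ISO year 2020, which is why the week number is paired with `iso_year` rather than the calendar year.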
The ML workflow (in `crime_detection_ml.py`) includes:
- Preprocessing: Feature engineering, weekly aggregation, area mapping
- Model training: XGBoost, Random Forest, and KNN
- Evaluation using:
  - MAE (Mean Absolute Error)
  - RMSE (Root Mean Squared Error)
  - R² Score
  - Accuracy within ±20 crimes
📌 Why ±20? Because in real-world scenarios, predictions don't need to be exact—they need to be actionable.
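The four metrics can be sketched in plain Python as follows. The `regression_report` function and the sample numbers are illustrative assumptions, and treating an error of exactly 20 as a hit is also an assumption; the project's script may define the tolerance differently.

```python
import math


def regression_report(y_true, y_pred, tolerance=20):
    """Compute MAE, RMSE, R² and the share of predictions within ±tolerance."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    # Assumption: an absolute error of exactly `tolerance` counts as correct.
    within = sum(abs(e) <= tolerance for e in errors) / n
    return {"mae": mae, "rmse": rmse, "r2": r2, "accuracy_within_tol": within}


# Made-up weekly counts, for illustration only.
report = regression_report([100, 150, 200, 250], [110, 140, 230, 250])
print(report)
```

In this toy example three of the four predictions land within 20 incidents of the true count, so the tolerance-based accuracy is 0.75 even though one prediction is off by 30.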
Model | Accuracy (±20) | MAE | RMSE | R² |
---|---|---|---|---|
XGBoost | 71.79% | 16.26 | 24.00 | 0.854 |
Random Forest | 69.46% | 16.82 | 24.58 | 0.847 |
KNN | 45.60% | 29.45 | 40.85 | 0.578 |
✅ XGBoost performed best on all four metrics, narrowly ahead of Random Forest and well ahead of KNN. We attribute this to its ability to capture complex spatial-temporal patterns and to its built-in regularization.
Explore the crime predictions visually in the Streamlit dashboard. The app features:
- Area & date selector
- Weekly prediction with model comparison
- Model performance radar chart
- 📍 Crime heatmaps
- 📈 Area-specific weekly crime trends
📷 Screenshots:
- Model Accuracy Radar Chart
- Weekly Crime Trend Line Chart
- Crime Density Heatmap with PyDeck
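As a rough sketch of what the area and date selector has to do before a weekly prediction, the snippet below turns a chosen date into the `iso_year` / `iso_week` features and asks several models for a prediction. The helper names, the coordinates, the feature order, and the `ConstantModel` stand-in are all assumptions for illustration; the real app would presumably use the trained models stored under `models/`.

```python
from datetime import date

# Assumption: the models expect features in the order listed in this README.
FEATURE_ORDER = ["AREA", "LAT", "LON", "iso_year", "iso_week"]


def build_feature_row(area, lat, lon, day):
    """Turn an area and a calendar date into one model input row."""
    iso_year, iso_week, _ = day.isocalendar()
    values = {"AREA": area, "LAT": lat, "LON": lon,
              "iso_year": iso_year, "iso_week": iso_week}
    return [values[name] for name in FEATURE_ORDER]


def compare_models(models, row):
    """Return each model's prediction for a single feature row."""
    return {name: float(model.predict([row])[0]) for name, model in models.items()}


class ConstantModel:
    """Stand-in for a trained regressor with a scikit-learn style predict()."""

    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        return [self.value for _ in rows]


# Hypothetical area id and coordinates, not taken from the dataset.
row = build_feature_row(area=1, lat=34.04, lon=-118.25, day=date(2024, 3, 15))
predictions = compare_models(
    {"model_a": ConstantModel(120.0), "model_b": ConstantModel(131.5)}, row
)
print(row)
print(predictions)
```

Keeping the feature order in one constant makes it harder for the training script and the dashboard to drift apart.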
To run the app on your machine, follow the steps below.
git clone https://github.com/Sam-Gunawan/Crime-Forecasting-AI
cd Crime-Forecasting-AI
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
streamlit run app.py
.
├── app.py # Streamlit frontend
├── crime_detection_ml.py # Model training & evaluation script
├── models/
│ ├── crime_model_results.pkl
│ ├── xgboost_model.pkl
│ ├── random_forest_model.pkl
│ └── knn_model.pkl
├── dataset/
│ ├── area_reference.csv
│ ├── grouped_week.csv
│ └── crime_density_by_area.csv
├── AcademicPaper.docx # Full academic research paper
├── requirements.txt
└── README.md
Name | Role |
---|---|
Samuel Gunawan | Project Manager, Full-stack dev |
Nisrina Rahma Syaifullah | Model Architect & Data Engineer |
Calvin Willyanto | Insight Analyst & Report Writer |
Michael Liem | Insight Analyst & Report Writer |
Charles Agustin | Model Architect & Data Engineer |
Some improvements to be made:
- 📦 Deploy as Web App (with Firebase or Heroku backend)
- 🧬 Implement Spatio-Temporal Graph Neural Networks
- 🧠 Hyperparameter tuning with Optuna
- 🌍 Incorporate live data via LAPD/NYPD APIs
- 🎯 Cluster analysis to identify high-risk crime zones
- 🧾 Exportable city-specific crime reports for policy planning
📄 Read our full research study:
“Comparative Analysis of Machine Learning Models for Crime Prediction Based on Spatial and Temporal Features”
Available in AcademicPaper.pdf
This study includes:
- Literature review of AI in criminology
- Detailed methodology
- Evaluation rationale
- Policy implications
- References from 25+ peer-reviewed sources
“AI alone doesn’t stop crime—but it can help us get ahead of it.”
This project is our step toward safer, smarter cities.