This repository contains an enhanced analysis of the relationship between police deployment and crime rates in London, incorporating both econometric methods and machine learning models. The study builds upon the paper "Panic on the Streets of London: Police, Crime, and the July 2005 Terror Attacks" by Draca, Machin, and Witt (AER, 2011).
This analysis investigates the impact of police deployment on crime rates using weekly panel data from 32 boroughs in London over two years. The dataset includes variables such as crime, police hours, population, unemployment rate, labor force participation, demographic breakdowns, and more. The study employs econometric methods alongside machine learning models to estimate the causal effect of police presence on crime rates and to predict crime trends.
- Objective: Explore the relationship between crime rate and police rate, controlling for unemployment, labor force participation, young male percentage, and white percentage.
- Transformation: Natural log transformation of variables to handle non-linear relationships.
- Method: Ordinary Least Squares (OLS) regression.
- Objective: Address unobserved heterogeneity by differencing data across the same weeks in two consecutive years.
- Approach: Incorporate week fixed effects to control for systematic temporal variation in crime trends.
- Scenario: A temporary six-week surge in police deployment in five boroughs following the 2005 London terrorist attacks.
- Key Variables:
sixweeks
: Indicates weeks 80–85 (the intervention period).sixweeks_treat
: Highlights the treated boroughs during the intervention period.
- Objective: Estimate the causal impact of an exogenous increase in police presence on crime rates.
- Linear Regression: Predicts log crime rates using features like police rate, unemployment rate, and demographic factors.
- Random Forest Regressor: A robust ensemble method for predicting log crime rates with non-linear interactions.
- Evaluation Metrics: Mean Squared Error (MSE) and R-squared (R²) are used to compare model performance.
- Source: Provided dataset
london_crime.csv
contains weekly observations across 32 boroughs over two years (3328 total observations). - Key Variables:
crime
: Number of crimes in a borough in a given week.police
: Police hours worked in a borough in a given week.population
: Borough population (constant over time).emp
: Labor force participation rate.un
: Unemployment rate.ymale
: Percentage of males under 25 years old.white
: Percentage of the white population.week
: Week index (1–104).borough
: Borough identifier (1–32).
- Implements the analysis with four components:
- Log-log regression to estimate initial relationships.
- Differenced regression to handle unobserved heterogeneity.
- Natural experiment regression leveraging the 2005 London attacks as an exogenous shock.
- Machine learning models (Linear Regression and Random Forest) for predictive analysis.
- The dataset used in the analysis, containing crime and socioeconomic data.
- Documentation outlining the objectives, methodologies, and findings.
-
Clone this repository.
git clone <repository-url> cd <repository-name>
-
Ensure Python 3.x and the required libraries (
pandas
,numpy
,statsmodels
,scikit-learn
) are installed. -
Run the analysis:
python crime_ml_analysis.py
The script outputs:
- Regression summaries for:
- Log-log regression.
- Differenced regression.
- Natural experiment regression.
- Machine learning performance metrics (MSE, R²) for Linear Regression and Random Forest models.
- Initial Analysis: Cross-sectional regression indicates a positive correlation between police and crime rates, likely due to reverse causation.
- Panel Data Analysis: Differencing reveals a more nuanced relationship, controlling for unobserved heterogeneity.
- Natural Experiment: Exogenous police increases show a significant reduction in crime rates in treated boroughs, providing causal evidence for police effectiveness.
- Machine Learning: Models effectively predict crime rates, with Random Forest outperforming Linear Regression in capturing non-linear relationships.
- Aditya Rohatgi
This project is licensed under the MIT License.