This project performs exploratory data analysis (EDA) and hypothesis testing on datasets of taxi trips in Chicago. The analysis focuses on identifying demand hotspots, comparing taxi company performance, and evaluating the effect of rainy weather on trip durations from the Loop (downtown Chicago) to O'Hare International Airport.
Objectives:
- Analyze taxi companies by trip volume on November 15–16, 2017.
- Identify the top 10 drop-off areas in Chicago by average number of trips during November 2017.
- Visualize the distribution of trips per taxi company and per drop-off area.
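The aggregation behind the first two objectives can be sketched with pandas as below. This is a minimal illustration, not the project's notebook code: the column names `company_name`, `trips_amount`, `dropoff_location_name`, and `average_trips` are assumptions about the CSV layout, and the toy rows are invented purely to show the shape of the output.

```python
import pandas as pd


def trips_per_company(companies: pd.DataFrame) -> pd.Series:
    """Total trips per taxi company, sorted from most to fewest.

    Assumes hypothetical columns `company_name` and `trips_amount`.
    """
    return (
        companies.groupby("company_name")["trips_amount"]
        .sum()
        .sort_values(ascending=False)
    )


def top_dropoff_areas(dropoffs: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """The n drop-off areas with the highest average number of trips.

    Assumes hypothetical columns `dropoff_location_name` and `average_trips`.
    """
    return dropoffs.nlargest(n, "average_trips").reset_index(drop=True)


# Illustrative toy data only; not taken from the project's CSV files.
companies = pd.DataFrame({
    "company_name": ["Cab A", "Cab B", "Cab C"],
    "trips_amount": [120, 340, 75],
})
dropoffs = pd.DataFrame({
    "dropoff_location_name": ["Area 1", "Area 2", "Area 3"],
    "average_trips": [10.5, 99.0, 42.25],
})

print(trips_per_company(companies))
print(top_dropoff_areas(dropoffs, n=2))
```

For the visualization objective, either result can be passed to a bar chart, for example `trips_per_company(companies).plot(kind="bar")` with Matplotlib installed.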
Hypothesis test:
- Null hypothesis (H₀): The average trip duration from the Loop to O'Hare International Airport does not differ between rainy and non-rainy Saturdays.
- Alternative hypothesis (H₁): The average trip duration from the Loop to O'Hare International Airport differs on rainy Saturdays.
- Method: Two-sample (independent) t-test at a significance level of α = 0.05.
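A minimal sketch of this test with SciPy follows. It assumes hypothetical columns `start_ts`, `weather_conditions` (with `"Bad"` marking rainy trips and `"Good"` the rest), and `duration_seconds`; the actual file may label these differently. It uses Welch's variant (`equal_var=False`), which drops the equal-variance assumption; the original analysis may instead have used the pooled-variance version (`equal_var=True`). The Saturday filter is included in case the file also contains other days.

```python
import pandas as pd
from scipy import stats

ALPHA = 0.05  # significance level stated in the project


def compare_rainy_saturdays(trips: pd.DataFrame, alpha: float = ALPHA):
    """Two-sample t-test on Loop -> O'Hare durations, rainy vs. non-rainy Saturdays.

    Assumes hypothetical columns: `start_ts` (timestamp),
    `weather_conditions` ("Bad" = rainy, "Good" = not rainy),
    and `duration_seconds`.
    Returns (t statistic, two-sided p-value, whether H0 is rejected).
    """
    ts = pd.to_datetime(trips["start_ts"])
    saturdays = trips[ts.dt.dayofweek == 5]  # Monday=0, so Saturday=5
    rainy = saturdays.loc[saturdays["weather_conditions"] == "Bad", "duration_seconds"]
    dry = saturdays.loc[saturdays["weather_conditions"] == "Good", "duration_seconds"]
    # Welch's t-test: does not assume the two groups share a variance.
    result = stats.ttest_ind(rainy, dry, equal_var=False)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)


# Illustrative toy data only (Nov 4, 11, 18, 2017 are Saturdays; Nov 5 is a Sunday).
trips = pd.DataFrame({
    "start_ts": [
        "2017-11-04 10:00:00", "2017-11-04 12:00:00", "2017-11-11 09:00:00",
        "2017-11-11 14:00:00", "2017-11-18 08:00:00", "2017-11-18 16:00:00",
        "2017-11-05 10:00:00",
    ],
    "weather_conditions": ["Bad", "Bad", "Good", "Good", "Bad", "Good", "Bad"],
    "duration_seconds": [2700, 2900, 2100, 2200, 2800, 2000, 9999],
})

t_stat, p_value, reject_h0 = compare_rainy_saturdays(trips)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Because H₁ is two-sided, the sign of the t statistic indicates the direction of the difference (positive means longer trips on rainy Saturdays), while the p-value decides whether H₀ is rejected.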
All datasets used in this project are stored in the `data/` directory:
- `data/chicago_taxi_data_01.csv`: Taxi company data for November 15–16, 2017.
- `data/chicago_taxi_data_02.csv`: Average drop-off volume per area.
- `data/chicago_taxi_data_03.csv`: Trip records from the Loop to O'Hare with weather data.
Key findings:
- Flash Cab recorded the highest number of trips on November 15–16, 2017, possibly reflecting strong demand or well-regarded service.
- The Loop, River North, and Streeterville were the top drop-off areas by average number of trips, reflecting their popularity as destinations.
- The t-test rejected the null hypothesis at α = 0.05: trips from the Loop to O'Hare took significantly longer on rainy Saturdays, supporting the alternative hypothesis.
Tools and libraries:
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- SciPy (for statistical testing)
- Jupyter Notebook
Author: Nabilla Hafsah Caesaredia
Getting started:
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/chicago-taxi-analysis.git
  ```