Skip to content
Mr Philosopher edited this page Aug 23, 2023 · 3 revisions

Austin Bikeshare data analysis

As a part of Stream Processing and Stream Analytics project.

Project contributers

Project Disclaimer Please note that this project is currently under active development. The information provided in this readme and associated documentation is subject to change without prior notice. While we strive to provide accurate and up-to-date details, certain aspects of the project may be modified, added, or removed as development progresses.

Analysis Summary


This project aims to conduct comprehensive data analysis on Austin Bikeshare trips using real-time streaming data. Through advanced data transformations and insights generation, this analysis seeks to uncover valuable patterns and trends that can inform decision-making and operational improvements within the bikeshare system.

Analysis Ideas

  1. Subscriber Type Analysis:

    • Explore the distribution of subscriber types (e.g., Pay-as-you-ride, Local365).
    • Identify trends in usage patterns based on different subscriber categories.
  2. Popular Stations and Routes:

    • Determine the most frequently used start and end stations.
    • Uncover common routes taken by bikeshare users for optimization.
  3. Duration Analysis:

    • Analyze trip durations to understand average length and identify exceptional cases.
    • Explore correlations between trip duration and other factors.
  4. Bike Usage Analysis:

    • Investigate bike usage patterns to identify popular bikes and maintenance trends.
    • Examine the relationship between bike type and usage frequency.
  5. Time-based Patterns:

    • Study usage patterns based on time of day, weekdays vs. weekends, and months.
    • Identify peak usage hours and seasonal trends.
  6. Subscriber Behavior:

    • Analyze how subscriber types influence trip behaviors such as trip duration and station choices.
    • Determine if different subscriber types have distinct usage patterns.
  7. Station Popularity Over Time:

    • Track popularity changes of specific stations over time.
    • Investigate factors influencing shifts in station popularity.
  8. Subscriber Retention Analysis:

    • Explore whether different subscriber types exhibit varying retention rates.
    • Understand which types of users tend to continue using the service over time.
  9. Bike Availability:

    • Investigate bike availability at stations and its impact on user behavior.
    • Analyze how bike availability affects station popularity.
  10. Subscriber Demographics (If Available):

    • Correlate subscriber types with demographic information if available.
    • Understand if certain types of users are more likely to use the service.
  11. Ride Patterns by Day of the Week:

    • Observe how usage patterns change across different days of the week.
    • Determine if certain days have higher usage due to specific events or reasons.

Final Dataset Schema idea

Table: analysis

Fields:
- trip_id: STRING (Primary Key)
- subscriber_type: STRING
- bike_id: STRING
- bike_type: STRING
- start_time: TIMESTAMP
- start_station_id: INTEGER
- start_station_name: STRING
- end_station_id: INTEGER
- end_station_name: STRING
- duration_minutes: INTEGER
- day_of_week: INTEGER
- hour_of_day: INTEGER
- month: INTEGER
- route: STRING
- is_weekend: BOOLEAN
- is_peak_hour: BOOLEAN
- subscriber_demographics: STRING
- subscriber_retention: BOOLEAN
- bike_availability: INTEGER
- station_popularity: INTEGER
- ...

Indexes:
- trip_id (Primary Key) ```