nyc-yellow-trip-data-pipeline

Phase 1

In this phase of the project, I will focus on ingesting 1 million rows of data from the NYC Yellow Taxi dataset. The goal is to create a robust data pipeline that can efficiently handle and store the dataset for subsequent analysis. This phase covers the following tasks:

Download and Familiarize: Download the NYC yellow_tripdata_2016-02 dataset and become familiar with its structure and attributes.
Data Transformation: Perform basic data transformation tasks, such as handling data type conversions, to ensure the ingested data is ready for analysis.
Data Ingestion: Implement a data ingestion process to load 1 million rows of data from the dataset into a PostgreSQL database. Ensure that the ingestion process is efficient and can handle large volumes of data.
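The transformation and ingestion tasks above can be sketched as a batched load. This is a minimal illustration, not the notebook's actual code. It uses only the Python standard library, with an in-memory SQLite database standing in for PostgreSQL so that it runs anywhere. The table name `yellow_trips`, the batch size, and the subset of columns are assumptions; the CSV column names follow the published yellow taxi schema and should be checked against the downloaded file.

```python
import csv
import io
import sqlite3
from datetime import datetime
from itertools import islice

MAX_ROWS = 1_000_000  # row limit stated for Phase 1
BATCH_SIZE = 10_000   # hypothetical batch size; tune for your setup


def convert_row(row):
    """Cast the string fields of one CSV row to typed values."""
    # Parsing validates the timestamp. It is stored as ISO text here because
    # SQLite has no datetime type; a PostgreSQL driver would accept the
    # datetime object directly.
    pickup = datetime.strptime(row["tpep_pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return (
        int(row["VendorID"]),
        pickup.isoformat(sep=" "),
        int(row["passenger_count"]),
        float(row["trip_distance"]),
        float(row["total_amount"]),
    )


def ingest(csv_file, conn, max_rows=MAX_ROWS, batch_size=BATCH_SIZE):
    """Load at most max_rows rows in batches and return the number loaded."""
    rows = islice(csv.DictReader(csv_file), max_rows)
    cur = conn.cursor()
    loaded = 0
    while True:
        batch = [convert_row(r) for r in islice(rows, batch_size)]
        if not batch:
            break
        # "?" is SQLite's placeholder style; psycopg2 would use "%s".
        cur.executemany("INSERT INTO yellow_trips VALUES (?, ?, ?, ?, ?)", batch)
        conn.commit()
        loaded += len(batch)
    return loaded


# Tiny made-up sample in place of the real CSV file.
sample = io.StringIO(
    "VendorID,tpep_pickup_datetime,passenger_count,trip_distance,total_amount\n"
    "1,2016-02-01 00:00:10,1,2.5,12.30\n"
    "2,2016-02-01 00:01:45,3,0.9,6.80\n"
    "2,2016-02-01 00:03:02,2,4.1,18.35\n"
)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_trips (vendor_id INTEGER, pickup_datetime TEXT, "
    "passenger_count INTEGER, trip_distance REAL, total_amount REAL)"
)
loaded = ingest(sample, conn, max_rows=2, batch_size=1)
print(loaded)
```

Reading the file in batches keeps memory use flat regardless of file size, and the `max_rows` cap stops the load once the row limit is reached.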

Phase 2

In this phase, I will utilize the ingested data to build reporting pipelines that generate valuable insights from the NYC Yellow Taxi dataset. I will create three reporting tables as described below:

operations_and_performance

How many trips were recorded in the dataset?
What is the average trip distance for all trips?
Which Vendor has the highest number of trips?
Which Vendor has the lowest number of trips?
What is the average passenger count per trip?
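The five questions above can be answered in a single aggregate query. The sketch below is one possible way to do it, run against an in-memory SQLite database as a stand-in for PostgreSQL. The table name `yellow_trips`, its column names, the output column aliases, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_trips "
    "(vendor_id INTEGER, passenger_count INTEGER, trip_distance REAL)"
)
# Made-up sample rows: vendor 2 has two trips, vendor 1 has one.
conn.executemany(
    "INSERT INTO yellow_trips VALUES (?, ?, ?)",
    [(1, 1, 2.0), (2, 2, 4.0), (2, 3, 6.0)],
)

OPERATIONS_QUERY = """
SELECT
    COUNT(*)             AS total_trips,
    AVG(trip_distance)   AS avg_trip_distance,
    (SELECT vendor_id FROM yellow_trips
      GROUP BY vendor_id ORDER BY COUNT(*) DESC, vendor_id LIMIT 1)
                         AS most_frequent_vendor,
    (SELECT vendor_id FROM yellow_trips
      GROUP BY vendor_id ORDER BY COUNT(*) ASC, vendor_id LIMIT 1)
                         AS least_frequent_vendor,
    AVG(passenger_count) AS avg_passenger_count
FROM yellow_trips
"""

report = conn.execute(OPERATIONS_QUERY).fetchone()
print(report)
```

The secondary `ORDER BY vendor_id` makes the result deterministic when two vendors have the same trip count.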

customer_demographics_and_preferences

What is the average tip amount given by passengers?
What is the average trip distance per passenger?
How many trips were flagged as 'store and forward'?
How many trips were shared rides (passenger count > 1)?

Table: ingestion_date | avg_tip_amount | avg_trip_distance_by_passenger | store_and_forward_trips | shared_ride_count
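A row for this table could be computed with conditional aggregation, as in the rough sketch below. It again uses an in-memory SQLite database in place of PostgreSQL, and the `yellow_trips` table and sample rows are assumptions. The brief does not define `avg_trip_distance_by_passenger` precisely, so the sketch uses the plain average trip distance; adjust it if a different definition is intended. The 'Y' value for the store-and-forward flag follows the published data dictionary.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_trips (passenger_count INTEGER, trip_distance REAL, "
    "store_and_fwd_flag TEXT, tip_amount REAL)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO yellow_trips VALUES (?, ?, ?, ?)",
    [(1, 2.0, "N", 1.0), (2, 4.0, "Y", 3.0), (3, 6.0, "N", 2.0)],
)

CUSTOMER_QUERY = """
SELECT
    ?                  AS ingestion_date,
    AVG(tip_amount)    AS avg_tip_amount,
    AVG(trip_distance) AS avg_trip_distance_by_passenger,  -- assumed definition
    SUM(CASE WHEN store_and_fwd_flag = 'Y' THEN 1 ELSE 0 END)
                       AS store_and_forward_trips,
    SUM(CASE WHEN passenger_count > 1 THEN 1 ELSE 0 END)
                       AS shared_ride_count
FROM yellow_trips
"""

# The ingestion date is passed in as a parameter so the run date is explicit.
row = conn.execute(CUSTOMER_QUERY, (date.today().isoformat(),)).fetchone()
print(row)
```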

financial_performance

What is the average fare amount per trip?
How much revenue was generated from tolls and surcharges combined?
What is the average total amount paid by passengers?

Table: ingestion_date | avg_fare_amount | tolls_and_surcharges_revenue | avg_total_amount
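The reporting table itself could be populated with an `INSERT ... SELECT`, sketched below under several assumptions. SQLite stands in for PostgreSQL, the `yellow_trips` table and sample rows are invented, and "tolls and surcharges" is read as `tolls_amount + improvement_surcharge`. If `extra` or `mta_tax` should count as surcharges, the sum would need those terms as well.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_trips (fare_amount REAL, tolls_amount REAL, "
    "improvement_surcharge REAL, total_amount REAL)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO yellow_trips VALUES (?, ?, ?, ?)",
    [(10.0, 5.5, 0.5, 16.0), (20.0, 0.0, 0.5, 20.5)],
)

# Reporting table with the columns listed in the README.
conn.execute(
    "CREATE TABLE financial_performance (ingestion_date TEXT, "
    "avg_fare_amount REAL, tolls_and_surcharges_revenue REAL, "
    "avg_total_amount REAL)"
)

# Append one summary row per pipeline run.
conn.execute(
    """
    INSERT INTO financial_performance
    SELECT
        ?,
        AVG(fare_amount),
        SUM(tolls_amount + improvement_surcharge),  -- assumed definition
        AVG(total_amount)
    FROM yellow_trips
    """,
    (date.today().isoformat(),),
)
conn.commit()

summary = conn.execute("SELECT * FROM financial_performance").fetchone()
print(summary)
```

Appending a dated row on each run, rather than overwriting, keeps a history of the metrics across ingestion dates.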
