Amazing Prime loves the dataset and wants to keep it updated on a daily basis, so it needs an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables.
The goal is to create an automated pipeline that takes in the Wikipedia data, Kaggle metadata, and MovieLens rating data, performs the ETL process, and loads the data into a PostgreSQL database.
1: Write an ETL Function to Read Three Data Files
Using our knowledge of Python, Pandas, the ETL process, and code refactoring, write a function that reads in the three data files and creates three separate DataFrames.
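A minimal sketch of such a function is shown below. The file names in the usage example (wikipedia-movies.json, movies_metadata.csv, ratings.csv) are assumptions and should be replaced with the actual source files.

```python
import json
import pandas as pd

def extract_data(wiki_file, kaggle_file, ratings_file):
    """Read the three source files and return them as separate DataFrames."""
    # The Wikipedia data is assumed to be a JSON list of movie records.
    with open(wiki_file, mode='r') as file:
        wiki_movies_raw = json.load(file)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # The Kaggle metadata and MovieLens ratings are assumed to be CSV files.
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings

# Example usage (file names are assumptions; adjust to the actual paths):
# wiki_movies_df, kaggle_metadata, ratings = extract_data(
#     'wikipedia-movies.json', 'movies_metadata.csv', 'ratings.csv')
```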
2: Extract and Transform the Wikipedia Data
Using our knowledge of Python, Pandas, the ETL process, and code refactoring, extract and transform the Wikipedia data so it can be merged with the Kaggle metadata, extracting the IMDb IDs with a regular expression string and dropping duplicate rows.
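The sketch below shows one way to do the ID extraction and de-duplication with Pandas. The imdb_link column name is an assumption about how the Wikipedia data stores the IMDb URL.

```python
import pandas as pd

def clean_wiki_data(wiki_movies_df):
    """Extract IMDb IDs from the (assumed) imdb_link column and drop duplicate movies."""
    df = wiki_movies_df.copy()
    # IMDb IDs look like 'tt' followed by seven digits, e.g. tt1234567.
    df['imdb_id'] = df['imdb_link'].str.extract(r'(tt\d{7})', expand=False)
    # A duplicated IMDb ID means the same movie appears twice; keep one row.
    df.drop_duplicates(subset='imdb_id', inplace=True)
    return df
```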
3: Extract and Transform the Kaggle Data
Using our knowledge of Python, Pandas, the ETL process, and code refactoring, extract and transform the Kaggle metadata and MovieLens rating data, then convert the transformed data into separate DataFrames. Then, merge the Kaggle metadata DataFrame with the Wikipedia movies DataFrame to create the movies_df DataFrame. Finally, merge the MovieLens rating data DataFrame with the movies_df DataFrame to create the movies_with_ratings_df.
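A rough sketch of the merging steps is below, assuming the DataFrames produced in the previous steps and join keys named imdb_id and kaggle_id; the actual column names may differ in the real data.

```python
import pandas as pd

# Join the Wikipedia and Kaggle data on the shared IMDb ID (assumed column name).
movies_df = pd.merge(
    wiki_movies_df, kaggle_metadata,
    on='imdb_id', how='inner',
    suffixes=('_wiki', '_kaggle'))

# Aggregate the raw ratings into a count per movie and rating value,
# then pivot so each rating value becomes its own column.
rating_counts = (ratings.groupby(['movieId', 'rating'], as_index=False)
                        .count()
                        .rename(columns={'userId': 'count'})
                        .pivot(index='movieId', columns='rating', values='count'))
rating_counts.columns = [f'rating_{col}' for col in rating_counts.columns]

# Left merge keeps every movie even if it has no ratings; fill those with zero.
# 'kaggle_id' is assumed to be the identifier matching the MovieLens movieId.
movies_with_ratings_df = movies_df.merge(
    rating_counts, left_on='kaggle_id', right_index=True, how='left')
movies_with_ratings_df[rating_counts.columns] = (
    movies_with_ratings_df[rating_counts.columns].fillna(0))
```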
4: Create the Movie Database
Using our knowledge of Python, Pandas, the ETL process, code refactoring, and PostgreSQL, add the movies_df DataFrame and MovieLens rating CSV data to a SQL database.
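One way to load the data is with SQLAlchemy's create_engine and DataFrame.to_sql, as sketched below. The connection string is a placeholder, and the ratings file is read in chunks because it is large; this is a sketch under those assumptions, not the exact implementation.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; substitute real credentials, host, and database name.
db_string = 'postgresql://postgres:password@localhost:5432/movie_data'
engine = create_engine(db_string)

# Replace the movies table with the cleaned, merged DataFrame.
movies_df.to_sql(name='movies', con=engine, if_exists='replace', index=False)

# The ratings file is large, so import it in chunks and append each chunk.
for chunk in pd.read_csv('ratings.csv', chunksize=1_000_000):
    chunk.to_sql(name='ratings', con=engine, if_exists='append', index=False)
```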
We extracted messy, almost unusable data, combed through it carefully to transform it, and then loaded it into a SQL database. Now the data analysis team has a reliable, clean dataset just begging to be analyzed.
Tools: Python, SQL, pgAdmin, PostgreSQL
Data: wiki_movie_data, kaggle_metadata, ratings