Movie Analysis on AWS
This repository contains the code and configuration files for setting up a movie recommendation system on AWS.
Overview
The movie recommendation system is designed to recommend movies to users based on their preferences. It utilizes both structured and unstructured datasets, performs data merging, cleansing, and loading, and leverages AWS services such as S3, Lambda, RDS, Glue, and SNS.
Setup Instructions
Create two S3 buckets to store data
Note: To distribute functionality, create IAM users and assign roles to them.
1. Structured Dataset
- Download the structured dataset from MovieLens.
- Use Python (movie_csv.ipynb) to upload the dataset to an S3 bucket named final-project-rawdata-group-5-sec-2.
2. Unstructured Dataset
- Use Python (api_call_upload_S3.ipynb) to call the TMDB API and obtain the unstructured dataset.
- Upload the dataset to the same S3 bucket (final-project-rawdata-group-5-sec-2).
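The API call above might look like the following sketch; it is an assumption about api_call_upload_S3.ipynb — the TMDB endpoint is real, but the selected fields and output filename are illustrative.

```python
# Sketch: fetch movie records from the TMDB API and keep a few fields.
import pandas as pd
import requests

def records_to_frame(results: list) -> pd.DataFrame:
    """Keep only the fields we need from the raw TMDB JSON results."""
    fields = ["id", "title", "popularity", "vote_average"]
    return pd.DataFrame([{f: r.get(f) for f in fields} for r in results])

def fetch_popular_movies(api_key: str, page: int = 1) -> pd.DataFrame:
    resp = requests.get(
        "https://api.themoviedb.org/3/movie/popular",
        params={"api_key": api_key, "page": page},
        timeout=30,
    )
    resp.raise_for_status()
    return records_to_frame(resp.json()["results"])

if __name__ == "__main__":
    df = fetch_popular_movies("YOUR_TMDB_API_KEY")  # placeholder key
    df.to_csv("api_movie_data.csv", index=False)  # then upload to the raw bucket
```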
3. Data Merging with AWS Lambda
- Add the necessary layers (Amazon SDK) to your Lambda function.
- Assign an IAM role with full access to S3 and use it in the Lambda function.
- Create an AWS Lambda function (s3_upload_using_lambda) to merge and clean the datasets from the S3 bucket and load the result into another S3 bucket (final-project-mergedata-group-5-sec-2).
- Configure Lambda triggers to run the function every 30 days (rate(30 days)) and on file upload.
- Ensure proper naming conventions for files (api_movie_data.csv and movies.csv).
- Use Python with Pandas and NumPy for data cleaning and merging.
The screenshot below shows the data successfully loaded into the S3 bucket; the upload timestamp confirms it.
Note: The Lambda function replaces the old file with the new one.
4. Database Setup
- Create a database instance in RDS (make sure to create an initial database when creating the instance).
- Use the pandas .to_sql() function in Python to create a table from the merged dataset.
- Insert the table into an RDS instance.
- Design the conceptual model of the database.
- Identify dimensions and facts for the data warehouse schema.
- Define primary keys, foreign keys, and relationships between tables.
- Connect MySQL Workbench to the AWS RDS instance.
- Create the dimension and fact tables in RDS using the DDL script (create_table_OLAP.sql).
Note: You have to provide your AWS access key and secret key.
To verify that the tables were created, check the database; the screenshot below shows the data successfully inserted into the tables.
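The .to_sql() load above can be sketched as below; the connection URL, credentials, and table name are placeholders, not the project's actual values.

```python
# Sketch: load the merged CSV into an RDS MySQL table via pandas + SQLAlchemy.
import pandas as pd
from sqlalchemy import create_engine

def load_to_rds(csv_path: str, connection_url: str, table: str) -> int:
    """Read the merged CSV and write it to a database table, replacing old data."""
    df = pd.read_csv(csv_path)
    engine = create_engine(connection_url)
    df.to_sql(table, engine, if_exists="replace", index=False)
    return len(df)

if __name__ == "__main__":
    # Placeholder endpoint and credentials for the RDS MySQL instance.
    url = "mysql+pymysql://admin:<password>@<rds-endpoint>:3306/movie_details"
    load_to_rds("merged_movie_data.csv", url, "movie_data")
```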


5. AWS Glue Service(ETL)
- Assign appropriate policies to a role and grant that role to a user for ETL operations.
Note: Below are the policies that you need to attach to the role and grant to the user for managing ETL.
- Create MySQL connection to the RDS instance in AWS Glue.
- Use a crawler to map the data to the RDS database and table (use a JDBC data source, a proper IAM role, and an on-demand schedule).
- Use Visual ETL to create two ETL jobs, dimension_job and fact_job, to load data into the dimension and fact tables, respectively (create dimension_job first).
- Extract the data from the movie_details database, transform it, and load it into the data warehouse (refer to ETL using glue).
Note: The diagram below confirms that the data was successfully loaded into the database.
6. Email Notifications
- Create an SNS topic, select it, create a subscription to it, and confirm the subscription.
- In your RDS instance, create an event subscription and choose the SNS topic you created earlier; then update your database to trigger a confirmation email.
7. Visualization with Tableau
- Connect the RDS database to Tableau for visualization.
- Create visualizations to showcase relevant results (Movie Analysis.twb).
Prerequisites
- AWS account with appropriate permissions.
- Python environment with Pandas and NumPy for the Lambda function.
- MySQL Workbench
- Access to Tableau for visualization.