Skip to content

Manasasidd/Movie-Analysis-System

Repository files navigation

Movie Analysis System

Movie Analysis on AWS

This repository contains the code and configuration files for setting up a movie recommendation system on AWS.

Overview

The movie recommendation system is designed to recommend movies to users based on their preferences. It utilizes both structured and unstructured datasets, performs data merging, cleansing, and loading, and leverages AWS services such as S3, Lambda, RDS, Glue, and SNS.

Setup Instructions

Create two S3 buckets to store data

Note:- To distribute functionality, you need to create users and assign roles to them.

image

1. Structured Dataset

  • Download the structured dataset from MovieLens.
  • Using Python(movie_csv.ipynb) to insert the dataset into an S3 bucket named final-project-rawdata-group-5-sec-2.

2. Unstructured Dataset

  • Use Python(api_call_upload_S3.ipynb) to call the TMDB API and obtain the unstructured dataset.
  • Insert the dataset into the same S3 bucket (final-project-rawdata-group-5-sec-2).

image

3. Data Merging with AWS Lambda

  • Add the necessary layers in your lambda function(Amazon SDK)
  • Assign an IAM role with full access to S3 and use it in the Lambda function.
  • Create an AWS Lambda function(s3_upload_using_lambda) to merge and clean the datasets from the S3 bucket and load it into another S3 bucket(final-project-mergedata-group-5-sec-2).
  • Configure Lambda triggers to run the function every 30 days(rate(30 days) and on file upload.
  • Ensure proper naming conventions for files (api_movie_data.csv and movies.csv).
  • Utilize Python with Pandas and Numpy for data cleaning and merging.

image

image

image

In the below screenshot we can see that the data is successfully loaded in the S3 bucket, to confirm this we can check the time.

Note:- Lambda_function will replace the old file with the new one.

image

4. Database Setup

  • Create a Database instance in RDS. (make sure to make a database when creating instance)
  • Use .tosql() function in Python to create a table from the merged dataset.
  • Insert the table into an RDS instance.
  • Design the conceptual model of the database.
  • Identify dimensions and facts for the data warehouse schema.
  • Define primary keys, foreign keys, and relationships between tables.
  • Connected MySQL Workbench to AWS RDS instance
  • Created Dimensions and fact table using DDL(create_table_OLAP.sql) in RDS

image

Note:- You have to input your access key and aws secret key.

image

Let’s check whether the tables are inserted or not. The screenshot given below are the proof that the data is successfully inserted in the table

image

image image

5. AWS Glue Service(ETL)

  • Assign appropriate policies to a role and give that role to a user for ETL operations.

Note:- Below are the policies that you need to assign to a role and grant to the user for managing ETL

image

  • Create MySQL connection to the RDS instance in AWS Glue.

image

  • Use a crawler to map data to the RDS database and table instance.(use JDBC data source, proper IAM role, frequency as on demand)

image

  • Use Visual ETL to create two ETL jobs: dimension_job and fact_job to load data into dimensions and facts, respectively.(First create dimension_job)

image

image

  • Extract the data from movie_details database, transform it and load into Data warehouse (refer ETL using glue)

image

Note:- Below are the diagram which confirms that the data is successfully loaded into the database image

image

image

image

image

6. Email Notifications

  • Configure an SNS topic to send emails to team members. image

  • Create a subscription to you email

image

  • Select the topic you created and create a subscription and confirm the subscription

image

  • In your RDS instance create Event Subscription and choose the SNS topic you created earlier, then update your database to confirm the email

image

7. Visualization with Tableau

  • Connect the RDS database to Tableau for visualization.
  • Create visualizations to showcase relevant results.(Movie Analysis.twb)

Prerequisites

  • AWS account with appropriate permissions.
  • Python environment with Pandas and Numpy for Lambda function.
  • MySQL Workbench
  • Access to Tableau for visualization.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •