A repository showcasing advanced SQL queries to solve 15 business problems using a Netflix dataset, covering content trends, ratings, genres, and more.
This project utilizes a comprehensive dataset of Netflix’s movies and TV shows, sourced from Kaggle and covering data up until 2021. Using advanced SQL queries, the objective is to extract valuable insights and answer key business questions related to content trends, ratings, viewer preferences, and other factors. By analyzing this dataset, the project aims to provide actionable insights that can inform content strategy and improve the user experience. The following README details the project’s goals, business problems, solutions, findings, and conclusions.
All queries are available in the "All Solutions.SQL" file.
- Analyze the distribution of content types by comparing the number of movies versus TV shows.
- Identify the most prevalent ratings for both movies and TV shows.
- Examine and analyze content based on factors such as release years, countries of origin, and durations.
- Classify and categorize content according to specific criteria and keywords, uncovering unique patterns.
Kaggle Link: https://www.kaggle.com/datasets/shivamb/netflix-shows?resource=download
- Identified cases where movie durations were mistakenly placed in the rating column, leaving the duration column NULL.
- Corrected anomalies by moving durations to the appropriate column and updating incorrect ratings (e.g., ‘74 min’ → TV-MA).
- Ratings: Replaced NULL values with ‘Unknown’.
- Director Names: Updated missing values to ‘Unknown’ to maintain data integrity.
- Separated Movies and TV Shows into two tables:
- Movies: Contains duration in minutes (converted to TIME format).
- Shows: Stores seasons instead of minutes.
- A separate
genre
table was created to store unique genres, withgenre_id
as the primary key. - Junction tables,
movie_genres
andshow_genres
, were created to establish many-to-many relationships between media and genres.
- Unique genres were extracted from the
listed_in
column of bothmovies
andshows
tables. These were inserted into thegenre
table after trimming unnecessary spaces. - The
movie_genres
andshow_genres
tables were populated by mappingmovie_id
andshow_id
to their correspondinggenre_id
.
- A unified media view was generated, merging
movies
andshows
while maintaining a common structure. - A query was designed to retrieve all media records, including relevant attributes but excluding duration for better alignment across both categories.
- Countries were extracted as separate entities, and each media entry was assigned its corresponding country, facilitating location-based analysis.
- The duration attribute was split into two separate fields:
duration_in_min
for moviesseasons
for shows
- Converted
duration
in movies from VARCHAR to TIME format. - Converted
seasons
in shows from text to INTEGER. - Formatted
date_added
column from text to DATE format to allow proper time-based analysis.
- Database: PostgreSQL, Microsoft Excel
- Query Tool: pgAdmin
- Version Control: Git & GitHub
- Data Handling: SQL Queries
- Data Cleaning & Formatting: Converting text to lists, trimming spaces, changing data types.
- Data Analysis: Counting, grouping, filtering, and ranking content.
- Performance Optimization: Using CTEs (
WITH
statements) for better query readability and efficiency. - Combining Data: Merging results from different tables using
UNION ALL
.
- What types of content are available? Identify all unique types of content available on Netflix.

- Count the number of Movies vs TV Shows. Compare the distribution of Movies and TV Shows in the dataset.

- Determine the rating that appears most frequently for each content type.


- Retrieve a list of movies based on the year they were released.

- Identify which countries contribute the most content to Netflix.

- Locate the movie with the longest runtime on Netflix.

- Query to discover content added to Netflix within the most recent 5 years.

- Retrieve all content directed by Rajiv Chilaka.

- Identify TV Shows that have a runtime exceeding five seasons.

- Group content by genre and calculate the total for each.

- Analyze and return the years with the highest average Netflix releases in India.

- Query to identify movies categorized as documentaries.

- Identify records missing a director’s name in the dataset.

- Retrieve the total number of movies featuring Salman Khan in the last decade.

- Determine the most frequent collaborators in Indian Netflix content.

- Label content containing the keywords as “Bad” and all others as “Good.” Count how many items fall into each category.

- Netflix primarily features movies, with 6,131 movies compared to 2,676 TV shows.
- Movies account for nearly 70% of the content, suggesting a stronger focus on films rather than long-format series.
- The most common rating for both movies and TV shows is TV-MA, with 3,210 titles, indicating that Netflix leans heavily toward mature audience content.
- Other common ratings include:
- TV-14: 2,160 titles
- TV-PG: 863 titles
- R: 799 titles
- PG-13: 490 titles
- Family-friendly content like TV-Y (307) and TV-G (220) has a smaller presence.
- The U.S. dominates Netflix content, contributing 3,690 titles, nearly 3.5 times more than the next country, India (1,046 titles).
- Other significant contributors include:
- United Kingdom: 806 titles
- Canada: 445 titles
- France: 393 titles
- This indicates Netflix’s strong English-language content focus while also expanding in global markets like India and Europe.
- The longest movie, "Black Mirror: Bandersnatch," runs for 312 minutes, reflecting Netflix’s experimentation with interactive storytelling.
- Other long movies:
- "Headspace: Unwind Your Mind" – 273 minutes
- "The School of Mischief" – 253 minutes
- "No Longer Kids" – 237 minutes
- "Lock Your Girls In" – 233 minutes
- 6,030 movies and 2,552 TV shows were added in the last five years, reinforcing Netflix’s aggressive content expansion strategy.
- Movies continue to outpace TV shows in new additions, aligning with Netflix’s licensing and production model.
- The top five genres are:
- International Movies – 2,752 titles
- Dramas – 2,427 titles
- Comedies – 1,674 titles
- International TV Shows – 1,351 titles
- Documentaries – 869 titles
- This suggests Netflix’s strong focus on global storytelling and factual content.
- The top five years with the highest average content releases in India were:
- 2017 – 98 movies
- 2018 – 81 movies
- 2019 – 74 movies
- 2020 – 59 movies
- 2021 – 22 movies
- TV Shows followed a similar trend, with peaks in 2018 and 2019 before slowing down.
- Top 10 Bollywood actors with the most movies on Netflix:
- Anupam Kher – 40 movies
- Shah Rukh Khan – 34 movies
- Naseeruddin Shah – 31 movies
- Akshay Kumar – 29 movies
- Om Puri – 29 movies
- Amitabh Bachchan – 28 movies
- Paresh Rawal – 28 movies
- Boman Irani – 27 movies
- Kareena Kapoor – 25 movies
- Ajay Devgn – 21 movies
- Netflix’s catalog is dominated by movies (70%), with a primary focus on mature-rated content (TV-MA).
- The U.S. leads content production, but India and the U.K. are key international markets.
- Dramas, international movies, and comedies are the most popular genres, showcasing Netflix’s efforts to balance entertainment, global storytelling, and factual content.
- Peak content additions in India were in 2017-2019, reflecting Netflix’s strategy to grow in regional markets.
- The presence of Bollywood superstars suggests that Netflix is strategically using established talent to capture the Indian audience.
- Further data collection is required, as nearly 2,600 movies and TV shows are missing director information, which may hinder the accuracy of our analysis.
- The data is only available up to 2021, so the analysis may not reflect current trends.
An interactive dashboard in Tableau to support this analysis will be uploaded shortly.