Skip to content

Gray135/movie-data-insights

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

How to Read this Repo

  • Start here: Scroll to the Dashboard Preview to see the Power BI visuals.
  • SQL logic: All queries are in the sql-queries/ folder.
  • Access the Dashboard: Download the PBIX file (click the link, then “Raw” to download).
  • Quick skim: - README.md – walks through objectives, queries, and insights step by step.

🎬 Movie Industry ROI Analysis

Using SQL, Excel, and Power BI to Uncover What Drives Film ROI

This project analyzes a curated sample of the top 10 ROI-performing films drawn from a cleaned dataset of 600+ titles (1997–2016). The aim is to simulate how a customer-insights / business analyst would respond to an executive ask: “What traits consistently show up in our biggest wins?”

Objective: identify repeatable signals—genre, director, budget, runtime, production company—that correlate with outsized ROI and can inform smarter green-light decisions.

Tools & approach

  • SQL: data cleaning, joins, views to shape analysis tables
  • Excel: validation checks, calculated fields, quick sanity tests
  • Power BI: data model, DAX measures, interactive visuals & slicers

Dashboard pages

  • Overview: ROI performance at a glance (films, directors, actors, studios)
  • Detailed ROI: top performers with drill-downs
  • Trends Over Time: ROI & revenue patterns by release year and runtime
  • Genre Analysis: category-level ROI comparisons
  • Findings: concise recommendations supported by the data
  • Behind the Dashboard: build notes and assumptions

Outcome (high level): low-budget categories—especially horror/documentary—frequently outperform on ROI; select studios and directors recur among top performers, suggesting targeted bets can amplify returns.


Table of Contents


Dashboard Preview

This dashboard analyzes ROI trends across 600+ films released between 1997–2016. Results highlight how low-budget horror and documentary films consistently outperform big-budget productions, with Blumhouse Productions, Paranormal Activity, and director Oren Peli dominating ROI performance. The findings reinforce how lean production strategies often deliver the strongest financial returns.

🔑 Key Metrics

Metric Value
ROI Formula (Revenue – Budget) / Budget
# of Titles Analyzed ~600 films (cleaned dataset)
Time Window 1997–2016 releases
Top ROI Genre Horror (~3.1 ROI)
Lowest ROI Genre Crime (~1.4 ROI)
Top Production Company Blumhouse Productions (3.2K+ ROI)
Top Director Oren Peli (12.9K ROI – Paranormal Activity)
Top Actor (Avg ROI) Michael C. (4.1K ROI – Paranormal Activity)
Most Profitable Film Paranormal Activity (12.9K ROI)
Avg ROI (All Films) ~7.4
Avg Runtime (Top 10) ~100–110 minutes
  1. Movie ROI Dashboard

Movie ROI Dashboard

High-level dashboard summarizing ROI across films, directors, actors, and production companies.

  1. Top 10 Films: Balancing Revenue, Budget & ROI

Top 10 Films: Balancing Revenue, Budget & ROI

Compares the ten most profitable films by ROI, emphasizing patterns across production studios and genres.

  1. Highest ROI Films: Studios, Directors & Actors

Highest ROI FilmsL Studios, Directors & Actors

Lists top individual films by ROI, showing how modest budgets can drive extraordinary returns.

  1. Top Performers by Average ROI

Top Performers by Average ROI

Highlights low-budget horror’s dominance, with Blumhouse, Oren Peli, and Paranormal Activity leading ROI.

  1. Revenue & ROI by Release Year & Runtime

Revenue & ROI by Release Year and Runtime

Explores yearly ROI patterns and correlations with average film lengths.

  1. Revenue & ROI Trends Over Time & Runtime

Revenue & ROI Trends Over Time & Runtime

Shows how film runtimes and release years impact ROI trends, highlighting shifts over decades.

  1. Genre Benchmarks: Budget, Revenue & ROI

Genre Benchmarks: Budget, Revenue & ROI

Reveals ROI distribution by genre, with horror and documentary outperforming big-budget action and drama.

  1. What Revenue vs ROI Analysis Reveals About Film Success

What Revenue vs ROI Analysis Reveals About Film Success

Summarizes strategies used by top-performing films and studios to maximize ROI despite budget constraints.

  1. Behind the Dashboard

Behind the Dashboard

Documents project build: Power BI, Power Query, SQL, and DAX applied to simulate a real-world analytics request.

images


Business Objective

Studios invest millions into film production with no guarantee of return. This project examines a targeted subset of the top 10 ROI performers from a larger dataset of 600+ films (1997–2016) to answer key business questions:

  • What traits define the most financially successful films?
  • Are there repeatable patterns by genre, director, runtime, or studio?
  • Do low-budget films reliably outperform high-budget ones?
  • Which studios or production approaches consistently deliver outsized returns?

The analysis is framed as if an executive asked:

“We’ve had a few huge wins — how do we find more like those?”

The project reflects how an analyst would approach the request:

  • Clean and prepare raw data (SQL, Excel)
  • Analyze and surface findings (SQL queries, descriptive analysis)
  • Communicate visually with interactive dashboards (Power BI)

Outcome: a multi-page dashboard highlighting repeatable traits of high-ROI films, built to simulate a real-world executive briefing.


Key Insights from the Analysis

Top Grossing Films (Revenue Leaders)

  • Proven directors dominate revenue: James Cameron, Joss Whedon, and the Russo brothers appear multiple times among the all-time box office leaders.
  • Franchises and recurring IP drive scale: Marvel and Disney consistently leverage shared universes, sequels, and bankable actors to maximize global box office returns.
  • Animation delivers high average revenue: Pixar and Disney titles show strong monetization potential, even if they don’t top ROI charts.

Top ROI Films (Efficiency Leaders)

  • Micro-budgets deliver massive ROI: Paranormal Activity achieved ~12,900% ROI on a ~$20K budget, far outpacing blockbuster tentpoles.
  • Horror and documentary dominate efficiency: Low-cost genres generated the highest ROI, while crime films consistently underperformed.
  • Blumhouse shows a repeatable ROI model: The studio’s lean production approach demonstrates how ROI can be engineered.
  • Star power not required: Unknown actors (e.g., Michael C. in Paranormal Activity) led top ROI films, showing concept and execution matter more than celebrity.
  • Runtime sweet spot ~100–110 minutes: Top ROI titles cluster around this range, suggesting concise runtimes optimize cost and engagement.

Recommendations

  • Replicate micro-budget horror/doc models(Blumhouse-style).
  • Target runtimes around 100-110 minutes.
  • Use franchises/IP for revenue scale, pair with lean originals for ROI.
  • Consider animation as a reliable high revenue genre.

Data Source

This project uses a modified version of the publicly available movie dataset from Kaggle: https://www.kaggle.com/datasets/utkarshx27/movies-dataset

The dataset includes information on genre, revenue, budget, production companies, and more.
It has been cleaned and structured for educational and analytical purposes in this project.


Data Preparation and Cleaning

  • Completed missing financials: Filled in missing budget and revenue values using external sources (IMDb, Box Office Mojo) to ensure accurate ROI calculations.
  • Standardized formats: Converted release dates into a consistent year format for trend analysis.
  • Filtered invalid records: Removed films with zero revenue and excluded entries with invalid or missing genre labels.
  • Normalized categorical fields: Cleaned and standardized genre and production company names to resolve duplicates and inconsistencies (e.g., “Universal Pictures” vs. “Universal”).

How Metrics Were Calculated

  • ROI formula: (Revenue – Budget) ÷ Budget
  • Sample size: ~600 films (1997–2016)
  • Top 10 ROI sample: Used for efficiency analysis; focuses on extremes (survivorship bias acknowledged).
  • Revenue leaders: Ranked by global box office revenue.

SQL Queries

  1. Top 10 Performing Movies
    Returns the 10 highest-ROI films and their key contributors (director, studio, lead actor), forming the basis for the Top ROI Films analysis.

  2. Budget to Revenue Ratio
    Calculates ROI for all films in the dataset using the formula (Revenue – Budget) / Budget, enabling dataset-wide comparisons.

  3. Genre Summary
    Aggregates total and average revenue, budget, and ROI by genre to highlight which categories consistently outperform or underperform.

  4. Compare to Genre Averages
    Benchmarks each film’s performance against its genre’s average ROI, surfacing outliers that dramatically over- or under-deliver.

  5. Top Productions by Company
    Identifies studios and production companies most frequently linked to high-ROI titles (e.g., Blumhouse), demonstrating repeatable business models.

  6. Top 10 Movies vs Average Metrics
    Compares top-performing films to dataset averages in runtime, budget, and revenue, uncovering differences that explain ROI outperformance.

  7. Release Year + Runtime Overview
    Analyzes how release year and runtime correlate with ROI, supporting insights into time-period trends and optimal film length.

  8. Updated View for Movie Dataset
    Consolidates cleaned budget, revenue, ROI, runtime, genre, and production company fields into a single view for use in Power BI dashboards.


Exploratory Data Analysis (EDA)

Before building the dashboard, I ran exploratory checks to understand patterns in the data. These visuals highlight how ROI is distributed, how it varies by genre, and how budget size relates to financial efficiency.

ROI Distribution

ROI Histogram
The majority of films generate modest ROI (0–5x budget). Only a few reach extreme levels above 100x, showing how rare runaway hits are.

ROI by Genre

ROI by Genre
Horror and Documentary films show higher ROI potential, while genres such as Crime underperform. Outliers were capped to keep the chart readable.

Budget vs ROI

Budget vs ROI Scatter
Smaller budgets cluster at higher ROI, while large-budget films are spread across the full range — confirming that blockbuster spending does not guarantee profitability.

These exploratory findings informed the dashboard design and final recommendations.


Real-World Role Alignment

This project simulates the type of analysis commonly performed in roles such as:

  • Customer Experience / Voice of Customer (CX / VoC) Analysts – spotting success patterns in customer or content outcomes and translating them into actionable strategies.
  • Business Analysts – supporting investment decisions with evidence from structured data (e.g., budget, ROI, performance trends).
  • Media / Content Analysts – analyzing creative, genre, and production data to recommend high-yield strategies for future projects.

Key skills demonstrated in this project:

  • Data preparation & cleaning – filled gaps, standardized formats, normalized categories for analysis-ready data.
  • Exploratory SQL analysis – uncovered top ROI performers, genre trends, and production patterns using targeted queries and a consolidated view.
  • Insight communication – transformed raw results into clear, scoped takeaways aligned to executive-style questions.
  • Dashboard design (Power BI) – built multi-page, interactive reporting with slicers, genre filters, and drill-downs.
  • Data modeling (Power Query & DAX) – merged multiple tables, defined ROI measures, and applied filtering logic for consistent metrics.

Folder Structure


Project Progress and Next Steps

Completed to date

  • SQL analysis delivered through 7 structured queries plus a consolidated reporting view.
  • Multi-page Power BI dashboard built to visualize ROI patterns by genre, director, studio, and runtime.
  • Data preparation and cleaning steps documented for analysis reliability.

Planned improvements

  • Share findings publicly in a LinkedIn post to demonstrate professional impact.
  • Add a few new DAX measures to deepen ROI tracking and filtering options.
  • Refresh the dataset with films released through 2025, including streaming and international box office, for broader comparisons.

Reflection

This project helped me practice taking a messy dataset and turning it into a story for a business audience. I strengthened my skills in SQL joins, Power BI modeling, and building multi-page dashboards.
I also learned the importance of documenting cleaning steps so others can reproduce the work.
While I focused my insights on the Top 10 ROI films for clarity, I also built a full dataset view (~600 films) to validate patterns and avoid relying only on outliers.
If I continued this project, I would add regression analysis to measure which variables most strongly predict ROI.
Overall, this project gave me confidence in connecting data preparation, analysis, and communication into one complete case study.


Contact

Aaron Zeug
Customer Experience & Reporting Specialist
Building a career in Data Analytics with a focus on clear, insight-driven storytelling.
8+ years in CX and quality, now applying SQL, Excel, and Power BI to real-world analysis projects.
GitHub ProfileLinkedIn


Disclaimer

This project is for educational and portfolio purposes only.
The analysis is based on publicly available data and reflects exploratory insights only.
It does not constitute financial, investment, or strategic business advice.
All interpretations are illustrative and intended to demonstrate technical and analytical skills.
Any actions taken based on this material are at the user’s own discretion.

About

SQL analysis and Power BI dashboard on top-performing movies

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published