Analysis of movies from Serbia and Yugoslavia across history using Python (Pandas & Matplotlib). The goal is to explore trends, top-rated movies and directors, and the most popular genres by decade.
This project uses the IMDB Extensive Dataset from Kaggle.
Download the dataset and place the movie-level CSV in the project root as `IMDB.csv` (or adjust the script path).
- Python 3
- Pandas (data processing)
- Matplotlib (visualization)
- Load & clean data
  - Remove unused columns (budget, description, etc.)
  - Filter movies from Serbia/Yugoslavia
  - Convert numeric fields (year, ratings, votes)
- Director analysis
  - Film count per director
  - Average rating per director (for those with more than 3 films)
  - Top 10 directors by average rating
- Decade analysis
  - Assign a decade to each film
  - Split multi-valued genres (films often have several)
  - Build a pivot table: number of films per genre × decade
- Visualization
  - Horizontal bar chart: Top 10 directors by rating
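The load-and-clean step above could look roughly like the sketch below. It is not the project's actual script: the column names `year`, `votes`, `budget` and `description` are assumed from the Kaggle CSV layout, the helper name `clean_movies` is hypothetical, and the sample rows are made up so the snippet runs without `IMDB.csv`.

```python
import pandas as pd


def clean_movies(raw: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, drop unused columns, keep Serbia/Yugoslavia, coerce numerics."""
    df = raw.drop_duplicates(subset="imdb_title_id")
    # Drop columns the analysis does not use; skip any that are absent.
    df = df.drop(columns=["budget", "description"], errors="ignore")
    # Substring match keeps co-productions such as "Serbia, France".
    mask = df["country"].str.contains("Serbia|Yugoslavia", na=False)
    df = df[mask].copy()
    # Values that cannot be parsed become NaN instead of raising.
    for col in ["year", "avg_vote", "votes"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# Made-up rows standing in for pd.read_csv("IMDB.csv").
sample = pd.DataFrame({
    "imdb_title_id": ["tt1", "tt1", "tt2", "tt3", "tt4"],
    "country": ["Yugoslavia", "Yugoslavia", "Serbia, France", "USA", None],
    "year": ["1967", "1967", "2007", "1999", "2001"],
    "avg_vote": [8.0, 8.0, 7.1, 6.5, 5.0],
    "votes": [1200, 1200, 800, 5000, 10],
    "budget": ["", "", "", "", ""],
})

cleaned = clean_movies(sample)
print(cleaned)
```

Note that a plain substring match also keeps entries such as "Federal Republic of Yugoslavia"; tighten the pattern if that is not wanted.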
The analysis produces the following outputs:
- Genre × Decade table → saved as `pivot_genre_by_decade.csv`.
  - Rows are decades (e.g., 1960, 1970, …), columns are genres (Drama, Comedy, …).
  - Each cell is the count of films tagged with that genre in that decade.
  - Note: a film with multiple genres contributes once to each of its listed genres.
- Directors summary (printed to console) → `broj_po_reziseru3` table:
  - `director`: director name
  - `broj_filmova`: number of films
  - `prosek_ocena`: average IMDB rating
  - `broj_glasova`: total votes
  - `najraniji_film` / `najkasniji_film`: earliest/latest film year
  - `period_stvaranja`: active span (`latest - earliest`)
  - Filtered to directors with > 3 films, sorted by `prosek_ocena`.
- Top 10 Directors chart → displayed in a window:
  - Horizontal bar chart of the Top 10 directors by average rating (after the “> 3 films” filter).
  - You can optionally save it (e.g., `outputs/top10_directors.png`).
- Quick console sanity checks:
  - Year range of the dataset (min/max).
  - Top-rated titles (largest `avg_vote`).
  - Films with `avg_vote > 8.1` (sorted by rating, then votes).
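As an illustration of how the directors summary and the `avg_vote > 8.1` check could be produced, here is a minimal sketch using pandas named aggregation. The input column names `director`, `year` and `votes` are assumptions, the descending sort order is inferred from the "Top 10" wording, and the rows are made up.

```python
import pandas as pd

# Made-up rows standing in for the filtered Serbia/Yugoslavia frame.
films = pd.DataFrame({
    "director": ["A"] * 4 + ["B"] * 5 + ["C"] * 2,
    "year": [1960, 1965, 1970, 1980, 1990, 1992, 1995, 2000, 2005, 2010, 2012],
    "avg_vote": [8.0, 7.0, 9.0, 8.0, 6.0, 7.0, 6.0, 7.0, 6.5, 9.5, 9.5],
    "votes": [100, 200, 300, 400, 50, 50, 50, 50, 50, 10, 10],
})

# One row per director with the summary columns listed above.
summary = (
    films.groupby("director")
    .agg(
        broj_filmova=("avg_vote", "size"),
        prosek_ocena=("avg_vote", "mean"),
        broj_glasova=("votes", "sum"),
        najraniji_film=("year", "min"),
        najkasniji_film=("year", "max"),
    )
    .reset_index()
)
summary["period_stvaranja"] = summary["najkasniji_film"] - summary["najraniji_film"]

# Keep directors with more than 3 films, best average rating first.
broj_po_reziseru3 = (
    summary[summary["broj_filmova"] > 3]
    .sort_values("prosek_ocena", ascending=False)
    .reset_index(drop=True)
)
print(broj_po_reziseru3)

# Sanity check: highly rated films, sorted by rating, then votes.
top_rated = films[films["avg_vote"] > 8.1].sort_values(
    ["avg_vote", "votes"], ascending=False
)
print(top_rated)
```

Director C has the highest ratings in the sample but only two films, so the "> 3 films" filter removes it from the summary while its films still appear in the sanity check.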
Interpretation: Use `pivot_genre_by_decade.csv` to see which genres dominate each decade. Use the directors summary and chart to spot consistently high-rated directors with a meaningful body of work.
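To make the "which genres dominate each decade" reading concrete, the sketch below builds a small genre × decade table and picks the most frequent genre per decade. It assumes `year` and `genre` columns with comma-separated genres, uses `pd.crosstab` as one possible way to build the pivot (the project may use a different call), and runs on made-up rows.

```python
import pandas as pd

# Made-up rows; the real input would be the cleaned Serbia/Yugoslavia frame.
films = pd.DataFrame({
    "year": [1964, 1967, 1972, 1978, 1979],
    "genre": ["Drama", "Comedy, Drama", "Comedy", "Drama, War", "War"],
})

# Assign a decade: 1967 -> 1960, 1978 -> 1970.
films["decade"] = (films["year"] // 10) * 10

# Split multi-valued genres and give each genre its own row.
films["genre"] = films["genre"].str.split(",")
exploded = films.explode("genre")
exploded["genre"] = exploded["genre"].str.strip()

# Rows are decades, columns are genres, cells are film counts.
pivot = pd.crosstab(exploded["decade"], exploded["genre"])
print(pivot)
# pivot.to_csv("pivot_genre_by_decade.csv")  # uncomment to write the file

# Most frequent genre in each decade (ties go to the first column).
dominant = pivot.idxmax(axis=1)
print(dominant)
```

Because each film counts once per listed genre, the cell totals (7 here) can exceed the number of films (5).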
- `pivot_genre_by_decade.csv`: saved automatically.
- “Top 10 directors” chart: displayed (optionally saved as `outputs/top10_directors.png`).
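If you want the chart written to disk rather than only displayed, something along these lines should work. The frame `top10` and its values are made up for illustration, the title and axis label are placeholders, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import os

import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up summary rows; in the project this would be the top of broj_po_reziseru3.
top10 = pd.DataFrame({
    "director": ["Director A", "Director B", "Director C"],
    "prosek_ocena": [8.2, 7.9, 7.5],
})

# barh draws from the bottom up, so sort ascending to put the best on top.
ordered = top10.sort_values("prosek_ocena")

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(ordered["director"], ordered["prosek_ocena"])
ax.set_xlabel("Average rating")
ax.set_title("Top 10 directors by average rating (> 3 films)")
fig.tight_layout()

# Create the optional outputs/ folder and save the figure there.
os.makedirs("outputs", exist_ok=True)
fig.savefig("outputs/top10_directors.png", dpi=150)
```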
- Place `IMDB.csv` next to your Python script (or notebook).
- Install dependencies: `pip install pandas matplotlib`
- Run the analysis (script version): `python imdb_analysis.py`

A CSV with the genre × decade pivot will be saved, and a chart of the Top 10 directors will be shown.
- Uses a local `IMDB.csv` from the Kaggle dataset (not included).
- Deduplicates by `imdb_title_id`.
- `country` filter allows mixed entries (e.g., “Serbia, France”).
- Genres are split by `,`; multi-genre films count once per genre.
- Outputs: saves `pivot_genre_by_decade.csv` and shows a Top-10 directors chart.
IMDB.csv
imdb_analysis.py
outputs/ # optional (for saved CSV/PNG)
README.md
MIT (or your preferred license).