Hi! This project is based on a real interview exercise I received for a Security Analyst role at Arkose Labs. It’s a short but meaningful data challenge that asked me to analyze two IMDb datasets from 2018 and answer some clear questions about comedy films.
The goal wasn't just to extract numbers. It was about thinking critically, working with real-world data, and drawing clear insights.
- Checked how many comedy movies were released in 2018
- Found how many of them had a rating of 8.0 or higher
- Identified the highest-rated comedy movie
- Analyzed whether users tend to prefer short or long movies (based on ratings)
The analysis was done using simple, readable Python – no overengineering.
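As a rough illustration of the four steps above, here is a minimal sketch using only the Python standard library. It is not the code from the analysis scripts. The column names (`tconst`, `titleType`, `primaryTitle`, `startYear`, `runtimeMinutes`, `genres`, `averageRating`) follow the public IMDb dataset layout and are an assumption about the CSV files in this repo. The sample rows are made up, and the 100-minute short/long cutoff is an arbitrary choice for illustration.

```python
import csv
import io
from statistics import mean

# Made-up sample rows standing in for the real CSV files (not real IMDb data).
# Column names are assumed to match the public IMDb dataset layout.
BASICS_CSV = """tconst,titleType,primaryTitle,startYear,runtimeMinutes,genres
tt001,movie,Sample Comedy A,2018,85,Comedy
tt002,movie,Sample Comedy B,2018,130,"Comedy,Drama"
tt003,movie,Sample Drama C,2018,100,Drama
tt004,movie,Sample Comedy D,2017,95,Comedy
tt005,tvSeries,Sample Show E,2018,30,Comedy
"""

RATINGS_CSV = """tconst,averageRating,numVotes
tt001,8.2,1500
tt002,6.9,300
tt003,7.5,800
tt004,9.0,50
tt005,8.8,1200
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def comedy_movies_2018(basics):
    """Step 1: keep movies from 2018 whose genre list includes Comedy."""
    return [
        row for row in basics
        if row["titleType"] == "movie"
        and row["startYear"] == "2018"
        and "Comedy" in row["genres"].split(",")
    ]


def attach_ratings(movies, ratings):
    """Join movies to ratings on tconst; movies without a rating are dropped."""
    rating_by_id = {r["tconst"]: float(r["averageRating"]) for r in ratings}
    return [
        dict(movie, rating=rating_by_id[movie["tconst"]])
        for movie in movies
        if movie["tconst"] in rating_by_id
    ]


def mean_rating_by_length(rated, cutoff_minutes=100):
    """Step 4: average rating for short vs. long movies (hypothetical cutoff)."""
    short, long_ = [], []
    for movie in rated:
        runtime = movie["runtimeMinutes"]
        if not runtime.isdigit():  # skip missing runtimes such as "\\N"
            continue
        bucket = short if int(runtime) < cutoff_minutes else long_
        bucket.append(movie["rating"])
    return (mean(short) if short else None, mean(long_) if long_ else None)


if __name__ == "__main__":
    comedies = comedy_movies_2018(load_rows(BASICS_CSV))
    rated = attach_ratings(comedies, load_rows(RATINGS_CSV))
    high_rated = [m for m in rated if m["rating"] >= 8.0]   # step 2
    top = max(rated, key=lambda m: m["rating"])             # step 3
    print("2018 comedy movies:", len(comedies))
    print("Rated 8.0 or higher:", len(high_rated))
    print("Top-rated:", top["primaryTitle"], top["rating"])
    print("Mean rating (short, long):", mean_rating_by_length(rated))
```

To try it on the real files, the two sample strings could be replaced with the contents of `title_basics_2018.csv` and `title_ratings.csv`, adjusting column names if they differ.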
analysis_step1_comedy_films.py – Filters 2018 comedy movies
analysis_step2_high_rated.py – Finds comedy movies rated 8.0+
analysis_step3_top_rated_titles.py – Identifies the top-rated title
analysis_step4_length_vs_rating.py – Compares movie length vs. rating
notes_step1_comedy.md
notes_step2_high_rated.md
notes_step3_top_rated.md
notes_step4_length_vs_rating.md
title_basics_2018.csv – IMDb base metadata
title_ratings.csv – IMDb ratings
Arkose Labs SOC - Security Analyst Interview Exercise.docx
README.md
I decided to share this publicly because it shows:
- How I approach small analytical tasks
- How I work with real data
- And how I communicate insights clearly and simply
I don’t come from a traditional data background — but as a cybersecurity analyst, I constantly investigate, connect dots, and find meaning in raw information.