Exploring "The Office" TV Series Dataset

Objective

The task is to explore the dataset and create a report using Jupyter.

Brief

Everyone loves "The Office," a popular show that aired from 2005 to 2013. While doing research, I stumbled across this dataset, which contains the lines from every episode. I decided to explore it and answer some questions in a Jupyter notebook using Natural Language Processing.

Tasks

In this notebook the following questions are answered:

How many characters are there? What are their names?
How many lines does each character have across all episodes, and who has the most?
What is the average number of words per line for each character?
What is the most common word for each character?
For each character, in how many episodes do they not have a line?
How many times does the "That's what she said" joke come up?
- Include five examples of the joke
What is the average percentage of lines each character contributed per episode, for each season?
What is the most common word used in the show?
What is the total number of scenes per episode and season?
What is the total line contribution percentage of each character?
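
A few of the questions above can be sketched with pandas. This is a minimal illustration rather than the notebook's actual code: the column names (`season`, `episode`, `scene`, `speaker`, `line_text`) are assumptions about the layout of `the_office_lines_scripts.csv`, the helper functions are hypothetical, and the sample rows are made up so that the block runs on its own.

```python
import pandas as pd

# Made-up sample rows standing in for the real CSV. The column names are
# assumptions; adjust them to match the actual file.
sample = pd.DataFrame(
    {
        "season": [1, 1, 1, 1, 2],
        "episode": [1, 1, 1, 2, 1],
        "scene": [1, 1, 2, 1, 1],
        "speaker": ["Michael", "Jim", "Michael", "Dwight", "Michael"],
        "line_text": [
            "That's what she said.",
            "Oh, I told you.",
            "I love this office",
            "Bears eat beets.",
            "THAT'S WHAT SHE SAID!",
        ],
    }
)


def lines_per_character(df):
    """Number of lines spoken by each character, most talkative first."""
    return df["speaker"].value_counts()


def avg_words_per_line(df):
    """Mean number of whitespace-separated words per line, per character."""
    word_counts = df["line_text"].str.split().str.len()
    return word_counts.groupby(df["speaker"]).mean()


def count_joke(df, phrase="that's what she said"):
    """Number of lines containing the phrase, ignoring case."""
    matches = df["line_text"].str.contains(phrase, case=False, regex=False)
    return int(matches.sum())


def line_share_percent(df):
    """Each character's share of all lines, as a percentage."""
    return df["speaker"].value_counts(normalize=True) * 100


def scenes_per_episode(df):
    """Number of distinct scenes in each (season, episode) pair."""
    return df.groupby(["season", "episode"])["scene"].nunique()


print(lines_per_character(sample))
print(avg_words_per_line(sample))
print(count_joke(sample))
print(line_share_percent(sample))
print(scenes_per_episode(sample))
```

To run this against the real data, the sample frame would be replaced with something like `pd.read_csv("the_office_lines_scripts.csv")`, again assuming the columns match. The most-common-word questions would additionally need tokenization and stop-word filtering, which this sketch leaves out.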
