The task is to explore a dataset and present the findings as a report in a Jupyter notebook.
Everyone loves "The Office," the popular sitcom that aired from 2005 to 2013. While doing research, I came across a dataset containing the lines from every episode, so I decided to explore it and answer some questions in a Jupyter notebook using natural language processing (NLP).
This notebook answers the following questions:
- How many characters are there? What are their names?
- How many lines does each character have across all episodes, and who has the most?
- What is the average number of words per line for each character?
- What is the most common word for each character?
- For each character, in how many episodes do they have no lines?
- How many times does the "That's what she said" joke come up?
- What are five examples of the joke?
- What is the average percentage of lines each character contributes per episode, for each season?
- What is the most common word used in the show?
- What is the total number of scenes per episode and season?
- What is the total line contribution percentage of each character?
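The per-character questions (line counts, words per line, most common word, episodes without a line) and the show-wide most common word can all be answered in one pass over the lines. The sketch below is a standard-library-only illustration, not the notebook's actual code: the `Line` schema (season, episode, scene, speaker, text), the sample rows, the helper names, and the short `STOPWORDS` set are all hypothetical, and the notebook itself may well use pandas or an NLP library instead.

```python
import re
from collections import Counter, defaultdict
from typing import NamedTuple


class Line(NamedTuple):
    """One spoken line (hypothetical schema; the real dataset's columns may differ)."""
    season: int
    episode: int
    scene: int
    speaker: str
    text: str


# Made-up sample rows for illustration only.
LINES = [
    Line(1, 1, 1, "Michael", "Good morning, everyone. Good morning!"),
    Line(1, 1, 1, "Pam", "Morning, Michael. Morning!"),
    Line(1, 1, 2, "Jim", "Sales are up. Sales are great."),
    Line(1, 2, 1, "Michael", "Meeting in the conference room. Meeting now!"),
    Line(1, 2, 1, "Jim", "Now?"),
    Line(2, 1, 1, "Michael", "Okay, meeting is over."),
]

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "are", "is", "in", "on", "now", "over"}


def tokenize(text: str) -> list[str]:
    """Lowercase a line and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def character_stats(lines: list[Line]) -> dict[str, dict]:
    """Per speaker: line count, mean words per line, top non-stop word,
    and number of episodes in which the speaker has no lines."""
    all_episodes = {(row.season, row.episode) for row in lines}
    line_counts = Counter(row.speaker for row in lines)
    word_totals = Counter()
    word_freqs = defaultdict(Counter)
    episodes_seen = defaultdict(set)
    for row in lines:
        tokens = tokenize(row.text)
        word_totals[row.speaker] += len(tokens)
        word_freqs[row.speaker].update(t for t in tokens if t not in STOPWORDS)
        episodes_seen[row.speaker].add((row.season, row.episode))
    stats = {}
    # most_common() orders speakers from most to fewest lines.
    for speaker, n_lines in line_counts.most_common():
        top = word_freqs[speaker].most_common(1)
        stats[speaker] = {
            "lines": n_lines,
            "avg_words_per_line": word_totals[speaker] / n_lines,
            "top_word": top[0][0] if top else None,
            "episodes_without_lines": len(all_episodes - episodes_seen[speaker]),
        }
    return stats


def overall_top_word(lines: list[Line]) -> str:
    """Most frequent non-stop word across every line in the show."""
    counts = Counter(
        t for row in lines for t in tokenize(row.text) if t not in STOPWORDS
    )
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for speaker, s in character_stats(LINES).items():
        print(speaker, s)
    print("Most common word overall:", overall_top_word(LINES))
```

The number of characters is simply the number of keys in the returned dictionary. The "most common word" answers depend heavily on the stop-word list: without one, function words such as "the" would likely top every character's list.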
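Counting the "That's what she said" joke is mostly a matter of matching the phrase robustly. A minimal sketch, assuming each line is available as a `(speaker, text)` pair: a case-insensitive regular expression that also accepts a curly apostrophe or no apostrophe at all. The `SAMPLE` rows and the helper names `count_joke` and `joke_examples` are hypothetical.

```python
import re

# Case-insensitive; accepts a straight ('), curly (\u2019), or missing apostrophe.
JOKE = re.compile(r"\bthat['\u2019]?s what she said\b", re.IGNORECASE)


def count_joke(lines):
    """Count every occurrence of the joke across all (speaker, text) pairs."""
    return sum(len(JOKE.findall(text)) for _, text in lines)


def joke_examples(lines, k=5):
    """Return up to k (speaker, text) pairs whose text contains the joke."""
    return [(speaker, text) for speaker, text in lines if JOKE.search(text)][:k]


# Made-up sample rows for illustration only.
SAMPLE = [
    ("Michael", "That's what she said!"),
    ("Jim", "Is this going to take long?"),
    ("Dwight", "That\u2019s what she said."),
    ("Michael", "Twice? That's what she said. That's what she said!"),
    ("Pam", "She said that's what? No."),
]

if __name__ == "__main__":
    print(count_joke(SAMPLE))
    for speaker, text in joke_examples(SAMPLE):
        print(f"{speaker}: {text}")
```

Counting regex matches rather than matching lines means a line that repeats the joke counts twice; whether occurrences or lines is the right unit is a judgment call for the analysis.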
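The scene-count and line-share questions reduce to grouping rows by season, episode, and scene. The sketch below assumes rows of `(season, episode, scene, speaker)` with scene numbers that restart in each episode; the sample rows and function names are hypothetical. For the per-season average, it treats an episode where a character has no lines as a 0% share, which is one reasonable choice among several.

```python
from collections import Counter, defaultdict

# Made-up (season, episode, scene, speaker) rows; the real schema is assumed.
ROWS = [
    (1, 1, 1, "Michael"), (1, 1, 1, "Pam"), (1, 1, 2, "Michael"), (1, 1, 3, "Jim"),
    (1, 2, 1, "Michael"), (1, 2, 1, "Jim"),
    (2, 1, 1, "Dwight"), (2, 1, 2, "Michael"),
]


def scenes_per_episode(rows):
    """Count distinct scene numbers within each (season, episode)."""
    scenes = defaultdict(set)
    for season, episode, scene, _ in rows:
        scenes[(season, episode)].add(scene)
    return {key: len(nums) for key, nums in scenes.items()}


def scenes_per_season(rows):
    """Sum the per-episode scene counts within each season."""
    totals = Counter()
    for (season, _), n in scenes_per_episode(rows).items():
        totals[season] += n
    return dict(totals)


def total_line_share(rows):
    """Percentage of all lines in the show spoken by each character."""
    counts = Counter(speaker for _, _, _, speaker in rows)
    total = sum(counts.values())
    return {speaker: 100 * n / total for speaker, n in counts.items()}


def avg_episode_share(rows):
    """Per season, each character's share of an episode's lines, averaged over
    all episodes in that season (episodes where they have no lines count as 0%)."""
    per_episode = defaultdict(Counter)
    for season, episode, _, speaker in rows:
        per_episode[(season, episode)][speaker] += 1
    sums = defaultdict(lambda: defaultdict(float))
    episodes_in_season = Counter()
    for (season, _), counts in per_episode.items():
        episodes_in_season[season] += 1
        total = sum(counts.values())
        for speaker, n in counts.items():
            sums[season][speaker] += 100 * n / total
    return {
        season: {speaker: s / episodes_in_season[season] for speaker, s in shares.items()}
        for season, shares in sums.items()
    }


if __name__ == "__main__":
    print(scenes_per_episode(ROWS))
    print(scenes_per_season(ROWS))
    print(total_line_share(ROWS))
    print(avg_episode_share(ROWS))
```

Averaging only over the episodes a character appears in would instead inflate the share of minor characters who speak in just a few episodes.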