
This project analyzes Goodreads bestseller data to uncover trends in book ratings, reviews, pricing, and genre performance. Using Excel for preprocessing and Tableau for visualization, the analysis categorizes books by rating tiers and explores key patterns to inform publishing and marketing decisions.


oyecrafts/Exploratory-Data-Analysis-of-Goodreads-Bestsellers


Exploratory Data Analysis of Goodreads Bestsellers

Introduction

This dataset shows how user ratings are distributed among bestselling books across different categories. It groups bestsellers into rating tiers based on their user ratings and review counts, giving an overview of the popularity and reception of top-selling books. You can use it to analyze trends and patterns among very highly rated, highly rated, and moderately rated bestsellers, which supports market research, trend analysis, and strategic decision-making for publishers, authors, and readers.

What questions were asked?

  • What is the distribution of bestseller ratings among the top-selling books?
  • How many books fall into each category of bestseller ratings (e.g., very highly rated, highly rated, moderately rated)?
  • Which genres tend to have the highest-rated bestsellers?
  • Are there any trends or patterns in the ratings of bestsellers over time?
  • What are the characteristics of highly rated bestsellers compared with moderately rated ones?
  • How do the prices of bestsellers correlate with their ratings?
  • Can we identify any outliers or anomalies in the dataset that may require further investigation?
  • Are there any authors who consistently produce highly rated bestsellers?
  • How does the number of reviews correlate with the user ratings of bestsellers?
  • What insights can be gained from comparing the ratings breakdowns across different years or time periods?
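Several of the questions above (price vs. rating, reviews vs. rating, outliers, consistent authors, and trends by year) come down to a few standard computations. The project itself answered them in Excel and Tableau; the following is only a hypothetical Python sketch using the standard library. The function names, the 1.5 * IQR outlier rule (Tukey's fences), and the `min_books`/`min_rating` thresholds for "consistently highly rated" authors are illustrative assumptions, not definitions taken from the project.

```python
from collections import defaultdict
from statistics import mean, quantiles


def pearson(xs, ys):
    """Pearson correlation coefficient, e.g. Price vs. User Rating."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (sx * sy)


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def consistent_authors(author_ratings, min_books=2, min_rating=4.5):
    """Authors with at least `min_books` entries, every one rated >= `min_rating`.

    `author_ratings` is an iterable of (author, user_rating) pairs.
    Both thresholds are hypothetical choices, not project definitions.
    """
    by_author = defaultdict(list)
    for author, rating in author_ratings:
        by_author[author].append(rating)
    return sorted(
        author
        for author, ratings in by_author.items()
        if len(ratings) >= min_books and min(ratings) >= min_rating
    )


def mean_rating_by_year(year_ratings):
    """Average user rating per year from (year, user_rating) pairs."""
    by_year = defaultdict(list)
    for year, rating in year_ratings:
        by_year[year].append(rating)
    return {year: mean(ratings) for year, ratings in sorted(by_year.items())}
```

For example, `pearson(prices, ratings)` on two columns of the cleaned data gives a coefficient between -1 and 1. Pearson measures only linear association, so a rank-based measure such as Spearman's correlation may be a better fit for skewed columns like Reviews.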

What tasks were completed?

  • Data Collection from Kaggle: Gathered the initial dataset of bestselling books from Kaggle.
  • Data Cleaning and Manipulation in Excel: Removed duplicates, handled missing values, and formatted the data for analysis.
  • Visualization in Tableau: Built interactive visualizations to explore and analyze the bestseller rating breakdowns.
  • Reporting in Google Docs: Wrote reports and summaries of the findings to communicate the insights.
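The cleaning steps were carried out in Excel. For anyone who wants to reproduce them in code, here is a minimal Python sketch using only the standard library. It assumes a CSV export whose header matches the variable names described in this README (`Name`, `Author`, `User Rating`, `Reviews`, `Price`, `Year`, `Genre`). It drops rows with any missing or unparseable value and treats only fully identical rows as duplicates. These policies and the function names are assumptions, not necessarily what the Excel workbook does.

```python
import csv
import io

# Column names as described in this README (assumed to match the CSV header).
FIELDS = ["Name", "Author", "User Rating", "Reviews", "Price", "Year", "Genre"]


def clean_rows(raw_rows):
    """Trim text, drop incomplete or unparseable rows, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_rows:
        # csv.DictReader yields None for missing trailing fields; treat as empty.
        row = {f: (raw.get(f) or "").strip() for f in FIELDS}
        if any(value == "" for value in row.values()):
            continue  # policy assumption: drop rows with any missing value
        try:
            record = (
                row["Name"],
                row["Author"],
                float(row["User Rating"]),
                int(row["Reviews"].replace(",", "")),  # "17,350" -> 17350
                float(row["Price"]),
                int(row["Year"]),
                row["Genre"],
            )
        except ValueError:
            continue  # skip rows whose numeric fields do not parse
        if record in seen:
            continue  # exact duplicate of a row already kept
        seen.add(record)
        cleaned.append(dict(zip(FIELDS, record)))
    return cleaned


def clean_csv_text(text):
    """Parse CSV text (header row first) and clean it."""
    return clean_rows(csv.DictReader(io.StringIO(text)))


def clean_csv_file(path):
    """Read and clean a CSV file from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return clean_rows(csv.DictReader(f))
```

Keying duplicates on the full parsed record keeps rows that differ in any field. A narrower key such as `(Name, Author)` would merge them instead, so the right choice depends on what one row is meant to represent.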

Overview of the Dataset

The Goodreads books dataset is a collection of book-related data obtained from the Goodreads API. It covers thousands of books across a wide range of genres, authors, and publication years, and its ratings, reviews, and prices support in-depth exploration of each book's popularity and reception.

Description of Variables/Features

  • Name: The title of the book.
  • Author: The author(s) of the book.
  • User Rating: The average rating given to the book by users on Goodreads.
  • Reviews: The total number of reviews the book has received.
  • Price: The price of the book.
  • Year: The year in which the book was published.
  • Genre: The genre/category of the book.

Bestseller Rating:

A newly created column that classifies each book by its user rating and number of reviews as a moderately rated, highly rated, or very highly rated bestseller.

Bestseller Rating Criteria:

  • Moderately Rated Bestsellers: Books with a user rating of 3.9 or below and at most 29,280 reviews.
  • Highly Rated Bestsellers: Books with a user rating between 4.0 and 4.9, inclusive, and between 29,281 and 58,560 reviews.
  • Very Highly Rated Bestsellers: Books with a user rating of 4.0 or above and at least 58,561 reviews.
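The tier rules can be expressed as a small classification function. This is a hypothetical Python sketch, not the formula used in the project's spreadsheet. It assumes ratings are recorded to one decimal place and reads the moderately rated review threshold as an upper bound (up to 29,280 reviews, since the highly rated band starts at 29,281). It returns `None` for combinations the stated criteria do not cover, such as a book rated 4.5 with fewer than 29,281 reviews. The label strings and function names are illustrative.

```python
from collections import Counter
from typing import Optional

# Thresholds from the Bestseller Rating Criteria; the moderate review cap
# (an upper bound of 29,280) is an assumed reading of that rule.
MODERATE_MAX_RATING = 3.9
HIGH_MIN_RATING, HIGH_MAX_RATING = 4.0, 4.9
MODERATE_MAX_REVIEWS = 29_280
HIGH_MIN_REVIEWS, HIGH_MAX_REVIEWS = 29_281, 58_560
VERY_HIGH_MIN_REVIEWS = 58_561


def bestseller_rating(user_rating: float, reviews: int) -> Optional[str]:
    """Return the tier label for one book, or None if no rule matches."""
    if user_rating <= MODERATE_MAX_RATING and reviews <= MODERATE_MAX_REVIEWS:
        return "Moderately Rated Bestseller"
    if (HIGH_MIN_RATING <= user_rating <= HIGH_MAX_RATING
            and HIGH_MIN_REVIEWS <= reviews <= HIGH_MAX_REVIEWS):
        return "Highly Rated Bestseller"
    if user_rating >= HIGH_MIN_RATING and reviews >= VERY_HIGH_MIN_REVIEWS:
        return "Very Highly Rated Bestseller"
    return None  # combination not covered by the stated criteria


def count_tiers(books):
    """Count books per tier from (user_rating, reviews) pairs; None = uncategorized."""
    return Counter(bestseller_rating(rating, reviews) for rating, reviews in books)
```

`count_tiers` answers the "how many books fall into each category" question directly, and a non-zero count under `None` flags rows that the stated rules leave unclassified.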

Data Visualization

Visualizations of the dataset have been created using Tableau. For interactive visualizations and further exploration of the dataset, please visit the Tableau Dashboard.

License

The dataset is publicly available on Kaggle, and while it is shared for the benefit of the data science community, the ownership and maintenance of the dataset lie with its creator. Please refer to the dataset creator's terms of use and licensing agreements for any specific restrictions or permissions regarding its use.

Feel free to reach out to the dataset creator for any further inquiries or clarifications regarding the dataset.
