Data Analysis of best-selling books dataset

I analyzed the best-selling books dataset from Kaggle to extract meaningful insights and inform data-driven decisions. This document summarizes the workflow and the main findings.

Tools used:

Jupyter Notebook (Code and Markdown)
Draw.io (to create the flowchart)

Summary of the project

Data cleaning: Handled missing values and addressed data anomalies to prepare the dataset for analysis.
Exploratory Data Analysis (EDA):
- Performed preliminary data analysis to discover valuable information regarding book sales patterns.
- Used data visualization methods to identify essential attributes, including book title, author, genre, publication year, and metrics related to sales.
Top-selling authors: Identified the authors whose best-selling books sold the most units. J. K. Rowling emerged as the top-selling author, with 500 million units of her books sold.
Top-selling books: Identified the books with the highest sales. "A Tale of Two Cities" stood out as the top performer, with 200 million units sold.
Genre Analysis: Compared genres by sales. Fantasy outperforms the other genres.
Sales distribution by language: Investigated the percentage distribution of sales by language. English-language books account for the highest sales figures.
Sales Analysis: Explored the metrics related to sales, such as total sales by book and genre.
Correlation Analysis: Examined the correlation between sales and year. Sales fall approximately between 22 and 50 million units and remained consistently high from 1950 to 2000.
Visualization: Used data visualization techniques, such as bar charts, histograms, scatter plots, and pie charts, to present findings effectively.
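
The aggregation steps in this summary can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Book`, `Author`, `Genre`, `Language`, `Year`, `Sales`) and the toy rows are assumptions standing in for the Kaggle CSV, which would normally be loaded with `read_csv`.

```python
import pandas as pd

# Toy rows standing in for the Kaggle CSV. Column names and values are
# placeholders for illustration, not figures from the real dataset.
df = pd.DataFrame({
    "Book": ["Book A", "Book B", "Book C", "Book D"],
    "Author": ["Author X", "Author X", "Author Y", "Author Z"],
    "Genre": ["Fantasy", "Fantasy", "Mystery", None],
    "Language": ["English", "English", "French", "English"],
    "Year": [1997, 1998, 1950, 1960],
    "Sales": [120.0, 80.0, 100.0, 50.0],  # assumed unit: millions
})

# Data cleaning: label missing genres instead of dropping the rows.
df["Genre"] = df["Genre"].fillna("Unknown")

# Total sales per author and per genre, largest first.
sales_by_author = df.groupby("Author")["Sales"].sum().sort_values(ascending=False)
sales_by_genre = df.groupby("Genre")["Sales"].sum().sort_values(ascending=False)

# Share of total sales by language, in percent.
language_share = df.groupby("Language")["Sales"].sum() / df["Sales"].sum() * 100

# Pearson correlation between year and sales.
year_sales_corr = df["Year"].corr(df["Sales"])

print(sales_by_author.head())
print(language_share.round(1))
```

Sorting each grouped series in descending order puts the top author or genre in the first row, which is also a convenient input for a bar chart.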

Python functions and features

I used the following Python functions and features for data analysis:

Pandas: For data manipulation and cleaning. The key functions include read_csv (for loading the dataset), head (for viewing the first few rows), and functions for filtering, aggregating, and transforming data.
NumPy: For calculations and statistical analysis of the data.
Matplotlib and Seaborn: For data visualization. Created various types of plots, such as bar charts, histograms, scatter plots, pie charts, and line charts to visualize trends and patterns in the data.
Regular Expressions (re module): For removing stray characters from book titles.
Apply Functions (Pandas): For applying custom functions to the Book column.
GroupBy (Pandas): For aggregating and summarizing data, such as finding the total sales per genre or author.
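
As an illustration of how the `re` module and `apply` fit together, here is a small sketch of cleaning the Book column. The helper `clean_title`, the character set it keeps, and the sample titles are all hypothetical; the notebook's real pattern may differ.

```python
import re

import pandas as pd


def clean_title(title: str) -> str:
    """Drop stray symbols from a title and collapse repeated whitespace."""
    # Keep word characters, whitespace, and common title punctuation;
    # remove everything else (for example '#' or '@').
    cleaned = re.sub(r"[^\w\s'.,:&-]", "", title)
    return re.sub(r"\s+", " ", cleaned).strip()


# Placeholder titles with stray characters, for illustration only.
df = pd.DataFrame({"Book": ["Book A##", "  Book  B@", "Book C"]})

# Apply the custom function to every value in the Book column.
df["Book"] = df["Book"].apply(clean_title)

print(df["Book"].tolist())
```

Keeping the cleaning logic in a named function makes it easy to test on a few titles before applying it to the whole column.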

Documentation

I used technical writing principles to document data analysis steps and explain the project workflow.

Code documentation: Explained the Python code so that others can reproduce it and reuse it in their own data-analysis projects. I included comments at relevant points in the code to clarify the rationale behind the logic.

kalpanapathak16/Data-Analysis-of-best-selling-books-dataset
