This project analyzes an e-commerce database to derive insights about customer behavior, sales trends, and product performance. The analysis is built with Python, Pandas, and MySQL, using Matplotlib and Seaborn for visualization, and covers customer demographics, order trends, sales performance, and product popularity, among other insights.
- Installation
- Database Connection
- Analysis and Insights
  - Unique Cities
  - Orders in 2017
  - Sales per Category
  - Installment Payments
  - Customers by State
  - Orders per Month in 2018
  - Average Products per Order by City
  - Revenue Percentage per Category
  - Correlation Between Product Price and Order Count
  - Revenue by Seller
  - Moving Average of Order Values
  - Cumulative Sales
  - Year-over-Year Growth
  - Customer Retention Rate
  - Top Customers by Year
- Conclusion
- Clone the repository:

  ```shell
  git clone https://github.com/yourusername/ecommerce-analysis.git
  ```

- Navigate to the project directory:

  ```shell
  cd ecommerce-analysis
  ```

- Install the required dependencies:

  ```shell
  pip install -r requirements.txt
  ```
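A `requirements.txt` consistent with the imports used in the scripts would contain at least the following (versions unpinned here; pin them as needed for your environment):

```
pandas
matplotlib
seaborn
mysql-connector-python
```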
Ensure you have MySQL installed and running, then update the database connection details in the script to match your setup:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import mysql.connector

db = mysql.connector.connect(
    host='localhost',
    user='root',
    password='your_password',  # replace with your MySQL password
    database='ecomerce'
)
cur = db.cursor()
```
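Each analysis in this project follows the same pattern: execute a SQL query with the cursor, then wrap the fetched rows in a DataFrame for further processing and plotting. A minimal sketch of the pattern, using the "Unique Cities" analysis as an example; the rows below are sample stand-ins for `cur.fetchall()`, and the `customers` table / `customer_city` column names are assumptions about the ingested schema:

```python
import pandas as pd

# In the real script the rows come from MySQL, e.g.:
# cur.execute("SELECT DISTINCT customer_city FROM customers")
# rows = cur.fetchall()
rows = [("sao paulo",), ("rio de janeiro",), ("curitiba",)]  # sample stand-in

# Wrap the result set in a DataFrame and count distinct cities
df = pd.DataFrame(rows, columns=["customer_city"])
print(df["customer_city"].nunique())  # → 3
```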
The `data_ingestion.py` script reads each CSV file, creates the corresponding MySQL table with column types inferred from the pandas dtypes, and inserts the data, replacing NaN values with SQL NULL.
Here is a brief overview of the script:
```python
import pandas as pd
import mysql.connector
import os

# Connect to the MySQL database
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='your_password',  # replace with your MySQL password
    database='ecomerce'
)
cursor = conn.cursor()

# Folder containing the CSV files
folder_path = 'D:/Ecomerce project full data'

# Map a pandas dtype to an appropriate SQL column type
def get_sql_type(dtype):
    if pd.api.types.is_integer_dtype(dtype):
        return 'INT'
    elif pd.api.types.is_float_dtype(dtype):
        return 'FLOAT'
    elif pd.api.types.is_bool_dtype(dtype):
        return 'BOOLEAN'
    elif pd.api.types.is_datetime64_any_dtype(dtype):
        return 'DATETIME'
    else:
        return 'TEXT'

# csv_files is a list of (csv_file, table_name) pairs defined elsewhere
# in the script, e.g. [('customers.csv', 'customers'), ...]
for csv_file, table_name in csv_files:
    file_path = os.path.join(folder_path, csv_file)

    # Read the CSV file into a pandas DataFrame
    df = pd.read_csv(file_path)

    # Replace NaN with None to handle SQL NULL
    df = df.where(pd.notnull(df), None)

    # Debugging: check for NaN values
    print(f"Processing {csv_file}")
    print(f"NaN values before replacement:\n{df.isnull().sum()}\n")

    # Clean column names so they are valid SQL identifiers
    df.columns = [col.replace(' ', '_').replace('-', '_').replace('.', '_') for col in df.columns]

    # Generate the CREATE TABLE statement with appropriate data types
    columns = ', '.join([f'`{col}` {get_sql_type(df[col].dtype)}' for col in df.columns])
    create_table_query = f'CREATE TABLE IF NOT EXISTS `{table_name}` ({columns})'
    cursor.execute(create_table_query)

    # Insert DataFrame data into the MySQL table
    for _, row in df.iterrows():
        # Convert row to tuple and handle NaN/None explicitly
        values = tuple(None if pd.isna(x) else x for x in row)
        sql = f"INSERT INTO `{table_name}` ({', '.join(['`' + col + '`' for col in df.columns])}) VALUES ({', '.join(['%s'] * len(row))})"
        cursor.execute(sql, values)

    # Commit the transaction for the current CSV file
    conn.commit()

# Close the connection
conn.close()
```
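Once the data is ingested, the analyses listed above become pandas operations on query results. As one example, the "Moving Average of Order Values" can be computed with a rolling window. This is a sketch using sample data in place of the MySQL result set; the column names (`order_purchase_timestamp`, `payment_value`) are assumptions about the ingested CSVs:

```python
import pandas as pd

# Stand-in for rows fetched from MySQL, e.g.
# SELECT order_purchase_timestamp, payment_value FROM orders JOIN payments ...
data = {
    "order_purchase_timestamp": pd.to_datetime(
        ["2018-01-05", "2018-01-12", "2018-01-20", "2018-02-03", "2018-02-15"]
    ),
    "payment_value": [120.0, 80.0, 200.0, 150.0, 90.0],
}
df = pd.DataFrame(data).sort_values("order_purchase_timestamp")

# 3-order moving average of payment values; min_periods=1 keeps the
# first rows defined even before a full window is available
df["moving_avg"] = df["payment_value"].rolling(window=3, min_periods=1).mean()
print(df[["order_purchase_timestamp", "moving_avg"]])
```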