A beginner-friendly Exploratory Data Analysis (EDA) on the Titanic dataset. Includes summary statistics, histograms, boxplots, correlation heatmaps, and visual insights. Part of a job-ready data science task series.

tirtha103/Titanic-EDA

Titanic EDA – Exploratory Data Analysis

Author: Tirtha Dutta
Date: 24 June 2025
Dataset: Kaggle – Yasser H Titanic Dataset


Objective

Explore the Titanic dataset using visual and statistical techniques to uncover relationships, trends, and anomalies.
This step helps lay the foundation for building accurate machine learning models.


Key EDA Steps Performed

  1. Summary statistics for all numerical features
  2. Histograms to visualize distributions
  3. Boxplots to inspect outliers and spread
  4. Correlation heatmap to check relationships
  5. Markdown cell summarizing key findings
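The steps above can be sketched in a few lines of pandas and matplotlib. This is a minimal illustration, not the notebook's actual code: the inline rows are toy stand-ins (not real Titanic records), the column names are assumed, and the figures are written to a temporary directory rather than the repository's images/ folder. The heatmap is drawn with plain matplotlib so the sketch needs no extra plotting library.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in rows (NOT real Titanic records). In the repository you would
# instead load the cleaned file, e.g. pd.read_csv("data/titanic_cleaned.csv").
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 3, 1, 2],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 27.0],
    "Fare": [7.25, 71.28, 7.92, 8.05, 51.86, 13.0],
})
numeric = df.select_dtypes(include="number")

# Temporary output folder for this sketch; the repo keeps its plots in images/.
out_dir = Path(tempfile.mkdtemp())

# 1. Summary statistics for all numerical features
summary = numeric.describe()

# 2. Histograms to visualize distributions
numeric.hist(bins=5, figsize=(8, 6))
plt.tight_layout()
plt.savefig(out_dir / "histograms.png")
plt.close("all")

# 3. Boxplots to inspect outliers and spread (one panel per feature,
#    independent y-axes because the features have different scales)
numeric.plot(kind="box", subplots=True, layout=(1, len(numeric.columns)),
             sharey=False, figsize=(10, 4))
plt.tight_layout()
plt.savefig(out_dir / "boxplots.png")
plt.close("all")

# 4. Correlation heatmap to check pairwise (Pearson) relationships
corr = numeric.corr()
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig(out_dir / "correlation_heatmap.png")
plt.close(fig)

print(summary)
```

With the real dataset, only the loading line and the output directory would need to change; the four steps themselves work on any DataFrame with numeric columns.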

How to run locally?

git clone https://github.com/tirtha103/titanic-eda.git
cd titanic-eda

Create and activate virtual environment (Windows)

python -m venv venv
venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Launch the notebook

jupyter lab notebooks/01_eda_walkthrough.ipynb


Folder Structure

titanic-eda/
│
├── data/
│   └── titanic_cleaned.csv                # Cleaned dataset (from Task 1)
│
├── images/
│   ├── histograms.png                     # Histograms of numeric features
│   ├── boxplots.png                       # Boxplots for outlier inspection
│   └── correlation_heatmap.png            # Correlation heatmap
│
├── notebooks/
│   └── 01_eda_walkthrough.ipynb           # Full EDA notebook
│
├── report/
│   └── titanic_eda_report.pdf             # Optional PDF report
│
├── requirements.txt                       # Python package requirements
├── LICENSE                                # MIT License
└── .gitignore                             # Ignored files                           
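Before launching the notebook, it can help to confirm that the clone matches this layout. The helper below is a hypothetical convenience, not part of the repository: the function name `missing_paths` and the choice of which files to check are assumptions made for illustration.

```python
from pathlib import Path

# Files the walkthrough appears to depend on, taken from the tree above.
EXPECTED = [
    "data/titanic_cleaned.csv",
    "notebooks/01_eda_walkthrough.ipynb",
    "requirements.txt",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_paths(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Repository layout looks complete.")
```

An empty list means the dataset, notebook, and requirements file are all where the notebook expects them.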


