Author: Tirtha Dutta
Date: 24 June 2025
Dataset: Kaggle – Yasser H Titanic Dataset
Explore the Titanic dataset using visual and statistical techniques to uncover relationships, trends, and anomalies.
This step lays the groundwork for building accurate machine learning models. The notebook covers:
- Summary statistics for all numerical features
- Histograms to visualize distributions
- Boxplots to inspect outliers and spread
- Correlation heatmap to check relationships
- Markdown cell summarizing key findings
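The statistical side of the checklist above can be sketched in pandas as follows. This covers summary statistics, the outlier rule that boxplots draw, and the correlation matrix behind a heatmap. It is an illustration, not the notebook's code. The function names are hypothetical. Numeric columns are selected automatically rather than by name, and the 1.5×IQR outlier rule and Pearson correlation are assumed defaults.

```python
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles and max for every numeric column (one row per column)."""
    return df.select_dtypes(include="number").describe().T


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column.

    k=1.5 matches the default whisker length of matplotlib/seaborn boxplots,
    so these counts should correspond to the points drawn as fliers.
    """
    num = df.select_dtypes(include="number")
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    # Per-column bounds are aligned with the DataFrame's columns; NaN compares as False.
    outside = num.lt(q1 - k * iqr, axis=1) | num.gt(q3 + k * iqr, axis=1)
    return outside.sum()


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations between numeric columns (the matrix behind a heatmap)."""
    return df.select_dtypes(include="number").corr(method="pearson")
```

Usage, assuming it is run from the repository root: load the data with `df = pd.read_csv("data/titanic_cleaned.csv")`, then call `numeric_summary(df)`, `iqr_outlier_counts(df)` and `numeric_correlations(df).round(2)`. Counting outliers numerically complements the boxplots. It shows how many points each whisker rule flags without reading them off a chart.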
To run the notebook locally:
git clone https://github.com/tirtha103/titanic-eda.git
cd titanic-eda
python -m venv venv
venv\Scripts\activate    (Windows; on macOS/Linux use: source venv/bin/activate)
pip install -r requirements.txt
jupyter lab notebooks/01_eda_walkthrough.ipynb
Project structure:
titanic-eda/
│
├── data/
│ └── titanic_cleaned.csv # Cleaned dataset (from Task 1)
│
├── images/
│ ├── histograms.png # Histograms of numeric features
│ ├── boxplots.png # Boxplots for outlier inspection
│ └── correlation_heatmap.png # Correlation heatmap
│
├── notebooks/
│ └── 01_eda_walkthrough.ipynb # Full EDA notebook
│
├── report/
│ └── titanic_eda_report.pdf # Optional PDF report
│
├── requirements.txt # Python package requirements
├── LICENSE # MIT License
└── .gitignore # Ignored files
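The three images listed under images/ in the tree above could be regenerated with something like the sketch below. It is not the notebook's code. It assumes plain matplotlib, although the notebook may use seaborn or different styling. Only the output file names come from the tree. The function `save_eda_figures`, the 20-bin histograms, the three-column panel grid and the `coolwarm` colormap are hypothetical, illustrative choices.

```python
import math
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # file-only backend, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def _grid(n: int, ncols: int = 3):
    """Create a figure with enough subplots for n panels and hide the unused ones."""
    nrows = math.ceil(n / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    axes = axes.ravel()
    for ax in axes[n:]:
        ax.set_visible(False)
    return fig, axes


def save_eda_figures(df: pd.DataFrame, out_dir: str = "images") -> list[Path]:
    """Write histograms.png, boxplots.png and correlation_heatmap.png for the numeric columns."""
    num = df.select_dtypes(include="number")
    cols = list(num.columns)
    if not cols:
        raise ValueError("no numeric columns to plot")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / "histograms.png", out / "boxplots.png", out / "correlation_heatmap.png"]

    # Histograms: one panel per numeric column.
    fig, axes = _grid(len(cols))
    for ax, col in zip(axes, cols):
        ax.hist(num[col].dropna().to_numpy(), bins=20)
        ax.set_title(col)
    fig.tight_layout()
    fig.savefig(paths[0], dpi=100)
    plt.close(fig)

    # Boxplots: one panel per column, so columns on different scales do not squash each other.
    fig, axes = _grid(len(cols))
    for ax, col in zip(axes, cols):
        ax.boxplot(num[col].dropna().to_numpy())  # default whiskers: 1.5 * IQR
        ax.set_title(col)
    fig.tight_layout()
    fig.savefig(paths[1], dpi=100)
    plt.close(fig)

    # Correlation heatmap: Pearson matrix with the coefficient printed in each cell.
    corr = num.corr()
    size = 1.5 + 0.9 * len(cols)
    fig, ax = plt.subplots(figsize=(size, size))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(cols)))
    ax.set_xticklabels(cols, rotation=45, ha="right")
    ax.set_yticks(range(len(cols)))
    ax.set_yticklabels(cols)
    for i in range(len(cols)):
        for j in range(len(cols)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center", fontsize=8)
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(paths[2], dpi=100)
    plt.close(fig)

    return paths
```

For example, `save_eda_figures(pd.read_csv("data/titanic_cleaned.csv"))`, run from the repository root, would write the three PNGs into images/. The sketch switches to the non-interactive Agg backend. For that reason it is better run as a standalone script than inside the notebook, where it would suppress inline plots.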