An exploratory and predictive analysis project using the Kaggle "Titanic: Machine Learning from Disaster" dataset, implemented in a Jupyter Notebook.
This project explores and models passenger survival using data from the RMS Titanic disaster. It covers:
- Data cleaning and exploratory data analysis (EDA)
- Feature engineering, visualization, and hypothesis testing
- Building and evaluating machine learning models to predict survival
The main objectives are to:
- Identify key factors influencing survival (e.g., age, sex, passenger class, port of embarkation)
- Visualize patterns across demographic and socio-economic variables
- Train predictive models to classify survival outcomes
- Statistically validate relationships using tests such as the chi-square test
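The chi-square test of independence checks whether two categorical variables, such as sex and survival, are related. The sketch below is a dependency-free illustration for a 2×2 table; the counts are invented for demonstration and are not taken from the Kaggle data. It applies no continuity correction, so it should agree with `scipy.stats.chi2_contingency(table, correction=False)`. SciPy's default applies Yates' correction to 2×2 tables, and SciPy also handles larger tables (for example several age groups vs. survival), which this sketch does not.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts. Returns
    (statistic, p_value). No Yates continuity correction is applied.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom, and for
    # 1 df the chi-square upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = (female, male), columns = (survived, died).
toy_table = [[30, 10], [20, 40]]
stat, p = chi_square_2x2(toy_table)
print(f"chi2 = {stat:.3f}, p = {p:.2e}")
```

A small p-value means the observed counts would be unlikely if survival were independent of the row variable.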
Tools and technologies:
- Python libraries: pandas, NumPy, Matplotlib, seaborn, SciPy
- Models: classification algorithms such as Random Forest and Logistic Regression
- Jupyter Notebook (`PROJECT.ipynb`)
Project structure:
Titanic_Data_Analytics/
│
├── PROJECT.ipynb # Main notebook with full code and narrative
├── README.md # This document
└── data/ # Data folder (optional; dataset imported from Kaggle)
Methodology:
- Cleaning and preprocessing: handle missing values, fix data types, and treat outliers
- Exploratory plots: visualize how survival varies with age, sex, and passenger class
- Statistical testing: use chi-square tests to assess relationships (e.g., age groups vs. survival)
- Modeling: train and evaluate classification models on the passenger data
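The cleaning step can be sketched with the standard library alone. The column names follow the Kaggle `train.csv` layout (`Survived`, `Pclass`, `Sex`, `Age`, `Embarked`, among others), but the sample rows are invented. Filling missing `Age` with the median and missing `Embarked` with the most frequent port are common choices assumed here, not necessarily what the notebook does. In pandas the first would be `df["Age"].fillna(df["Age"].median())`.

```python
import csv
import io
import statistics
from collections import Counter


def load_passengers(csv_text):
    """Parse Titanic-style CSV text into dicts with numeric types.

    Empty fields become None so that missing values are explicit.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "Survived": int(raw["Survived"]),
            "Pclass": int(raw["Pclass"]),
            "Sex": raw["Sex"],
            "Age": float(raw["Age"]) if raw["Age"] else None,
            "Embarked": raw["Embarked"] or None,
        })
    return rows


def impute_missing(rows):
    """Fill missing Age with the median age and Embarked with the mode."""
    ages = [r["Age"] for r in rows if r["Age"] is not None]
    ports = [r["Embarked"] for r in rows if r["Embarked"] is not None]
    median_age = statistics.median(ages)
    most_common_port = Counter(ports).most_common(1)[0][0]
    # Build new dicts rather than mutating the input rows.
    return [
        {
            **r,
            "Age": median_age if r["Age"] is None else r["Age"],
            "Embarked": most_common_port if r["Embarked"] is None else r["Embarked"],
        }
        for r in rows
    ]


# Invented sample using a subset of the Kaggle columns.
sample_csv = """Survived,Pclass,Sex,Age,Embarked
0,3,male,22,S
1,1,female,38,C
1,3,female,,S
0,3,male,35,
"""

passengers = impute_missing(load_passengers(sample_csv))
for p in passengers:
    print(p)
```

Median imputation is robust to a few extreme ages; imputing within groups (for example by passenger class) is a common refinement.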
Key findings:
- First-class passengers and female passengers had notably higher survival rates than other passengers.
- Statistical tests showed that age and passenger class are significantly associated with survival.
- Visualizations made these demographic patterns clear.
- The predictive models classified survival with competitive accuracy.
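Group-wise survival rates are simple proportions, and a trivial baseline gives a reference point for judging model accuracy. The sketch below uses a handful of invented records. The baseline rule (predict that every female survives and every male dies) mirrors Kaggle's sample submission file, `gender_submission.csv`, and is used here only for illustration. With pandas, the rates would be `df.groupby("Sex")["Survived"].mean()`.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group value: fraction survived} for the given column."""
    survived = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        total[r[key]] += 1
        survived[r[key]] += r["Survived"]
    return {group: survived[group] / total[group] for group in total}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented records for illustration only (not Kaggle data).
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
]

print(survival_rate_by(rows, "Sex"))
print(survival_rate_by(rows, "Pclass"))

# Baseline classifier: predict survival for females, death for males.
baseline = [1 if r["Sex"] == "female" else 0 for r in rows]
print(accuracy([r["Survived"] for r in rows], baseline))
```

A trained model is only adding value if it beats a baseline like this on held-out data.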
Possible future improvements:
- Feature engineering with additional derived variables (e.g., titles extracted from names, family size)
- Model tuning and ensembling for improved predictions
- Deployment as a web app for interactive user input and prediction
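The proposed derived features can be computed from the Kaggle columns `Name`, `SibSp` (siblings/spouses aboard), and `Parch` (parents/children aboard). This sketch follows common conventions that are assumptions, not the notebook's confirmed definitions: the title is the text between the comma and the first period in `Name`, and family size is `SibSp + Parch + 1` so that the passenger is counted. The example names are invented. In pandas the title could be pulled out with `df["Name"].str.extract(r",\s*([^.]+)\.")`.

```python
import re

# Kaggle names follow the pattern "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the honorific between the comma and the first period, or None."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None


def family_size(sibsp, parch):
    """Siblings/spouses plus parents/children aboard, plus the passenger."""
    return sibsp + parch + 1


# Invented (name, SibSp, Parch) examples in the Kaggle format.
examples = [
    ("Doe, Mr. John", 1, 0),
    ("Smith, Miss. Anna", 0, 0),
    ("Roe, Master. Tom", 3, 1),
]
for name, sibsp, parch in examples:
    print(extract_title(name), family_size(sibsp, parch))
```

Rare titles are often grouped into a single "Rare" category before modeling so that the feature does not have many sparse levels.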
References:
- Kaggle "Titanic: Machine Learning from Disaster" dataset
- Tutorials and walkthroughs on Titanic EDA and ML techniques (e.g., Analytics Vidhya, Dataquest)
Author:
Shashwat Srivastava
BCA Student – SRM University
GitHub: Shashwat970
Notebook: PROJECT.ipynb