Shashwat970/Titanic_Data_Analytics

🚢 Titanic Data Analytics

An exploratory and predictive analysis project using the Kaggle Titanic: Machine Learning from Disaster dataset, implemented in a Jupyter Notebook.


🧠 Overview

This project explores and models passenger survival using data from the RMS Titanic disaster. It covers:

  • Data cleaning and exploratory data analysis (EDA)
  • Feature engineering, visualization, and hypothesis testing
  • Building and evaluating machine learning models to predict survival

🧩 Objectives

  • Identify key factors influencing survival (e.g., age, sex, class, embarkation)
  • Visualize patterns across demographic and socio-economic variables
  • Train predictive models to classify survival outcomes
  • Statistically validate relationships using tests such as Chi-square
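The modeling objective can be illustrated with a minimal sketch. The README does not name the modeling library, so scikit-learn is an assumption here, as are the two features chosen. The rows are made-up records that only borrow the Kaggle column names (`Pclass`, `Sex`, `Survived`); they are not results from the notebook.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up illustrative rows using Kaggle column names; not real passenger data.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    "Sex": ["female", "female", "male", "male", "female", "male",
            "female", "male", "male", "male", "female", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
})

# Encode sex as a 0/1 indicator so the model receives numeric inputs only.
X = pd.DataFrame({
    "Pclass": df["Pclass"],
    "is_female": (df["Sex"] == "female").astype(int),
})
y = df["Survived"]

# Hold out a quarter of the rows; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of held-out rows classified correctly
print(f"Held-out accuracy: {accuracy:.2f}")
```

On the real dataset the same pattern applies after loading the Kaggle training file; a Random Forest could be swapped in by replacing the estimator.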

🚀 Tech Stack

  • Python libraries: pandas, numpy, matplotlib, seaborn, scipy
  • Modeling: classification algorithms such as Random Forest and Logistic Regression
  • Jupyter Notebook (PROJECT.ipynb)

📂 Repository Structure

Titanic_Data_Analytics/
│
├── PROJECT.ipynb         # Main notebook with full code and narrative
├── README.md             # This document
└── data/                 # Data folder (optional; dataset imported from Kaggle)

✅ Highlights

  • Cleaning and preprocessing: Address missing values, data types, and outliers
  • Exploratory plots: Analyze variables like age, sex, class, and survival via visualizations
  • Statistical testing: Use chi-square tests to assess relationships (e.g., age groups vs survival)
  • Modeling: Train and evaluate classification models on passenger data
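The cleaning step can be sketched as follows. This is an illustration under assumptions, not the notebook's actual code: the imputation strategy (median for `Age`, most frequent value for `Embarked`) is a common choice for this dataset, and the rows are made up.

```python
import pandas as pd

# Made-up sample using Kaggle column names; not real passenger records.
df = pd.DataFrame({
    "Age": [22.0, None, 38.0, 26.0, None],
    "Embarked": ["S", "C", None, "S", "S"],
    "Fare": [7.25, 71.28, 8.05, 53.10, 8.46],
})

# Count missing values per column before imputing.
print(df.isna().sum())

# Fill numeric Age with the median, which is less sensitive to outliers than the mean.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Fill categorical Embarked with the most frequent port.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Confirm that no missing values remain.
print(df.isna().sum().sum())
```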

📊 Insights & Results

  • First-class passengers and female passengers had notably higher survival rates.
  • Chi-square tests indicated that age group and passenger class are significantly associated with survival.
  • Visualizations illustrated demographic patterns clearly.
  • Predictive models achieved competitive accuracy in classifying survival.
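A chi-square test of independence between passenger class and survival might look like the sketch below. The counts are invented for illustration and do not reproduce the notebook's figures; `scipy.stats.chi2_contingency` is used because scipy appears in the tech stack.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Invented records: 10 first-class and 10 third-class passengers.
df = pd.DataFrame({
    "Pclass": [1] * 10 + [3] * 10,
    "Survived": [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8,
})

# Contingency table of class (rows) against survival outcome (columns).
table = pd.crosstab(df["Pclass"], df["Survived"])

# For a 2x2 table, scipy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value suggests the two variables are not independent; it does not by itself establish that class causes the difference in survival.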

📈 Future Enhancements

  • Feature engineering with additional derived variables (e.g., titles from names, family size)
  • Model tuning and ensembling for improved predictions
  • Deployment as a web app for interactive user input and prediction
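The derived variables mentioned in the first item could be sketched as follows. The column names `Title` and `FamilySize` are hypothetical, and defining family size as `SibSp + Parch + 1` (siblings/spouses plus parents/children plus the passenger) is a common convention assumed here, not something the README specifies. The names follow the "Surname, Title. Given names" format of the Kaggle file.

```python
import pandas as pd

# Three rows in the Kaggle "Surname, Title. Given names" format.
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# Title: the text between the comma and the first period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# FamilySize: relatives aboard plus the passenger (assumed convention).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

print(df[["Title", "FamilySize"]])
```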

🧾 Credits & References

  • Kaggle Titanic: Machine Learning from Disaster dataset
  • Tutorials and walkthroughs on Titanic EDA and ML techniques (e.g., Analytics Vidhya, Dataquest)

👤 Author

Shashwat Srivastava
BCA Student – SRM University
GitHub: Shashwat970
Notebook: PROJECT.ipynb
