Titanic survival prediction using machine learning techniques, focusing on data preprocessing, visualization, and model selection to build accurate predictive models.

SyedMaaz25/Titanic-Classification

Titanic Classification

Overview

This project analyzes the Titanic dataset to predict passenger survival with machine learning, focusing on data preprocessing, visualization, and model selection.

Dataset

  • Source: Kaggle (Titanic-Dataset)
  • Features: Passenger details (e.g., Sex, Age, Fare, Pclass)
  • Target: Survived (0 = No, 1 = Yes)

Methodology

  1. Load the Data: Imported the Titanic dataset from Kaggle.
  2. Explore & Visualize: Analyzed distributions and relationships (e.g., Sex vs. Survived).
  3. Prepare Data: Handled missing values, encoded categorical variables, and scaled features.
  4. Model Selection: Compared SGDClassifier, LogisticRegression, and RandomForestClassifier.
  5. Conclusion: Selected the tuned RandomForestClassifier (test F1 = 0.7971).
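The preparation and comparison steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the data is a synthetic stand-in so the snippet runs without the Kaggle file, and the median/most-frequent imputation, one-hot encoding, 5-fold cross-validation, and `random_state` values are all assumptions. Only the three model classes and the RandomForest hyperparameters come from this README.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Kaggle CSV (assumption: real data would be loaded instead).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.normal(30, 12, size=n).clip(1, 80),
    "Fare": rng.exponential(30, size=n),
    "Pclass": rng.choice([1, 2, 3], size=n),
})
df.loc[rng.choice(n, size=40, replace=False), "Age"] = np.nan  # mimic missing ages
# Toy target: women and first-class passengers survive more often.
p = 0.15 + 0.5 * (df["Sex"] == "female") + 0.2 * (df["Pclass"] == 1)
df["Survived"] = (rng.random(n) < p).astype(int)

X, y = df.drop(columns="Survived"), df["Survived"]

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
numeric = ["Age", "Fare"]
categorical = ["Sex", "Pclass"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

models = {
    "SGDClassifier": SGDClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    # Hyperparameters as reported in the Results section of this README.
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=300, max_depth=10, min_samples_split=5,
        class_weight="balanced", random_state=0),
}

# Compare the candidates on mean cross-validated F1.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {scores[name]:.3f}")
```

Keeping the preprocessing inside the pipeline means imputation and scaling are refit on each training fold, which avoids leaking validation data into the comparison. Scores on the synthetic data say nothing about the real dataset.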

Results

  • Best Model: RandomForestClassifier with class_weight='balanced', max_depth=10, min_samples_split=5, n_estimators=300.
  • Test F1-Score: 0.7971
  • Precision: 0.8594
  • Recall: 0.7432
  • Key Features (with importance scores): Sex (0.361), log_fare (0.213), log_age (0.178).
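As a quick sanity check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The variable names below are only illustrative.

```python
# Values as reported in the Results list.
precision = 0.8594
recall = 0.7432

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7971
```

The result matches the reported test F1-score, so the three metrics are mutually consistent.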

Learnings

  • Importance of handling imbalanced data with class_weight.
  • Value of feature engineering (e.g., log transformations).
  • Effectiveness of RandomForest for non-linear relationships.
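The first two points can be illustrated with a short sketch. The fare values and label counts below are made up for illustration, and using `np.log1p` (rather than a plain log) is an assumption about how features such as log_fare might be built; it is chosen here because it is defined at zero. The weight formula is the one scikit-learn documents for `class_weight='balanced'`.

```python
import numpy as np

# Illustrative, right-skewed fare values (not taken from the dataset).
fare = np.array([0.0, 7.25, 26.0, 71.28, 512.33])

# log1p computes log(1 + x), so a fare of 0 maps to 0 instead of -inf.
log_fare = np.log1p(fare)
print(log_fare.round(3))

# class_weight='balanced' uses n_samples / (n_classes * count_per_class),
# so the rarer class receives the larger weight.
y = np.array([0] * 6 + [1] * 4)  # hypothetical labels: 6 negatives, 4 positives
counts = np.bincount(y)
weights = len(y) / (len(counts) * counts)
print(weights)
```

The transform compresses the large fares while preserving their order, and the minority class ends up with a weight above 1 while the majority class falls below 1.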
