A machine learning project using the Titanic dataset to predict passenger survival. Models used: XGBoost, Logistic Regression, and Random Forest.

MahdiKh03/Will-you-survive-titanic

🎯 Titanic Survival Prediction - Kaggle Competition

This project is a submission for the famous Titanic - Machine Learning from Disaster competition on Kaggle.
The goal is to build a classification model that predicts which passengers survived the Titanic shipwreck.

📁 Dataset Description

Each passenger record includes features such as:

  • Pclass (Ticket class)
  • Sex
  • Age
  • SibSp (Number of siblings/spouses aboard)
  • Parch (Number of parents/children aboard)
  • Fare
  • Embarked (Port of embarkation)

The data files used:

  • train.csv – Training set
  • test.csv – Test set
  • gender_submission.csv – Kaggle's sample submission file, whose Survived column is used here as the test labels

🧹 Data Preprocessing

  • Missing values in Age were filled with the column mean.
  • Missing values in Embarked were filled with the mode (the most frequent port).
  • For simplicity, missing values in the test set were filled with the means of the corresponding training features.
  • The categorical features Sex and Embarked were encoded as integers using LabelEncoder.
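
The preprocessing steps above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the two small DataFrames are made-up stand-ins for train.csv and test.csv, and fitting each LabelEncoder on the combined train and test values is an assumption made here so that categories seen only in the test set can still be encoded.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny made-up frames standing in for train.csv and test.csv (assumption).
train = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 8.05],
    "Embarked": ["S", "C", None, "S"],
})
test = pd.DataFrame({
    "Sex": ["female", "male"],
    "Age": [None, 40.0],
    "Fare": [None, 13.0],
    "Embarked": ["Q", "S"],
})

# Fill Age with the training mean and Embarked with the training mode.
train["Age"] = train["Age"].fillna(train["Age"].mean())
train["Embarked"] = train["Embarked"].fillna(train["Embarked"].mode()[0])

# Fill numeric gaps in the test set with the training-set means.
numeric_cols = ["Age", "Fare"]
test[numeric_cols] = test[numeric_cols].fillna(train[numeric_cols].mean())

# Encode the categorical columns as integers.
# Fitting on train + test together is an assumption of this sketch.
for col in ["Sex", "Embarked"]:
    encoder = LabelEncoder()
    encoder.fit(pd.concat([train[col], test[col]]))
    train[col] = encoder.transform(train[col])
    test[col] = encoder.transform(test[col])

print(train)
print(test)
```

LabelEncoder assigns integers in sorted order of the category names, so the resulting codes carry no meaningful ordering; tree-based models such as XGBoost tolerate this, while linear models usually do better with one-hot encoding.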

🧠 Models Used

  • ✅ XGBClassifier (XGBoost) – Main model
  • Other models, such as RandomForestClassifier and LogisticRegression, can optionally be tried as well.
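
One way to try the optional models is to compare them with cross-validation before committing to one. The sketch below is illustrative only: it uses synthetic data from make_classification in place of the preprocessed Titanic features, and the settings (max_iter, n_estimators, random_state) are arbitrary choices, not values taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the preprocessed feature matrix (assumption).
X, y = make_classification(n_samples=300, n_features=7, random_state=0)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=100, random_state=0
    ),
}

# Mean F1 over 5 folds for each candidate model.
cv_f1 = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

for name, score in cv_f1.items():
    print(f"{name}: mean F1 = {score:.3f}")
```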

🧪 Evaluation

Metrics used for evaluation:

  • Accuracy: 0.82
  • F1-Score: 0.73

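Evaluation is based on the labels provided in gender_submission.csv. Note that this file is Kaggle's sample submission, a baseline that assumes all and only female passengers survived, rather than the ground truth. The scores above therefore measure agreement with that baseline, not true test-set performance.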
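
As a small illustration of how the two metrics are computed, the sketch below uses scikit-learn's accuracy_score and f1_score on made-up label vectors. The values in y_ref and y_pred are invented for the example and are not the project's predictions.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up reference labels and predictions (1 = survived, 0 = did not).
y_ref = [0, 1, 0, 1, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1]

accuracy = accuracy_score(y_ref, y_pred)  # fraction of matching labels
f1 = f1_score(y_ref, y_pred)              # harmonic mean of precision and recall

print(f"Accuracy: {accuracy:.3f}")  # 7 of 8 labels match -> 0.875
print(f"F1-Score: {f1:.3f}")        # precision 1.0, recall 0.75 -> 0.857
```

Accuracy counts every correct label equally, while F1 focuses on the positive (survived) class, which is why the two numbers can differ noticeably on an imbalanced dataset.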

🔧 Hyperparameter Tuning

Used GridSearchCV for hyperparameter optimization with the following grid:

param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [100, 200, 300],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}

Tuning used the F1 score as the selection metric with 5-fold cross-validation.

💡 How to Run

Make sure the required libraries are installed:

pip install pandas matplotlib scikit-learn xgboost

Then run the notebook in a Jupyter environment or in VS Code.


📌 Notes

  • This project is for educational purposes only and is part of a Kaggle competition.
  • The dataset is publicly available on Kaggle, on the Titanic - Machine Learning from Disaster competition page.
  • You can improve the results by performing deeper feature engineering and using ensemble models.
