This notebook is a solution for the "Spaceship Titanic" machine learning competition on Kaggle. The task is to predict whether a passenger was transported to an alternate dimension based on various characteristics.
The dataset is structured like a typical tabular classification problem:
- `train.csv`: includes ~8,700 passengers with the labeled target (`Transported`)
- `test.csv`: ~4,300 passengers without labels
- Categorical: `HomePlanet`, `CryoSleep`, `Destination`, `VIP`, `Cabin`
- Numerical: `Age`, `RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, `VRDeck`
- Target: `Transported` (Boolean: True/False)
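A minimal loading-and-inspection sketch, assuming the two competition CSVs sit in the working directory (file names follow the standard Kaggle layout):

```python
import pandas as pd

# Load the competition files (standard Kaggle "Spaceship Titanic" layout)
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

print(train.shape, test.shape)      # roughly (8700, 14) and (4300, 13)
print(train.dtypes)                 # inspect categorical vs. numerical columns
print(train["Transported"].head())  # Boolean target: True/False
```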
- Checked class balance: the dataset is roughly balanced between transported and non-transported passengers.
- Visualized distributions of numerical features such as `Age` and `Spa` spending.
- Plotted categorical features using `countplot()` and pie charts to spot trends.
- Observed that features like `CryoSleep`, `Destination`, and `VIP` showed visible correlations with the target.
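A sketch of the kinds of plots described above, using `seaborn` and continuing from the loading sketch (the exact figures and styling in the notebook may differ):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Class balance: counts of True vs. False in the target
sns.countplot(x="Transported", data=train)
plt.title("Class balance")
plt.show()

# Distribution of a numerical feature
sns.histplot(train["Age"].dropna(), bins=40)
plt.title("Age distribution")
plt.show()

# A categorical feature against the target
sns.countplot(x="CryoSleep", hue="Transported", data=train)
plt.title("CryoSleep vs. Transported")
plt.show()
```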
- Dropped:
  - `Name`: mostly irrelevant for prediction.
  - `Cabin`: many missing values, complex to parse initially.
- Handled missing values:
  - For categoricals like `HomePlanet`, filled with `"unknown"`.
  - For numerical spending columns (`RoomService`, `FoodCourt`, etc.), filled with `0`.
  - For `Age`, used median imputation.
- Converted categorical features to numerical codes using `LabelEncoder`.
- Simplified data types (e.g., from `float64` to `float32`).
- Created no new features, keeping the baseline model clean and interpretable. A condensed sketch of these preprocessing steps follows this list.
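The sketch below mirrors the cleaning steps above, continuing from the loading sketch; the column lists come from the dataset description, and the exact order in the notebook may differ:

```python
from sklearn.preprocessing import LabelEncoder

CATEGORICALS = ["HomePlanet", "CryoSleep", "Destination", "VIP"]
SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]

def preprocess(df):
    df = df.copy()
    # Drop columns the baseline ignores
    df = df.drop(columns=["Name", "Cabin"])

    # Categorical gaps -> explicit "unknown" category
    for col in CATEGORICALS:
        df[col] = df[col].fillna("unknown")

    # Missing spending -> assume zero spending
    df[SPEND_COLS] = df[SPEND_COLS].fillna(0)

    # Age -> median imputation
    df["Age"] = df["Age"].fillna(df["Age"].median())

    # Encode categoricals as integer codes
    # (caveat: fitting separately on train and test can yield inconsistent
    # codes if the value sets differ; fitting one encoder on the combined
    # column avoids this)
    for col in CATEGORICALS:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    # Downcast floats, e.g. float64 -> float32
    float_cols = df.select_dtypes(include="float64").columns
    df[float_cols] = df[float_cols].astype("float32")
    return df

train_clean = preprocess(train)
test_clean = preprocess(test)
```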
- Trained and evaluated the following classifiers:
- Logistic Regression
- Decision Tree
- Random Forest
- Extra Trees Classifier
- LightGBM (LGBMClassifier)
- XGBoost (XGBClassifier)
- Used `train_test_split` for validation.
- Evaluated using:
  - `accuracy_score`
  - `classification_report`
  - `confusion_matrix`
- Compared each model’s performance; tree-based models (especially XGBoost and LightGBM) outperformed the others in both accuracy and generalization (sketched below).
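A sketch of the comparison loop, continuing from the preprocessing sketch; `test_size`, `random_state`, and the model settings are illustrative choices, not taken from the notebook:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

# Features: everything except the ID and the target
X = train_clean.drop(columns=["PassengerId", "Transported"])
y = train_clean["Transported"].astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Extra Trees": ExtraTreesClassifier(),
    "LightGBM": LGBMClassifier(),
    "XGBoost": XGBClassifier(),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    preds = model.predict(X_val)
    print(name, accuracy_score(y_val, preds))
    print(classification_report(y_val, preds))
    print(confusion_matrix(y_val, preds))
```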
- Applied `GridSearchCV` (a sketch follows this list) to:
  - `LGBMClassifier`
  - `RandomForestClassifier`
- Tuned parameters such as:
  - `n_estimators`
  - `max_depth`
  - `learning_rate` (for LGBM)
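A sketch of the search on `LGBMClassifier`, reusing the training split from the comparison sketch; the grid values are illustrative, not the notebook's actual grid:

```python
from sklearn.model_selection import GridSearchCV
from lightgbm import LGBMClassifier

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [-1, 5, 10],           # -1 means "no limit" in LightGBM
    "learning_rate": [0.01, 0.05, 0.1],
}

search = GridSearchCV(
    LGBMClassifier(),
    param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
)
search.fit(X_tr, y_tr)

print(search.best_params_, search.best_score_)
best_model = search.best_estimator_
```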
- Best model: LightGBM with tuned hyperparameters.
- Achieved over 0.80 accuracy on the validation set.
- Saved predictions to a CSV file for Kaggle submission (sketched below).
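The submission step, sketched; the competition expects a `PassengerId`/`Transported` CSV, and the output file name here is arbitrary:

```python
# Predict on the cleaned test set and write the Kaggle submission file
test_preds = best_model.predict(test_clean.drop(columns=["PassengerId"]))

submission = pd.DataFrame({
    "PassengerId": test_clean["PassengerId"],
    "Transported": test_preds.astype(bool),  # competition expects True/False
})
submission.to_csv("submission.csv", index=False)
```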
- `pandas`
- `numpy`
- `matplotlib`
- `seaborn`
- `scikit-learn`
- `xgboost`
- `lightgbm`
To build a solid baseline ML pipeline and test the performance of various models on a real-world-style dataset. This project is useful for practicing:
- End-to-end ML workflows
- Data cleaning strategies
- Comparison of classic and modern classifiers
- Working with structured data from competitions