This repository contains my solution to the Kaggle Titanic competition, where the goal is to predict which passengers survived the Titanic disaster. My best submission scored 0.78468 using a Logistic Regression model.
- Introduction
- Data
- Approach
- Preprocessing
- Models Used
- Best Model
- Results
- Installation
- Usage
- Conclusion
The Titanic competition is a popular beginner challenge on Kaggle, where participants build models to predict whether a passenger survived the Titanic disaster based on features like age, sex, and ticket class.
- Training Set: 891 examples with 11 features.
- Test Set: 418 examples for prediction.
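As a minimal sketch of getting started with the data, the competition CSVs can be loaded with pandas. The `data/` folder path below is an assumption; adjust it to wherever the Kaggle files were downloaded.

```python
import pandas as pd

# Paths are assumptions; point them at the downloaded Kaggle CSVs.
train_df = pd.read_csv("data/train.csv")  # 891 rows, includes the Survived target
test_df = pd.read_csv("data/test.csv")    # 418 rows, no target column

print(train_df.shape, test_df.shape)
```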
I explored various models and preprocessing techniques to improve prediction accuracy. The focus was on feature engineering, model tuning, and ensemble methods.
- Imputation: Missing values were handled using `SimpleImputer` from scikit-learn.
- Encoding: Categorical features were transformed using `OneHotEncoder`.
- Scaling: `StandardScaler` was applied to numerical features.
- Feature Selection: The most relevant features were selected for modeling (a pipeline sketch combining these steps appears below).
- Logistic Regression
- Random Forest
- Support Vector Classifier (SVC)
- K-Nearest Neighbors (KNN)
- XGBoost
- Voting Classifier: an ensemble of the models above (see the sketch after this list).
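A rough sketch of how such an ensemble can be assembled with scikit-learn's `VotingClassifier`. The member models and hyperparameters here are placeholder assumptions, not the notebook's exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hyperparameters are placeholder assumptions.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("svc", SVC(probability=True)),  # probability=True enables soft voting
        ("knn", KNeighborsClassifier()),
        ("xgb", XGBClassifier(eval_metric="logloss")),
    ],
    voting="soft",  # average predicted probabilities across members
)
```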
The best-performing model was Logistic Regression, which achieved a Kaggle competition score of 0.78468.
The Logistic Regression model outperformed other models with minimal tuning, demonstrating its effectiveness for this task.
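To reproduce this kind of result, the logistic model is fit on the preprocessed training data and a submission file is written in the format Kaggle expects. The sketch below reuses the hypothetical `preprocessor`, `train_df`, and `test_df` names from the earlier examples and is illustrative rather than the notebook's exact code.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Full pipeline: preprocessing (sketched above) followed by the classifier.
model = Pipeline([
    ("prep", preprocessor),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train = train_df.drop(columns=["Survived"])
y_train = train_df["Survived"]

# Quick cross-validation sanity check before submitting.
print(cross_val_score(model, X_train, y_train, cv=5).mean())

model.fit(X_train, y_train)

# Build the submission file in the PassengerId/Survived format Kaggle expects.
submission = test_df[["PassengerId"]].copy()
submission["Survived"] = model.predict(test_df)
submission.to_csv("submission.csv", index=False)
```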
To run this project, you need to have Python installed along with the following libraries:
```bash
pip install pandas numpy scikit-learn xgboost tensorflow
```
- Clone this repository.
- Run the Jupyter notebook `main.ipynb`.
- Follow the instructions in the notebook to preprocess the data, train the models, and generate predictions.
This project provided an insightful experience into data preprocessing, feature engineering, and model selection. The Logistic Regression model's strong performance highlights its potential for similar binary classification problems.