This repository contains my solution to the Kaggle Titanic competition, where the goal is to predict which passengers survived the Titanic disaster. My best submission scored 0.78468 using a Logistic Regression model.
- Introduction
- Data
- Approach
- Preprocessing
- Models Used
- Best Model
- Results
- Installation
- Usage
- Conclusion
The Titanic competition is a popular beginner challenge on Kaggle, where participants build models to predict whether a passenger survived the Titanic disaster based on features like age, sex, and ticket class.
- Training Set: 891 examples with 11 features.
- Test Set: 418 examples for prediction.
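As a minimal sketch of getting started with the data, the competition CSVs can be loaded with pandas. The `data/` folder path below is an assumption; adjust it to wherever the Kaggle files were downloaded.

```python
import pandas as pd

# Paths are assumptions; point them at the downloaded Kaggle CSVs.
train_df = pd.read_csv("data/train.csv")  # 891 rows, includes the Survived target
test_df = pd.read_csv("data/test.csv")    # 418 rows, no target column

print(train_df.shape, test_df.shape)
```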
I explored various models and preprocessing techniques to improve prediction accuracy. The focus was on feature engineering, model tuning, and ensemble methods.
- Imputation: Missing values were handled using `SimpleImputer` from scikit-learn.
- Encoding: Categorical features were transformed using `OneHotEncoder`.
- Scaling: `StandardScaler` was applied to numerical features.
- Feature Selection: The most relevant features were selected for modeling (a pipeline sketch combining these steps appears below).
- Logistic Regression
- Random Forest
- Support Vector Classifier (SVC)
- K-Nearest Neighbors (KNN)
- XGBoost
- Voting Classifier: an ensemble of the models above (see the sketch after this list).
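A rough sketch of how such an ensemble can be assembled with scikit-learn's `VotingClassifier`. The member models and hyperparameters here are placeholder assumptions, not the notebook's exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hyperparameters are placeholder assumptions.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("svc", SVC(probability=True)),  # probability=True enables soft voting
        ("knn", KNeighborsClassifier()),
        ("xgb", XGBClassifier(eval_metric="logloss")),
    ],
    voting="soft",  # average predicted probabilities across members
)
```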
The best-performing model was Logistic Regression, which achieved a Kaggle competition score of 0.78468.
The Logistic Regression model outperformed other models with minimal tuning, demonstrating its effectiveness for this task.
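To reproduce this kind of result, the logistic model is fit on the preprocessed training data and a submission file is written in the format Kaggle expects. The sketch below reuses the hypothetical `preprocessor`, `train_df`, and `test_df` names from the earlier examples and is illustrative rather than the notebook's exact code.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Full pipeline: preprocessing (sketched above) followed by the classifier.
model = Pipeline([
    ("prep", preprocessor),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train = train_df.drop(columns=["Survived"])
y_train = train_df["Survived"]

# Quick cross-validation sanity check before submitting.
print(cross_val_score(model, X_train, y_train, cv=5).mean())

model.fit(X_train, y_train)

# Build the submission file in the PassengerId/Survived format Kaggle expects.
submission = test_df[["PassengerId"]].copy()
submission["Survived"] = model.predict(test_df)
submission.to_csv("submission.csv", index=False)
```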
To run this project, you need to have Python installed along with the following libraries:
```bash
pip install pandas numpy scikit-learn xgboost tensorflow
```
- Clone this repository.
- Run the Jupyter notebook `main.ipynb`.
- Follow the instructions in the notebook to preprocess the data, train the models, and generate predictions.
This project provided an insightful experience into data preprocessing, feature engineering, and model selection. The Logistic Regression model's strong performance highlights its potential for similar binary classification problems.