"This project analyzes the Titanic survival prediction dataset using machine learning techniques, focusing on data preprocessing, visualization, and model selection."
- Source: Kaggle (Titanic-Dataset)
- Features: Passenger details (e.g., Sex, Age, Fare, Pclass)
- Target: Survived (0 = No, 1 = Yes)
- Load the Data: Imported the Titanic dataset from Kaggle.
- Explore & Visualize: Analyzed distributions and relationships (e.g., Sex vs. Survival).
- Prepare Data: Handled missing values, encoded categorical variables, and scaled features.
- Model Selection: Compared SGDClassifier, LogisticRegression, and RandomForestClassifier (a condensed sketch of these steps follows this list).
- Conclusion: Selected a tuned RandomForestClassifier (test F1 = 0.7971).
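The notebook code itself isn't reproduced here, so below is a minimal end-to-end sketch of the steps above. It assumes the Kaggle CSV is saved locally as `Titanic-Dataset.csv` and that the engineered `log_fare`/`log_age` features come from a `log1p` transform; the exact column choices, imputation strategies, and model settings used in the project may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# 1. Load the data (file name is an assumption; adjust to your local copy).
df = pd.read_csv("Titanic-Dataset.csv")

# 2. Quick exploration: survival rate by sex.
print(df.groupby("Sex")["Survived"].mean())

# 3. Feature engineering: log transforms behind the log_fare / log_age features.
df["log_fare"] = np.log1p(df["Fare"])
df["log_age"] = np.log1p(df["Age"])

features = ["Sex", "Pclass", "log_age", "log_fare", "Embarked"]
X, y = df[features], df["Survived"]

# Preprocessing: impute missing values, one-hot encode categoricals, scale numerics.
numeric = ["Pclass", "log_age", "log_fare"]
categorical = ["Sex", "Embarked"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

# 4. Compare the three candidate models with 5-fold cross-validated F1.
models = {
    "SGDClassifier": SGDClassifier(random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.4f}")
```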
- Best Model: RandomForestClassifier with `class_weight='balanced'`, `max_depth=10`, `min_samples_split=5`, `n_estimators=300` (an evaluation sketch follows these results).
- Test F1-Score: 0.7971
- Precision: 0.8594
- Recall: 0.7432
- Key Features: Sex (0.361), log_fare (0.213), log_age (0.178).
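The hyperparameters and metrics above are the project's reported values. The sketch below shows one way to reproduce that evaluation, reusing `X`, `y`, and `preprocess` from the previous sketch; the 80/20 stratified split and `random_state` are assumptions, so the printed scores will not match the reported numbers exactly.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Split is an assumption for illustration; the hyperparameters are the reported ones.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

best_rf = Pipeline([
    ("prep", preprocess),
    ("model", RandomForestClassifier(
        class_weight="balanced",
        max_depth=10,
        min_samples_split=5,
        n_estimators=300,
        random_state=42)),
])
best_rf.fit(X_train, y_train)

pred = best_rf.predict(X_test)
print("F1:       ", round(f1_score(y_test, pred), 4))
print("Precision:", round(precision_score(y_test, pred), 4))
print("Recall:   ", round(recall_score(y_test, pred), 4))

# Feature importances from the fitted forest (names come from the
# ColumnTransformer's expanded feature set).
rf = best_rf.named_steps["model"]
names = best_rf.named_steps["prep"].get_feature_names_out()
for name, imp in sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```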
- Importance of handling imbalanced data with `class_weight`.
- Value of feature engineering (e.g., log transformations; see the short example after this list).
- Effectiveness of RandomForest for non-linear relationships.
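As a small illustration of the log-transformation point (a sketch, not project code, and again assuming the `Titanic-Dataset.csv` file name): `log1p` sharply reduces the right skew of `Fare`, which is one reason engineered features like `log_fare` tend to help. For the imbalance point, `class_weight='balanced'` simply reweights classes inversely to their frequency in the training data.

```python
import numpy as np
import pandas as pd

# File name assumed, as in the earlier sketch.
df = pd.read_csv("Titanic-Dataset.csv")

# Fare is heavily right-skewed; log1p (log(1 + x)) compresses the long tail
# and handles zero fares safely.
print("Fare skew:    ", round(df["Fare"].skew(), 2))
print("log_fare skew:", round(np.log1p(df["Fare"]).skew(), 2))
```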