This project is part of a Machine Learning internship task (Level 2 - Task 3). The goal is to predict the forest cover type based on cartographic and environmental data using supervised multi-class classification.
- Source: UCI Covertype Dataset
- Records: 581,012
- Features: 54 (10 numerical, 44 binary)
- Target: Cover_Type (7 forest cover types)
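
As a rough sketch of how the raw file could be loaded: `covtype.data.gz` ships without a header row, so column names have to be supplied by hand. The names below follow the dataset's accompanying description file and are an assumption here, as are the helper names `covtype_columns` and `load_covtype`; only `Cover_Type` is named in this README.

```python
import pandas as pd


def covtype_columns():
    """Return the 55 column names: 54 features followed by the target."""
    # Assumed names, taken from the dataset's description file.
    numeric = [
        "Elevation",
        "Aspect",
        "Slope",
        "Horizontal_Distance_To_Hydrology",
        "Vertical_Distance_To_Hydrology",
        "Horizontal_Distance_To_Roadways",
        "Hillshade_9am",
        "Hillshade_Noon",
        "Hillshade_3pm",
        "Horizontal_Distance_To_Fire_Points",
    ]
    # 4 wilderness-area flags + 40 soil-type flags = 44 binary features.
    wilderness = [f"Wilderness_Area{i}" for i in range(1, 5)]
    soil = [f"Soil_Type{i}" for i in range(1, 41)]
    return numeric + wilderness + soil + ["Cover_Type"]


def load_covtype(path_or_buffer):
    """Read the headerless CSV; pandas infers gzip from a .gz file name."""
    return pd.read_csv(path_or_buffer, header=None, names=covtype_columns())
```

With the file in the working directory, `df = load_covtype("covtype.data.gz")` should give a frame with 581,012 rows and 55 columns.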
- Clean and preprocess the dataset
- Train and evaluate:
  - Random Forest
  - XGBoost (with feature importance)
- Tune hyperparameters (Bonus ✅)
- Visualize confusion matrices and feature importances
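
The train-and-evaluate step might look roughly like the sketch below. It is not the project's actual code: the helper `fit_and_evaluate`, the 80/20 stratified split, and the synthetic stand-in data (54 features, 7 classes) are all assumptions made so the example runs without the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def fit_and_evaluate(model, X, y, test_size=0.2, seed=42):
    """Fit `model` on a stratified split; return (report text, confusion matrix)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    labels = np.unique(y)  # fixes the row/column order of the matrix
    report = classification_report(y_test, y_pred, labels=labels, zero_division=0)
    matrix = confusion_matrix(y_test, y_pred, labels=labels)
    return report, matrix


# Synthetic stand-in for the real data (assumption): 54 features, 7 classes.
X_demo, y_demo = make_classification(
    n_samples=700, n_features=54, n_informative=12, n_classes=7, random_state=0
)
report, matrix = fit_and_evaluate(
    RandomForestClassifier(n_estimators=50, random_state=0), X_demo, y_demo
)
print(report)
```

Any scikit-learn-compatible classifier can be passed in the same way, including `xgboost.XGBClassifier`. Recent XGBoost versions expect class labels starting at 0, so the 1 to 7 values of `Cover_Type` would likely need shifting (for example `y - 1`) first. The returned matrix can be drawn with `seaborn.heatmap(matrix, annot=True, fmt="d")`.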
- Python 3.x
- pandas, numpy
- matplotlib, seaborn
- scikit-learn
- xgboost
- Evaluated using classification report and confusion matrix
- Hyperparameter tuning done using `GridSearchCV`
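
A minimal sketch of a grid search with `GridSearchCV` follows. The parameter grid, the 3-fold cross-validation, and the accuracy scoring are illustrative assumptions, since this README does not state which values were searched; the data is again a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid only (assumption); the project's real grid is not given here.
param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [5, None],
}

# Synthetic stand-in for the real data (assumption): 54 features, 7 classes.
X_demo, y_demo = make_classification(
    n_samples=420, n_features=54, n_informative=12, n_classes=7, random_state=0
)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation per parameter combination
    scoring="accuracy",
    n_jobs=1,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

On the full 581,012-row dataset every added grid value multiplies training time, so tuning on a stratified subsample first is a common way to keep the search tractable.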
- ✅ Compared Random Forest vs. XGBoost
- ✅ Hyperparameter tuning with `GridSearchCV`
- ✅ Visualized top 15 important features from XGBoost
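
Selecting the top 15 features could be done along these lines. The helper `top_features` is hypothetical and works on any array of importance scores, such as the `feature_importances_` attribute of a fitted tree-based model; the scores and feature names in the demo are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_features(importances, feature_names, k=15):
    """Return the k largest importance scores as a Series, highest first."""
    scores = pd.Series(
        np.asarray(importances, dtype=float), index=list(feature_names)
    )
    return scores.sort_values(ascending=False).head(k)


# Made-up scores for 54 placeholder features (assumption, not real results).
demo_names = [f"f{i}" for i in range(54)]
demo_scores = np.arange(54, dtype=float)
top15 = top_features(demo_scores, demo_names)
print(top15)
```

With a fitted model and a feature frame `X`, the call would be `top_features(model.feature_importances_, X.columns)`, and `top15.sort_values().plot.barh()` draws the horizontal bar chart with the most important feature on top.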
- Download the dataset from UCI and extract `covtype.data.gz`
- Place it in your working directory
- Install dependencies