
shahzadmustafa15/forest-cover-classification


🌲 Forest Cover Type Classification (UCI Covertype Dataset)

This project is part of a Machine Learning internship task (Level 2 - Task 3). The goal is to predict the forest cover type based on cartographic and environmental data using supervised multi-class classification.

📂 Dataset

  • Source: UCI Covertype Dataset
  • Records: 581,012
  • Features: 54 (10 numerical, 44 binary: 4 wilderness-area and 40 soil-type indicator columns)
  • Target: Cover_Type (7 forest cover types, coded 1 to 7)
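The raw covtype.data.gz file has no header row, so the 55 columns (54 features plus the target) need names before any analysis. A minimal loading sketch, assuming a comma-separated file and column names modelled on the attribute list in the UCI documentation (the exact spellings and the `load_covtype` helper are illustrative, not taken from this repository):

```python
import pandas as pd

# Hypothetical column names following the UCI attribute order:
# 10 numerical features, 4 wilderness-area flags, 40 soil-type flags, then the target.
NUMERIC_COLS = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
WILDERNESS_COLS = [f"Wilderness_Area{i}" for i in range(1, 5)]
SOIL_COLS = [f"Soil_Type{i}" for i in range(1, 41)]
COLUMNS = NUMERIC_COLS + WILDERNESS_COLS + SOIL_COLS + ["Cover_Type"]


def load_covtype(path="covtype.data.gz"):
    """Read the headerless Covertype file into a DataFrame with named columns."""
    # pandas infers gzip compression from the .gz suffix, so no manual decompression is needed.
    df = pd.read_csv(path, header=None, names=COLUMNS)
    # The raw target is coded 1..7; anything else suggests a malformed file.
    if not df["Cover_Type"].between(1, 7).all():
        raise ValueError("unexpected Cover_Type values in input file")
    return df
```

On the full file, `load_covtype().shape` should match the 581,012 records and 54 features listed above (plus the target column).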

📌 Objectives

  • Clean and preprocess the dataset
  • Train and evaluate:
    • Random Forest
    • XGBoost (with feature importance)
  • Tune hyperparameters (Bonus ✅)
  • Visualize confusion matrices and feature importances
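One preprocessing detail matters for the XGBoost model: its scikit-learn wrapper expects class labels 0..K-1, while Cover_Type is coded 1..7. A sketch of a stratified split that also shifts the labels, assuming duplicate and missing-value removal as the cleaning steps (the `prepare_splits` helper, the 80/20 split and the seed are illustrative choices, not the repository's):

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_splits(df, target="Cover_Type", test_size=0.2, random_state=42):
    """Clean the frame, shift labels to 0..6 and return a stratified train/test split."""
    # Assumed cleaning: drop exact duplicate rows and rows with missing values.
    df = df.drop_duplicates().dropna()
    X = df.drop(columns=[target])
    # XGBClassifier expects labels 0..K-1, so map 1..7 to 0..6 (add 1 back for reporting).
    y = df[target].astype(int) - 1
    # Stratify so the rarer cover types appear in both splits in proportion.
    return train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
```

Tree ensembles are insensitive to feature scale, so no standardisation step is included here.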

🛠️ Tools & Libraries

  • Python 3.x
  • pandas, numpy
  • matplotlib, seaborn
  • scikit-learn
  • xgboost

📊 Model Performance

  • Evaluated with scikit-learn's classification report (per-class precision, recall and F1-score) and confusion matrix
  • Hyperparameters tuned with scikit-learn's GridSearchCV
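The tuning and evaluation steps can be sketched for the Random Forest as follows. The parameter grid, the 3-fold CV and the `f1_macro` scoring (chosen because the cover types are imbalanced) are assumptions for illustration; the grid actually used in this project is not stated here. The `tune_and_evaluate_rf` helper is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV


def tune_and_evaluate_rf(X_train, y_train, X_test, y_test, param_grid=None, cv=3):
    """Grid-search a Random Forest, then score the best model on the test split."""
    if param_grid is None:
        # Illustrative grid only, not the one used in this repository.
        param_grid = {"n_estimators": [100, 200], "max_depth": [None, 20]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=42, n_jobs=-1),
        param_grid,
        cv=cv,
        scoring="f1_macro",  # macro-F1 weights every cover type equally
    )
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    # Per-class precision/recall/F1 as a dict, plus the raw confusion matrix.
    report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
    cm = confusion_matrix(y_test, y_pred)
    return search.best_params_, report, cm
```

With roughly 580k rows, each grid point trains `cv` forests, so running the search on a stratified subsample and refitting the best configuration on the full training set keeps tuning time manageable.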

📈 Bonus Tasks Completed

  • ✅ Compared Random Forest vs. XGBoost
  • ✅ Hyperparameter tuning with GridSearchCV
  • ✅ Visualized top 15 important features from XGBoost
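A sketch of the XGBoost model and the top-15 importance chart, assuming labels already shifted to 0..6. `XGBClassifier` with `objective="multi:softprob"` and `tree_method="hist"` is a reasonable setup for this size of data, but the exact settings here are assumptions. The `fit_xgboost`, `top_feature_importances` and `plot_top_features` helpers are hypothetical; xgboost and matplotlib are imported inside the functions so the ranking helper also works for any fitted model exposing `feature_importances_` (including the Random Forest, which makes the two models easy to compare).

```python
import pandas as pd


def fit_xgboost(X_train, y_train, **params):
    """Fit a multi-class XGBoost model; y_train must use labels 0..K-1."""
    from xgboost import XGBClassifier  # lazy import: only needed for this function
    model = XGBClassifier(objective="multi:softprob", tree_method="hist", **params)
    model.fit(X_train, y_train)
    return model


def top_feature_importances(model, feature_names, k=15):
    """Return the k largest feature importances of a fitted model, largest first."""
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False).head(k)


def plot_top_features(importances, title="Top 15 feature importances"):
    """Draw a horizontal bar chart with the most important feature at the top."""
    import matplotlib.pyplot as plt  # lazy import: plotting is optional
    ax = importances[::-1].plot.barh()
    ax.set_title(title)
    ax.set_xlabel("Importance")
    plt.tight_layout()
    return ax
```

Usage (sketch): `top = top_feature_importances(fit_xgboost(X_train, y_train), X_train.columns)` followed by `plot_top_features(top)`. Note that the importance measure XGBoost reports depends on its `importance_type` setting, so rankings from XGBoost and Random Forest are not directly comparable in absolute value.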

🚀 How to Run

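  1. Download the Covertype dataset from the UCI Machine Learning Repository and extract covtype.data.gz from the archive
  2. Place covtype.data.gz in your working directory
  3. Install the dependencies listed under Tools & Libraries (pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost)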

About

ML model to classify forest cover types using environmental data (UCI Covertype Dataset).
