This project contains from-scratch implementations of a Decision Tree Classifier and a Random Forest Classifier, written using only Python, pandas, and NumPy and built on ML concepts such as ensembling (bootstrapping and bagging). Both models were trained and tested on real datasets, and their accuracy was compared with scikit-learn's implementations.
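
To make the ensembling idea concrete, here is a minimal sketch of bootstrapping (sampling rows with replacement) and of the majority vote that combines per-tree predictions in bagging. It uses plain Python for brevity rather than NumPy, and the names `bootstrap_sample` and `majority_vote` are hypothetical; they are not taken from this repository.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) indices with replacement; return the sampled rows and labels."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def majority_vote(predictions):
    """Return the most common label among the per-tree predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data (made up for illustration): four rows with one feature each.
rng = random.Random(0)
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = [0, 0, 1, 1]

# Each tree in a forest would be trained on its own bootstrap sample.
sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)

# Suppose three trees predicted 1, 0 and 1 for the same sample.
vote = majority_vote([1, 0, 1])
```

Because sampling is done with replacement, some rows typically appear more than once in a sample and others not at all, which is what makes the individual trees differ from one another.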
The models are tested on:
- Titanic dataset (891 x 12)
- Forest Cover Type, a large real-world dataset (15120 x 56)
- 🧠 Decision Tree built using Gini impurity
- 🌲 Random Forest using bagging and multiple trees
- 📊 Accuracy comparison with scikit-learn
- ⚙️ No external ML libraries (only NumPy + pandas)
- 🔣 Handles only numerical features; categorical features can be encoded before training
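
As an illustration of the Gini impurity criterion mentioned in the list above, the sketch below computes the impurity of a set of labels and searches one numerical feature for the threshold with the lowest weighted impurity. It is a simplified stand-in written in plain Python, not the repository's code; the function names `gini` and `best_threshold` and the midpoint-threshold strategy are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit impurity, the threshold is None.
    """
    n = len(values)
    best = (None, gini(labels))
    uniq = sorted(set(values))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weight each side's impurity by the share of samples it receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Toy feature column and labels (made up for illustration).
threshold, score = best_threshold([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

A full tree would repeat this search over every feature at every node and recurse on the two resulting subsets until a stopping condition is met.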

| Model | Accuracy |
|---|---|
| Custom Decision Tree | 83.79% |
| Sklearn Decision Tree | 83.24% |
| Custom Random Forest | 85.47% |
| Sklearn Random Forest | 85.47% |

| Model | Accuracy |
|---|---|
| Custom Decision Tree | 66.47% |
| Sklearn Decision Tree | 78.87% |
| Custom Random Forest | 61.18% |
| Sklearn Random Forest | 86.94% |
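
Accuracy figures like those in the tables above are the fraction of test samples whose predicted label matches the true label. The following sketch shows that calculation; the `accuracy` helper is hypothetical and the labels are toy values, not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (made up for illustration): three of four predictions are right.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
acc = accuracy(y_true, y_pred)
formatted = f"{acc:.2%}"  # percentage with two decimals, as in the tables
```

Applying the same function to the predictions of both the custom model and the scikit-learn model on an identical test split keeps the comparison fair.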
- 📊 Visualization of Decision Tree
Clone the repository to try it out and modify it:

    git clone https://github.com/latheeshpoondla/Decision_Tree_Random_Forest_from_scratch
    cd Decision_Tree_Random_Forest_from_scratch
    python Custom_DTree.py