This project aims to classify papaya fruits as healthy or diseased through binary classification. The primary objective is to assist farmers in identifying and managing diseased fruits to promote better crop health and yield.
The initial dataset comprised 500 images of healthy papaya fruits and 500 images of diseased papaya fruits, each resized to 100x100 pixels. Because the dataset was small, data augmentation was applied to diversify it: each original image was flipped three times and rotated three times, giving 3500 images per category (500 originals + 1500 flipped + 1500 rotated).
➢Collected a dataset of 500 images each of healthy and diseased papaya fruits.
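The image count can be checked with a minimal NumPy sketch. The source does not say which flips or rotation angles were used, so the choices here (horizontal, vertical, and both-axes flips; 90, 180, and 270 degree rotations) are assumptions, as are the names `augment` and `originals`. Under these particular assumptions the both-axes flip coincides with the 180 degree rotation, so the original project may well have used different variants.

```python
import numpy as np

def augment(image):
    """Return the original image plus three flips and three rotations."""
    flips = [
        np.flip(image, axis=1),       # horizontal flip (left-right), assumed
        np.flip(image, axis=0),       # vertical flip (up-down), assumed
        np.flip(image, axis=(0, 1)),  # flip along both axes, assumed
    ]
    # Rotate in the image plane by 90, 180 and 270 degrees (assumed angles).
    rotations = [np.rot90(image, k=k) for k in (1, 2, 3)]
    return [image] + flips + rotations

# Placeholder for one category: 500 RGB images already resized to 100x100.
originals = np.zeros((500, 100, 100, 3), dtype=np.uint8)

augmented = [variant for img in originals for variant in augment(img)]
print(len(augmented))  # 500 * 7 = 3500
```

Because the images are square, every variant keeps the 100x100 shape, so the augmented set can be stacked into a single array for later steps.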
➢Augmented the dataset by applying flip and rotation operations, resulting in a total of 3500 images for each category.
➢Employed eight segmentation techniques to preprocess the images for feature extraction and analysis:
- Global Thresholding
- Otsu's Thresholding
- Adaptive Mean Thresholding
- Adaptive Gaussian Thresholding
- Canny Edge Detection
- Sobel Edge Detection
- K-means Clustering
- Fuzzy C-means Clustering
➢Utilized seven diverse classifiers to train the model and assess performance. The classifiers used were:
- Decision Tree
- Naive Bayes
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Ensemble Classifier (Hard Voting)
- Ensemble Classifier (Soft Voting)
- Random Forest
➢Model evaluation employed k-fold cross-validation, analyzing each segmentation technique paired with every classifier to measure accuracy and identify optimal disease detection strategies.
➢Verified the model's robustness and performance by testing each classifier, within each segmentation technique, on diseased and healthy images.
➢Based on the evaluation results, identified Fuzzy C-means segmentation combined with Random Forest as the most accurate model.
➢Hosted the Fuzzy C-means with Random Forest model on the web using Streamlit so that users can easily access and interact with it.
➢Explore the deployed model interface here: https://jhajibhaskar0.streamlit.app/
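To illustrate the segmentation step listed above, here is a minimal NumPy sketch of one of the eight techniques, Otsu's thresholding. It is not the project's own code: the function names `otsu_threshold` and `segment` and the synthetic test image are assumptions made for illustration, and a library routine (for example OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag) would normally be used instead.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (uint8 input)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # thresholds with an empty class
    return int(np.argmax(sigma_b))

def segment(gray):
    """Binary mask: 1 where the pixel is brighter than the Otsu threshold."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)

# Synthetic 100x100 grayscale image: dark left half, bright right half.
img = np.full((100, 100), 50, dtype=np.uint8)
img[:, 50:] = 200

mask = segment(img)
print(otsu_threshold(img), mask.mean())
```

The resulting mask (or the masked image) would then be flattened or summarized into a feature vector before being passed to a classifier; how the project derived its features is not stated in the source.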
Accuracy table showing the performance of each classifier with each segmentation technique, with a special focus on the highest-performing combination (Fuzzy C-means with Random Forest).
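One row of such a table could be produced along the following lines with scikit-learn. This is a sketch under stated assumptions rather than the project's code: the value k=5, the base estimators inside the two voting ensembles, all hyperparameters, and the synthetic features from `make_classification` (a stand-in for features extracted from one segmentation technique's output) are not given in the source.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in features and labels (assumption): in the project these would come
# from images segmented with one technique, e.g. Fuzzy C-means.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Base estimators assumed for the voting ensembles.
base = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
]

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Ensemble (Hard Voting)": VotingClassifier(base, voting="hard"),
    "Ensemble (Soft Voting)": VotingClassifier(base, voting="soft"),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# k-fold cross-validation with k=5 (assumed), stratified by class label.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
    for name, clf in classifiers.items()
}

for name, acc in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:<24} {acc:.3f}")
```

Repeating this loop once per segmentation technique, each with its own feature matrix, would fill in the full eight-by-seven grid of mean accuracies.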