This repository contains the code for our participation in the PlantCLEF 2025 challenge.
The challenge's details are available here.
The core of our method is a narrow vision transformer (ViT), which we use to perform segmentation, i.e., to discriminate between plant (relevant) and background (irrelevant) elements. This ViT is trained to reconstruct class prototypes (cluster centroids) computed from the training set (images of individual plants) given the high-resolution quadrat images of the test set. We adapt the model to perform segmentation by using its attention scores to discard background patches, and then classify the remaining relevant patches, in a decrease-and-conquer fashion.
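To make the training objective concrete, here is a minimal sketch of regressing a ViT embedding onto a target prototype vector, assuming each image is paired with the prototype of its cluster. The backbone (timm's ViT-Tiny as a stand-in for our narrow ViT), the projection head, the prototype dimensionality, and the cosine loss are illustrative assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

PROTO_DIM = 768  # dimensionality of the prototype vectors (assumption)

# A "narrow" ViT: a small embedding width; ViT-Tiny is used here as a stand-in.
backbone = timm.create_model("vit_tiny_patch16_224", pretrained=True, num_classes=0)
head = nn.Linear(backbone.num_features, PROTO_DIM)  # project into prototype space

def prototype_loss(images: torch.Tensor, target_protos: torch.Tensor) -> torch.Tensor:
    """Regress the pooled ViT embedding onto the target class prototype."""
    pred = F.normalize(head(backbone(images)), dim=-1)   # (B, PROTO_DIM)
    target = F.normalize(target_protos, dim=-1)
    return (1.0 - (pred * target).sum(dim=-1)).mean()    # mean cosine distance
```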
Our method reached 5th place in the challenge with a score of 0.33331, only 0.03148 behind first place.
The whole project is written in Python, built mainly on PyTorch.
Required packages:
- PyTorch
- timm
- FAISS
- Pillow
We split the training into three scripts:
- run_feature_extraction.sh: extracts features from the images containing a single plant sample
- run_kmeans.sh: clusters the extracted features into 7806 clusters, one for each species in the dataset (sketched below)
- run_zero_shot.sh: trains a narrow ViT to predict the target feature prototype vector
These scripts must be run in the order listed above.
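As a rough illustration of the clustering step performed by run_kmeans.sh, the snippet below clusters the extracted features into 7806 prototypes with FAISS; the file names and k-means hyperparameters are illustrative assumptions.

```python
import faiss
import numpy as np

# Features extracted from the single-plant training images (assumed file name).
features = np.load("train_features.npy").astype(np.float32)  # shape (N, dim)

n_clusters = 7806  # one cluster per species in the dataset
kmeans = faiss.Kmeans(
    d=features.shape[1],  # feature dimensionality
    k=n_clusters,
    niter=20,             # illustrative iteration count
    verbose=True,
    gpu=True,             # assumes a FAISS build with GPU support
)
kmeans.train(features)

# The centroids are the class prototype vectors used as regression
# targets when training the narrow ViT in run_zero_shot.sh.
np.save("prototypes.npy", kmeans.centroids)
```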
For the test step, the script test_submission.sh loads the test dataset and uses the trained ViT to segment each image into relevant and irrelevant patches based on attention, collate neighboring relevant patches by proximity, and classify the aggregated patches.
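The sketch below illustrates the attention-based filtering and the neighbor collation, assuming access to the CLS-token attention over the patch grid from the ViT's last block; the keep ratio and the 4-connected grouping are illustrative assumptions rather than the exact implementation.

```python
import torch

def filter_and_group(attn_cls: torch.Tensor, grid_h: int, grid_w: int,
                     keep_ratio: float = 0.3):
    """attn_cls: (grid_h * grid_w,) attention from the CLS token to each patch."""
    # Keep only the most-attended (plant-like) patches, drop the background.
    k = max(1, int(keep_ratio * attn_cls.numel()))
    keep = torch.zeros_like(attn_cls, dtype=torch.bool)
    keep[attn_cls.topk(k).indices] = True
    mask = keep.view(grid_h, grid_w)

    # Collate neighbors by proximity: flood-fill 4-connected components so that
    # each group of adjacent relevant patches is classified as one region.
    groups, visited = [], torch.zeros_like(mask)
    for i in range(grid_h):
        for j in range(grid_w):
            if mask[i, j] and not visited[i, j]:
                stack, component = [(i, j)], []
                visited[i, j] = True
                while stack:
                    y, x = stack.pop()
                    component.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < grid_h and 0 <= nx < grid_w
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                groups.append(component)
    return groups

# Each returned component can then be pooled (or cropped) and classified
# against the species prototypes.
```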
All the data used for training and testing is available on the challenge's page.
We would like to thank our advisors, Dr. Rodrigo Tripodi Calumby and Dr. Rodrigo Pereira David. We would also like to thank Lukáš Picek for his support and encouragement.
This project was partially supported by UEFS and CAPES.