This project compares two self-supervised learning models—ResNet50 and ViT-S/8, both pretrained using DINOv2—on a multi-label land use classification task. The dataset consists of images labeled with multiple land use tags, and performance is evaluated using metrics like accuracy, precision, recall, and F1 score.
- `dino_resnet50_vs_vits8.ipynb`: Main Jupyter notebook implementing the full pipeline (data loading, preprocessing, training, evaluation).
- `LandUse_Multilabeled.txt`: Tab-separated label file with image names and multi-label annotations.
- `Images/`: Directory containing the land use images (assumed structure based on typical use).
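The label file can be loaded with a short parser. The sketch below assumes a layout that is common for the UCM multi-label annotations but is not verified against the actual file: a header row naming the tags, then one row per image with tab-separated 0/1 indicators. Adjust the column handling to match the real file if it differs.

```python
import csv

def load_multilabels(path):
    """Parse a tab-separated multi-label file into {image: [tags]}.

    Assumed layout (not verified against the actual file): a header row
    with an image-name column followed by one column per land use tag,
    then one row per image with 0/1 indicators.
    """
    labels = {}
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)   # e.g. ['IMAGE', 'airplane', 'buildings', ...]
        tags = header[1:]
        for row in reader:
            image, flags = row[0], row[1:]
            labels[image] = [tag for tag, flag in zip(tags, flags) if flag == "1"]
    return labels, tags
```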
```shell
git clone https://github.com/HingedGuide/DeepLearningMGI12
cd DeepLearningMGI12

# Install dependencies
pip install -r requirements.txt
```
Images can be downloaded from the UCM Data Repository.
- Download the dataset archive from the link above.
- Unzip the contents into an `Images/` folder in the root directory of this project.
When running `dino_resnet50_vs_vits8.ipynb` in Google Colab, the images can be downloaded and placed in the correct folder by running the second code block.
1. Ensure your dataset is correctly placed: `LandUse_Multilabeled.txt` in the project root and the `Images/` folder in the expected structure.
2. Launch the notebook: `jupyter notebook dino_resnet50_vs_vits8.ipynb`
3. Follow the cells step-by-step to run training and evaluation.
The notebook evaluates models using:
- Accuracy
- Precision
- Recall
- F1 Score
- Hamming loss
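All of these metrics are available in scikit-learn. The toy indicator matrices below are made up for illustration; in the notebook, the prediction matrix would come from thresholding the model's sigmoid outputs against the ground-truth labels.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, hamming_loss)

# Toy binary indicator matrices (rows = images, columns = tags).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])

# For multi-label data, accuracy_score is the exact-match (subset) ratio.
print("subset accuracy :", accuracy_score(y_true, y_pred))
print("precision (micro):", precision_score(y_true, y_pred, average="micro"))
print("recall (micro)   :", recall_score(y_true, y_pred, average="micro"))
print("F1 (micro)       :", f1_score(y_true, y_pred, average="micro"))
# Hamming loss = fraction of individual label slots predicted wrongly.
print("Hamming loss     :", hamming_loss(y_true, y_pred))
```

Note that `average="micro"` pools true/false positives across all labels; `average="macro"` (per-label mean) is an alternative when rare tags should weigh equally.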
Plots are generated to visualize performance and label-wise statistics.
- DINOv2 ResNet50
- DINOv2 ViT-S/8
Both models are used as frozen feature extractors with the DINOv2-pretrained weights, followed by custom classification heads.
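A minimal sketch of such a head, assuming frozen backbone features of dimension 2048 (ResNet50's pooled output; ViT-S/8 would be 384) and an illustrative class count — the actual head architecture and sizes in the notebook may differ:

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Linear classification head over frozen backbone features.

    feature_dim=2048 matches ResNet50's pooled output (use 384 for
    ViT-S/8); num_classes=17 is illustrative, not taken from the repo.
    """
    def __init__(self, feature_dim=2048, num_classes=17):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, features):
        # Return raw logits; pair with BCEWithLogitsLoss for multi-label
        # training (one independent sigmoid per tag, not a softmax).
        return self.fc(features)

head = MultiLabelHead()
features = torch.randn(4, 2048)    # stand-in for frozen backbone features
logits = head(features)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(4, 17))
probs = torch.sigmoid(logits)      # per-tag probabilities; threshold (e.g. 0.5) for predictions
```

The design point is the loss: multi-label classification treats each tag as an independent binary decision, so `BCEWithLogitsLoss` replaces the usual cross-entropy over mutually exclusive classes.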
- You can customize transforms, model layers, and hyperparameters in the notebook.
- GPU is recommended for training.
MIT License; see `LICENSE` for details.