ViT-KNN: Semi-Supervised Pseudo-Labeling with Vision Transformers and KNN

This repository contains the codebase developed by the CUDA_Libre team for the Neural Wave Hackathon 2024, where our solution earned 1st place. The project automates the verification of steel bar alignment in a rolling mill using state-of-the-art Computer Vision models, combining semi-supervised Vision Transformers (ViT) and KNN-based pseudo-labeling. By enhancing operational efficiency and reducing human error, this system offers a scalable solution to modernize steel bar manufacturing processes.

Problem Context

Fig. 1 depicts a sequence of steel bars moving towards a stopper on a rolling table. The goal is to assess whether the bars are properly aligned. Currently, this alignment check is performed manually by human operators who rely solely on visual inspection of real-time images. Determining alignment can be challenging due to uncertainties caused by various factors, including perspective distortions, vibrations, shadows, and inconsistent lighting conditions.
Manual inspection of steel bar alignment is a labor-intensive task that can lead to errors due to operator fatigue. Our solution automates this verification, allowing plant operators to focus on more critical aspects of the production process. The workflow of our approach can be divided into two key stages:

  1. Semi-Supervised Labeling Workflow
  2. Model Training and Inference

Steel bar alignment process

Fig. 1 Sample images showing a sequence of aligned and misaligned bars on a rolling table approaching the stopper.

Data Labeling Pipeline

Here is a diagram illustrating the data labeling workflow, which integrates human labeling and pseudo-labeling by leveraging DINOv2 model embeddings and KNN label assignment through similarity search.

Data Labeling Workflow

Methodology

1. DINOv2 KNN-based Pseudo-Labeling Workflow

Given the large, mostly unlabeled dataset of 15,630 images, we adopted an efficient labeling approach that combines human labeling and pseudo-labeling.

  • Human Labeling: We manually labeled an initial subset of 5,000 images, creating a foundation of reliable training and test data.

  • DINOv2 for Embeddings: We used DINOv2, a self-supervised vision transformer model, to generate high-dimensional embeddings of the images. These embeddings capture rich semantic features without requiring any fine-tuning, making it possible to measure image similarity effectively.

  • K-Nearest Neighbors (KNN) with FAISS: We used FAISS for fast, scalable similarity searches within the embedding space. For each unlabeled image, we identified its K-nearest neighbors and assigned a label based on a majority vote of their known labels, taken from the manually labeled dataset.

  • Cosine Similarity: To ensure robust label assignment, we employed cosine similarity to compare image features and to compute "distances" in the KNN embedding space, using the following similarity function $m$:

$$m(s, r) = \text{cosine-similarity} (f(s), f(r)) = \frac{f(s) \cdot f(r)}{\|f(s)\|_2 \|f(r)\|_2}$$

where $s$ and $r$ are a pair of images to compare and $f$ is the model generating the features. This method enabled us to expand the labeled dataset efficiently without manual effort for each image. A minimal sketch of this pipeline is shown below; to run the pseudo-labeling, check out the documentation: DINOv2 KNN-based Pseudo-Labeling.
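The snippet below is a minimal sketch of this pseudo-labeling step, assuming the `dinov2_vits14` backbone from torch.hub and FAISS for the similarity search. File paths, function names, batch sizes, and the value of K are illustrative, not the exact implementation in this repository.

```python
# Illustrative sketch of the DINOv2 + FAISS pseudo-labeling step.
# Paths, batch sizes, and K are hypothetical; the repository's own script may differ.
import numpy as np
import torch
import faiss
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
# DINOv2 backbone from torch.hub; its forward pass returns the CLS-token embedding f(x).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths, batch_size=64):
    """Compute float32 DINOv2 embeddings for a list of image paths."""
    feats = []
    for i in range(0, len(paths), batch_size):
        batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                             for p in paths[i:i + batch_size]])
        feats.append(model(batch.to(device)).cpu().numpy())
    return np.concatenate(feats).astype("float32")

def pseudo_label(labeled_paths, labels, unlabeled_paths, k=5):
    """Assign each unlabeled image the majority label of its k nearest labeled
    neighbors under cosine similarity (inner product on L2-normalized embeddings)."""
    ref, query = embed(labeled_paths), embed(unlabeled_paths)
    faiss.normalize_L2(ref)
    faiss.normalize_L2(query)
    index = faiss.IndexFlatIP(ref.shape[1])  # inner product == cosine similarity here
    index.add(ref)
    _, neighbors = index.search(query, k)    # k nearest labeled images per query
    labels = np.asarray(labels)
    return (labels[neighbors].mean(axis=1) >= 0.5).astype(int)  # binary majority vote
```

Normalizing the embeddings and using an inner-product index makes the FAISS search equivalent to ranking by the cosine-similarity function $m$ defined above.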

2. Model Training, Inference and Results

The expanded dataset was used to train an EfficientNet-B0 model, chosen for its balance of accuracy and computational efficiency. We trained EfficientNet-B0 starting from the original pretrained weights, adapting the classification layer for binary classification of the alignment status (see the sketch below). EfficientNet-B0 was also compared against the MobileNetV2 model.
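For reference, here is a minimal sketch of how the classification head can be adapted with torchvision; the two-logit head and the ImageNet-pretrained weights are assumptions, not necessarily the exact setup used in train.py.

```python
# Minimal sketch: EfficientNet-B0 with its classifier adapted for binary alignment
# classification. The two-logit head and ImageNet weights are assumptions.
import torch.nn as nn
from torchvision import models

def build_model(num_classes: int = 2) -> nn.Module:
    # Start from the published (ImageNet-pretrained) EfficientNet-B0 weights.
    model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
    # Replace only the final linear layer; the backbone is kept and fine-tuned.
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, num_classes)
    return model
```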

  • Training Details: The model was trained for 30 epochs, with the peak validation performance observed at epoch 10. Key performance metrics included:

    • Accuracy: 93.40%
    • Precision: 94.37%
    • Recall: 95.82%
    • F1 Score: 95.09%
  • The model demonstrated reliable classification capabilities, with a mean inference time on the test set of 0.0298 seconds per image, meeting the real-time requirement of under 0.5 seconds per image (see the table and timing sketch below).

| Inference Time Statistic | Time (seconds) |
|--------------------------|----------------|
| Mean Time                | 0.0298         |
| 25th Percentile          | 0.0111         |
| Median (50th Percentile) | 0.0117         |
| 75th Percentile          | 0.0128         |
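A hypothetical snippet like the following can produce this kind of per-image latency summary; the data-loader interface and the batch size of 1 are assumptions, not necessarily how test.py measures it.

```python
# Hypothetical per-image latency measurement (batch_size=1), not the exact test.py logic.
import time
import numpy as np
import torch

@torch.no_grad()
def latency_stats(model, loader, device="cuda"):
    model.eval().to(device)
    times = []
    for image, _ in loader:                      # loader assumed to yield (image, label)
        image = image.to(device)
        if device == "cuda":
            torch.cuda.synchronize()             # exclude queued GPU work from the timing
        start = time.perf_counter()
        model(image)
        if device == "cuda":
            torch.cuda.synchronize()             # wait for the forward pass to finish
        times.append(time.perf_counter() - start)
    times = np.asarray(times)
    return {
        "mean": times.mean(),
        "p25": np.percentile(times, 25),
        "median": np.percentile(times, 50),
        "p75": np.percentile(times, 75),
    }
```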

Run the Code

Installation

Install the required packages with:

pip install -r requirements.txt

Training

To perform the pseudo-labeling, check out the documentation: DINOv2 KNN-based Pseudo-Labeling. To train the EfficientNet-B0 model, run the training script:

python train.py \
    --data_config_path "dataset/augmented_split.json" \
    --batch_size 32 \
    --num_epochs 30 \
    --learning_rate 0.0001 \
    --checkpoint_path "checkpoints/efficient_net"

Testing

Evaluate the model performance on the test set using:

python test.py \
    --data_config_path "dataset/split.json" \
    --batch_size 16 \
    --model_path "checkpoints/efficient_net/20241027_083453/model_epoch_10.pt"

License

This project is licensed under the MIT License - see the LICENSE file for details.
