Surface Defect Classifier Training Report

Table of Contents

  1. Overview
  2. Dataset Description
  3. Project Structure
  4. Installation & Setup
  5. Augmentation and Preprocessing
  6. Model and Training Details
  7. Performance Metrics Across Epochs
  8. Training Metrics Visualization
  9. Inference
  10. Future Work
  11. Summary

1. Overview

This project focuses on classification of surface defects in steel manufacturing images as part of a semantic segmentation pipeline. Classification here acts as a preliminary screening mechanism to filter out images with no defects (label 0), thereby reducing computation required during segmentation inference.

We train a ResNet18 model to classify images into 5 classes:

  • 0 – No Defect
  • 1 to 4 – Corresponding to defect classes in the segmentation dataset
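
To make the screening idea concrete, the following minimal sketch shows how the classifier could gate the segmentation stage. The function names (classifier, segmenter, screen_then_segment) are hypothetical placeholders, and the argmax decision rule is only illustrative of the "skip label 0" behaviour described above:

import torch

@torch.no_grad()
def screen_then_segment(image, classifier, segmenter):
    # Hypothetical two-stage pipeline: classify first, segment only if a defect is predicted
    logits = classifier(image.unsqueeze(0))
    predicted_class = logits.argmax(dim=1).item()
    if predicted_class == 0:   # label 0 = no defect, so the expensive segmentation step is skipped
        return None
    return segmenter(image.unsqueeze(0))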

2. Dataset Description

The dataset is sourced from a steel surface defect detection competition:

Link: Severstal: Steel Defect Detection on Kaggle

Contents:

  • train_images/ – Folder containing training images
  • test_images/ – Folder containing inference images
  • train.csv – ImageId and defect class mapping

Notes:

  • Images not present in train.csv are considered non-defective and assigned label 0
  • Multi-class classification is reduced to single-label by selecting the highest defect class per image
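
As an illustration of these two rules, here is a hedged pandas sketch (not the exact logic in data.py; the ImageId and ClassId column names are assumed from the Kaggle train.csv layout):

import os
import pandas as pd

def build_image_labels(label_file, image_dir):
    # Keep the highest defect class per image (single-label reduction)
    df = pd.read_csv(label_file)  # assumed columns: ImageId, ClassId
    defect_labels = df.groupby("ImageId")["ClassId"].max().to_dict()
    # Any image on disk that never appears in train.csv is labelled 0 (no defect)
    return {fname: int(defect_labels.get(fname, 0)) for fname in os.listdir(image_dir)}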

3. Project Structure


project/
│
├── train.py             # Training entry point
├── inference.py         # Placeholder for inference logic
├── data.py              # Dataset loading & augmentation
├── evaluation.py        # Evaluation functions and metrics
├── models/
│   └── resnet.py        # ResNet18 architecture wrapper
├── utils/
│   └── helpers.py       # Logging, visualization utilities
├── outputs/
│   ├── training_metrics.png
│   ├── conf_matrix.png
│   └── roc_curve.png
├── report.md            # This report
└── requirements.txt     # All dependencies


4. Installation & Setup

Prerequisites

  • Python 3.8+
  • PyTorch >= 1.10
  • CUDA GPU recommended for faster training

Installation

git clone https://github.com/iampratyusht/l0-iampratyusht.git
cd l0-iampratyusht
pip install -r requirements.txt

Dataset Preparation

Download the dataset from Kaggle and place folders as:

project/
├── train_images/
├── test_images/
└── train.csv

Optional: Manually Creating DataLoaders (For Testing)

Note: You do not need to explicitly create DataLoaders when using train.py — the script handles this internally. This section is only for debugging, testing augmentations, or exploring dataset behavior.

from data import get_dataloaders

train_loader, val_loader = get_dataloaders(
    data_dir="./train_images",
    label_file="./train.csv",
    batch_size=16,
    img_size=(224, 1568),
    num_workers=4
)

This function will:

  • Prepare image-level labels (including class 0 for defect-free images)
  • Apply augmentations (RandomCrop, Flips, Blackout, etc.)
  • Return PyTorch DataLoader objects for training and validation sets
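
To sanity-check the loaders, you can pull a single batch and inspect its shapes (the label shape shown in the comment assumes a multi-hot target vector, matching the BCEWithLogitsLoss setup described later in this report):

images, labels = next(iter(train_loader))
print(images.shape)            # e.g. torch.Size([16, 3, 224, 1568]) for batch_size=16
print(labels.shape, labels.dtype)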

5. Augmentation and Preprocessing

During training, we use random crops of size 224x1568. For inference, full-resolution images are used.

Augmentations:

  • RandomCrop
  • HorizontalFlip, VerticalFlip
  • RandomBrightnessContrast (Albumentations)
  • Defect Blackout: Known defect pixels are blacked out. If all are removed, label becomes 0.

Together, these augmentations simulate natural variation in the images; the defect-blackout step additionally converts some defective crops into defect-free examples, improving generalization on defect-free images.
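
A minimal Albumentations pipeline along these lines might look like the following sketch (not the exact transforms in data.py; the blackout helper is shown standalone and assumes a per-image defect mask is available from the segmentation annotations):

import albumentations as A
import numpy as np

train_transform = A.Compose([
    A.RandomCrop(height=224, width=1568),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

def blackout_defects(image, defect_mask, p=0.3):
    # Randomly zero out known defect pixels; if no defect pixels remain,
    # the caller should relabel the image as class 0 (no defect)
    if np.random.rand() < p:
        image = image.copy()
        image[defect_mask > 0] = 0
        defect_mask = np.zeros_like(defect_mask)
    return image, defect_mask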


6. Model and Training Details

We used ResNet18 as the classifier backbone, modified for multi-label classification.

Training Parameters:

  • Batch Size: 16 (gradient accumulation to an effective batch size of 32)
  • Epochs: 10
  • Loss Function: BCEWithLogitsLoss
  • Optimizer: SGD with momentum
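
The hedged sketch below shows one way the pieces listed above could fit together: a torchvision ResNet18 with its final layer replaced, BCEWithLogitsLoss, SGD with momentum, and gradient accumulation over two steps of batch size 16. Names, the 5-output head, and the momentum value are illustrative assumptions, not taken from train.py:

import torch
import torch.nn as nn
from torchvision import models

# ResNet18 backbone with the classification head replaced (5 outputs assumed: class 0 + 4 defects)
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, 5)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # momentum value assumed
accum_steps = 2  # 2 x batch size 16 -> effective batch of 32

for step, (images, targets) in enumerate(train_loader):
    loss = criterion(model(images), targets.float()) / accum_steps  # targets assumed multi-hot
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()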

Training Command:

python train.py \
    --model resnet18 \
    --epochs 10 \
    --batch-size 16 \
    --lr 0.01 \
    --data-dir ./train_images \
    --label-file train.csv \
    --save-dir ./outputs

7. Performance Metrics Across Epochs

| Epoch | Train Loss | Train F1 | Train mAP | Train Acc | Train AUC | Val Loss | Val F1 | Val mAP | Val Acc | Val AUC |
|-------|-----------|----------|-----------|-----------|-----------|----------|--------|---------|---------|---------|
| 1  | 0.2770 | 0.3183 | 0.4294 | 0.6190 | 0.8161 | 0.2313 | 0.3339 | 0.5347 | 0.7080 | 0.8973 |
| 2  | 0.2181 | 0.4052 | 0.5369 | 0.7184 | 0.9005 | 0.2332 | 0.4358 | 0.5695 | 0.6810 | 0.8916 |
| 3  | 0.1879 | 0.4983 | 0.6315 | 0.7583 | 0.9293 | 0.1787 | 0.4662 | 0.6527 | 0.7876 | 0.9383 |
| 4  | 0.1667 | 0.5655 | 0.6921 | 0.7843 | 0.9471 | 0.1810 | 0.5887 | 0.7390 | 0.7566 | 0.9472 |
| 5  | 0.1514 | 0.6492 | 0.7419 | 0.8099 | 0.9567 | 0.1369 | 0.6416 | 0.8630 | 0.8329 | 0.9691 |
| 6  | 0.1440 | 0.6879 | 0.7732 | 0.8147 | 0.9608 | 0.1888 | 0.6289 | 0.8420 | 0.7677 | 0.9666 |
| 7  | 0.1377 | 0.7124 | 0.7904 | 0.8288 | 0.9665 | 0.1555 | 0.7275 | 0.8234 | 0.8202 | 0.9630 |
| 8  | 0.1290 | 0.7233 | 0.7971 | 0.8330 | 0.9695 | 0.1444 | 0.6952 | 0.8238 | 0.8353 | 0.9612 |
| 9  | 0.1214 | 0.7630 | 0.8316 | 0.8488 | 0.9737 | 0.1826 | 0.7178 | 0.8401 | 0.7979 | 0.9614 |
| 10 | 0.1098 | 0.7686 | 0.8426 | 0.8608 | 0.9782 | 0.1255 | 0.7934 | 0.8975 | 0.8632 | 0.9793 |
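
For reference, metrics of this kind can be computed with scikit-learn from stacked per-epoch predictions and targets. This is a sketch of the general approach, not the repository's evaluation.py; in particular, the exact-match accuracy shown here may differ from how Acc is defined in the table:

import numpy as np
from sklearn.metrics import f1_score, average_precision_score, roc_auc_score

def compute_epoch_metrics(probs, targets, threshold=0.5):
    # probs, targets: arrays of shape (num_samples, num_classes)
    preds = (probs >= threshold).astype(int)
    return {
        "f1": f1_score(targets, preds, average="macro", zero_division=0),
        "mAP": average_precision_score(targets, probs, average="macro"),
        "acc": (preds == targets).all(axis=1).mean(),   # exact-match accuracy
        "auc": roc_auc_score(targets, probs, average="macro"),
    }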

8. Training Metrics Visualization

Training Curve

(figure: outputs/training_metrics.png)

Displays training and validation loss, F1, mAP, and AUC across epochs.


Confusion Matrix

(figure: outputs/conf_matrix.png)

Visualizes true vs. predicted labels for each class.
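
A confusion matrix like this one can be generated with scikit-learn once single-label predictions are available. The helper below is illustrative only; the repository's plotting utilities live in utils/helpers.py and may differ:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

def plot_confusion_matrix(y_true, y_pred, save_path="outputs/conf_matrix.png"):
    # y_true, y_pred: integer class labels in {0, 1, 2, 3, 4}
    disp = ConfusionMatrixDisplay.from_predictions(y_true, y_pred, labels=[0, 1, 2, 3, 4])
    disp.figure_.savefig(save_path, bbox_inches="tight")
    plt.close(disp.figure_)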


ROC-AUC Curve

(figure: outputs/roc_curve.png)

Class-wise and macro-average ROC-AUC curve visualization.


9. Inference

To be completed once inference.py is implemented.

This section will document:

  • Loading the trained model checkpoint
  • Preprocessing test images
  • Batch-wise prediction with optional TTA (horizontal/vertical flips)
  • Saving predicted probabilities

Command:

python inference.py \
    --weights model.pth \
    --image-dir ./test_images \
    --tta hflip,vflip \
    --save-path ./outputs/inference_preds.csv
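
Since inference.py is still a placeholder, the sketch below only illustrates how flip-based TTA could work; the function name and averaging scheme are assumptions rather than the repository's final implementation:

import torch

@torch.no_grad()
def predict_with_tta(model, images):
    # Average sigmoid probabilities over the original view and horizontal/vertical flips
    views = [images, torch.flip(images, dims=[-1]), torch.flip(images, dims=[-2])]
    probs = torch.stack([torch.sigmoid(model(v)) for v in views])
    return probs.mean(dim=0)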

10. Future Work

If time and compute resources allow, we plan to extend this work through:

  • Transformer Models for better long-range feature modeling
  • Self-Supervised Learning for pretraining on unlabeled industrial images
  • Model Ensembling – combining multiple architectures

11. Summary

This project demonstrates a robust ResNet18-based surface defect classifier trained with meaningful augmentations and defect-aware strategies. The classifier improves pipeline efficiency by filtering out defect-free images and providing high-confidence predictions for the segmentation stage.

