- Overview
- Dataset Description
- Project Structure
- Installation & Setup
- Augmentation and Preprocessing
- Model and Training Details
- Performance Metrics Across Epochs
- Training Metrics Visualization
- Inference
- Future Work
- Summary
## Overview

This project focuses on classification of surface defects in steel-manufacturing images as part of a semantic segmentation pipeline. Classification here acts as a preliminary screening mechanism to filter out images with no defects (label `0`), thereby reducing the computation required during segmentation inference.

We train a ResNet18 model to classify images into 5 classes:

- `0` – No defect
- `1` to `4` – Corresponding to defect classes in the segmentation dataset
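To make the label scheme concrete, here is a hypothetical helper that encodes the image-level label as the float target vector `BCEWithLogitsLoss` (used later in training) expects; the name `encode_target` and the one-hot layout are illustrative assumptions, not the project's actual code:

```python
def encode_target(label, num_classes=5):
    # One-hot float vector: index 0 = "no defect", indices 1-4 = defect classes.
    target = [0.0] * num_classes
    target[label] = 1.0
    return target
```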
## Dataset Description

The dataset is sourced from a steel surface defect detection competition: Severstal: Steel Defect Detection on Kaggle.

- `train_images/` – Folder containing training images
- `test_images/` – Folder containing inference images
- `train.csv` – `ImageId` and defect-class mapping

Additional notes:

- Images not present in `train.csv` are considered non-defective and assigned label `0`
- Multi-class annotations are reduced to a single label by selecting the highest defect class per image
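The two labeling rules above can be sketched as follows. The helper name and the `(image_id, class_id)` pair layout are assumptions made for illustration, not the project's actual parsing code:

```python
def build_image_labels(rows, all_images):
    """Assign each image a single label: the highest defect class
    listed for it in train.csv, or 0 if it never appears there
    (i.e. the image is defect-free).

    `rows` is an iterable of (image_id, class_id) pairs parsed from
    train.csv; the pair layout is an assumption for illustration.
    """
    highest = {}
    for image_id, class_id in rows:
        highest[image_id] = max(highest.get(image_id, 0), int(class_id))
    # Images absent from train.csv fall back to label 0.
    return {img: highest.get(img, 0) for img in all_images}
```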
## Project Structure

```
project/
│
├── train.py              # Training entry point
├── inference.py          # Placeholder for inference logic
├── data.py               # Dataset loading & augmentation
├── evaluation.py         # Evaluation functions and metrics
├── models/
│   └── resnet.py         # ResNet18 architecture wrapper
├── utils/
│   └── helpers.py        # Logging, visualization utilities
├── outputs/
│   ├── training_metrics.png
│   ├── conf_matrix.png
│   └── roc_curve.png
├── report.md             # This report
└── requirements.txt      # All dependencies
```
## Installation & Setup

Requirements:

- Python 3.8+
- PyTorch >= 1.10
- CUDA GPU recommended for faster training

```bash
git clone https://github.com/iampratyusht/l0-iampratyusht.git
cd l0-iampratyusht
pip install -r requirements.txt
```

Download the dataset from Kaggle and place the folders as:

```
project/
├── train_images/
├── test_images/
└── train.csv
```
## Augmentation and Preprocessing

Note: You do not need to explicitly create DataLoaders when using `train.py`; the script handles this internally. This section is only for debugging, testing augmentations, or exploring dataset behavior.

```python
from data import get_dataloaders

train_loader, val_loader = get_dataloaders(
    data_dir="./train_images",
    label_file="./train.csv",
    batch_size=16,
    img_size=(224, 1568),
    num_workers=4,
)
```
This function will:

- Prepare image-level labels (including class `0` for defect-free images)
- Apply augmentations (RandomCrop, Flips, Blackout, etc.)
- Return PyTorch `DataLoader` objects for the training and validation sets

During training we use random crops of size 224×1568; for inference, full-resolution images are used.

Augmentations applied:

- `RandomCrop`
- `HorizontalFlip`, `VerticalFlip`
- `RandomBrightnessContrast` (Albumentations)
- Defect Blackout: known defect pixels are blacked out. If all defect pixels are removed, the label becomes `0`. This simulates naturally defect-free images and improves generalization on them.
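A minimal sketch of the blackout idea, assuming a binary defect mask is available per image. The function name and signature are illustrative, not the project's `data.py` API, and the sketch simplifies by always removing every defect pixel when it fires:

```python
import numpy as np

def defect_blackout(image, mask, label, p=0.5, rng=None):
    """With probability `p`, zero out all known defect pixels.

    Because this simplified sketch removes every defect pixel at once,
    the image-level label drops to 0 (defect-free) whenever the
    blackout fires.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < p:
        image = image.copy()
        image[mask > 0] = 0  # black out defect pixels
        label = 0            # no defect evidence remains
    return image, label
```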
## Model and Training Details

We use ResNet18 as the classifier backbone, modified for multi-label classification.

- Batch size: 16 (gradients accumulated to an effective batch size of 32)
- Epochs: 10
- Loss function: `BCEWithLogitsLoss`
- Optimizer: `SGD` with momentum
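The batch-size note (16 accumulated to 32) can be sketched as a gradient-accumulation loop. This illustrates the technique only and is not the actual `train.py` loop:

```python
import torch
import torch.nn as nn

def train_epoch(model, batches, optimizer, accum_steps=2):
    """Accumulate gradients over `accum_steps` mini-batches of 16
    before each optimizer step, giving an effective batch size of 32."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer.zero_grad()
    for i, (images, targets) in enumerate(batches, start=1):
        logits = model(images)
        # Scale so the accumulated gradient matches one large batch.
        loss = criterion(logits, targets) / accum_steps
        loss.backward()
        if i % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```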
To train:

```bash
python train.py \
  --model resnet18 \
  --epochs 10 \
  --batch-size 16 \
  --lr 0.01 \
  --data-dir ./train_images \
  --label-file train.csv \
  --save-dir ./outputs
```
## Performance Metrics Across Epochs

| Epoch | Train Loss | Train F1 | Train mAP | Train Acc | Train AUC | Val Loss | Val F1 | Val mAP | Val Acc | Val AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.2770 | 0.3183 | 0.4294 | 0.6190 | 0.8161 | 0.2313 | 0.3339 | 0.5347 | 0.7080 | 0.8973 |
| 2 | 0.2181 | 0.4052 | 0.5369 | 0.7184 | 0.9005 | 0.2332 | 0.4358 | 0.5695 | 0.6810 | 0.8916 |
| 3 | 0.1879 | 0.4983 | 0.6315 | 0.7583 | 0.9293 | 0.1787 | 0.4662 | 0.6527 | 0.7876 | 0.9383 |
| 4 | 0.1667 | 0.5655 | 0.6921 | 0.7843 | 0.9471 | 0.1810 | 0.5887 | 0.7390 | 0.7566 | 0.9472 |
| 5 | 0.1514 | 0.6492 | 0.7419 | 0.8099 | 0.9567 | 0.1369 | 0.6416 | 0.8630 | 0.8329 | 0.9691 |
| 6 | 0.1440 | 0.6879 | 0.7732 | 0.8147 | 0.9608 | 0.1888 | 0.6289 | 0.8420 | 0.7677 | 0.9666 |
| 7 | 0.1377 | 0.7124 | 0.7904 | 0.8288 | 0.9665 | 0.1555 | 0.7275 | 0.8234 | 0.8202 | 0.9630 |
| 8 | 0.1290 | 0.7233 | 0.7971 | 0.8330 | 0.9695 | 0.1444 | 0.6952 | 0.8238 | 0.8353 | 0.9612 |
| 9 | 0.1214 | 0.7630 | 0.8316 | 0.8488 | 0.9737 | 0.1826 | 0.7178 | 0.8401 | 0.7979 | 0.9614 |
| 10 | 0.1098 | 0.7686 | 0.8426 | 0.8608 | 0.9782 | 0.1255 | 0.7934 | 0.8975 | 0.8632 | 0.9793 |
## Training Metrics Visualization

- `outputs/training_metrics.png` – Training and validation loss, F1, mAP, and AUC across epochs
- `outputs/conf_matrix.png` – True vs. predicted labels for each class
- `outputs/roc_curve.png` – Class-wise and macro-average AUC curves
## Inference

To be filled in after `inference.py` is implemented. This section will document:

- Loading the trained model checkpoint
- Preprocessing test images
- Batch-wise prediction with optional TTA (horizontal/vertical flips)
- Saving predicted probabilities
Command:

```bash
python inference.py \
  --weights model.pth \
  --image-dir ./test_images \
  --tta hflip,vflip \
  --save-path ./outputs/inference_preds.csv
```
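The flip-based TTA mentioned above can be sketched as averaging sigmoid probabilities over flipped views; the function name and NCHW flip dimensions are assumptions, not the planned `inference.py` API:

```python
import torch

def predict_with_tta(model, images):
    """Average sigmoid probabilities over the original batch plus
    horizontal and vertical flips of NCHW image tensors."""
    model.eval()
    views = [
        images,
        torch.flip(images, dims=[3]),  # horizontal flip (width axis)
        torch.flip(images, dims=[2]),  # vertical flip (height axis)
    ]
    with torch.no_grad():
        probs = [torch.sigmoid(model(v)) for v in views]
    return torch.stack(probs).mean(dim=0)
```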
## Future Work

If time and compute resources allow, we plan to extend this work with:

- Transformer models for better long-range feature modeling
- Self-supervised learning for pretraining on unlabeled industrial images
- Model ensembling, combining multiple architectures
## Summary

This project demonstrates a robust ResNet18-based surface defect classifier trained with meaningful augmentations and defect-aware strategies. The classifier improves pipeline efficiency by filtering out defect-free images and providing high-confidence predictions for the downstream segmentation stage.