This project uses a Vision Transformer (ViT) to classify images of leather samples into one of 6 classes: five defect types plus non-defective. It includes a working training pipeline built with PyTorch.
Place your dataset under the data/ directory in the following structure:
data/
├── Folding marks/
├── Grain off/
├── Growth marks/
├── loose grains/
├── non defective/
└── pinhole/
Each subfolder should contain around 600 images of that class.
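Before training, it can help to confirm that the folders are named exactly as shown and that each one holds roughly the expected number of images. The script below is a minimal sketch using only the standard library. The `count_images` helper and the list of image extensions are assumptions for illustration and are not part of the repository.

```python
from pathlib import Path

# Class folder names copied from the layout above (case-sensitive).
EXPECTED_CLASSES = [
    "Folding marks",
    "Grain off",
    "Growth marks",
    "loose grains",
    "non defective",
    "pinhole",
]

# Assumption: the images use one of these file extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(data_dir="data"):
    """Return {class_name: image_count}; a missing folder maps to None."""
    root = Path(data_dir)
    counts = {}
    for name in EXPECTED_CLASSES:
        class_dir = root / name
        if not class_dir.is_dir():
            counts[name] = None
            continue
        counts[name] = sum(
            1
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    for name, n in count_images().items():
        status = "MISSING" if n is None else f"{n} images"
        print(f"{name:15s} {status}")
```

A folder reported as MISSING usually means a spelling or capitalisation mismatch with the names above.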
- Clone the repository:
    git clone https://github.com/chiraggarg03/leather-defect-detection
    cd leather-defect-detection
- Create a virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required packages:
    pip install -r requirements.txt
  For CUDA acceleration, use:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- Run the Jupyter Notebook:
    jupyter notebook
- Open notebooks/baseline_vit.ipynb in your browser and run all cells to train the model.
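If you installed the CUDA build, a quick check like the following shows whether PyTorch can actually see the GPU. This is a small sketch that can be pasted into a notebook cell. The `device` variable name is an assumption and may differ from what the notebook uses.

```python
import torch

# Use the GPU when a CUDA build of PyTorch can see one, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, using device: {device}")

if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
```

If this reports `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel is probably the one installed.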
- Model: Vision Transformer (vit_b_16)
- Optimizer: Adam
- Loss: CrossEntropyLoss
- Accuracy: ~73% validation after 10 epochs
- The .pth model weights are not committed to the repo due to size limits.
- If you wish to save model checkpoints, modify the notebook to save using:
    torch.save(model.state_dict(), "baseline.pth")
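Because only the `state_dict` is saved, loading a checkpoint later means rebuilding the same architecture first and then copying the weights into it. The sketch below shows the round trip with a tiny `nn.Linear` as a stand-in for the trained ViT so that it runs quickly. The `restored` name is an assumption for illustration.

```python
import torch
from torch import nn

# Stand-in for the trained ViT so the example runs quickly.
model = nn.Linear(4, 6)
torch.save(model.state_dict(), "baseline.pth")

# Later: rebuild the same architecture, then load the saved weights into it.
restored = nn.Linear(4, 6)
restored.load_state_dict(torch.load("baseline.pth", map_location="cpu"))
restored.eval()  # switch to inference mode before evaluating
```

`map_location="cpu"` lets a checkpoint saved on a GPU machine load on a machine without one.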
The dataset is available at https://www.kaggle.com/datasets/praveen2084/leather-defect-classification/