stlin256/VLM4Classification

VLM for Image Classification

—— Finetune SmolVLM-Instruct for Image classification in specific fields



Introduction

This repository provides end-to-end code for dataset creation, model fine-tuning, and evaluation. It was written for SmolVLM-256M-Instruct and the Solar panel clean and faulty images dataset on Kaggle, but the same workflow can be used with other Vision Language Models (VLMs) and adapted to other domains.

Potential of Small-Parameter VLMs in Specific Image Domains

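Fine-tuning raises overall classification accuracy on this dataset from about 0.21 to about 0.98 when evaluated on the full dataset, and the fine-tuned model reaches 0.97 on the held-out test split.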

[Figure: Accuracy_Improvement]

Low-Latency Deployment with Consumer-Grade GPUs

Small-parameter VLMs are inexpensive to fine-tune:

  • Minimum VRAM consumption: about 4.5 GB during fine-tuning (with batch_size=1)
  • VRAM usage scales with per_device_train_batch_size and per_device_eval_batch_size, which can be adjusted in fine_tune.py

They are also lightweight at inference time:

  • SmolVLM-256M-Instruct requires only 1.2 GB of GPU memory at bfloat16 precision
  • Quantized to INT4, it requires just 0.6 GB of VRAM
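As a rough sanity check on these inference figures, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes a nominal 256 million parameters (read off the model name, not measured) and ignores activations, the CUDA context, and quantization metadata, which presumably account for most of the gap between these estimates and the measured numbers above.

```python
# Back-of-the-envelope weight memory for a 256M-parameter model.
# Assumption: exactly 256e6 parameters (nominal value from the model name).
NUM_PARAMS = 256_000_000


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # bfloat16: 16 bits per parameter
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # INT4: 4 bits per parameter

print(f"bfloat16 weights: {bf16_gb:.3f} GB")
print(f"INT4 weights:     {int4_gb:.3f} GB")
```

Under these assumptions the weights come to roughly 0.5 GB in bfloat16 and 0.13 GB in INT4, so the measured 1.2 GB and 0.6 GB include a fixed overhead that does not shrink with quantization.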

Repository Structure

 1. download_dataset.py

   Downloads the Solar panel clean and faulty images dataset from Kaggle directly to your project root directory.

 2. generate_train_json.py

   Generates training JSON files from local data:

    1. Scans subfolders in dataset root as class labels

    2. Iterates through image files

    3. Combines question templates with class descriptions to generate Q&A pairs, including direct classification prompts

    4. Shuffles the data and splits it (default: 80% train/20% test)

    5. Saves JSON files in script directory
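The five steps above can be sketched roughly as follows. This is not the repository's actual code: the function names, the question templates, the class descriptions, the JSON field names, and the set of accepted image extensions are all hypothetical and only illustrate the flow.

```python
import json
import random
from pathlib import Path

# Hypothetical templates; the real script defines its own questions.
QUESTION_TEMPLATES = ["What condition is this solar panel in?"]
DIRECT_PROMPT = "Classify this image. Answer with the class name only."
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image types


def build_qa_records(dataset_root, class_descriptions):
    """Steps 1-3: subfolders become labels, each image yields Q&A pairs."""
    records = []
    for class_dir in sorted(Path(dataset_root).iterdir()):
        if not class_dir.is_dir():
            continue
        label = class_dir.name
        description = class_descriptions.get(label, label)
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            for question in QUESTION_TEMPLATES:
                records.append({"image": str(image_path), "question": question,
                                "answer": description, "label": label})
            # One direct classification pair whose answer is just the label.
            records.append({"image": str(image_path), "question": DIRECT_PROMPT,
                            "answer": label, "label": label})
    return records


def split_records(records, train_set_ratio=0.8, seed=0):
    """Step 4: shuffle a copy of the records, then split train/test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_set_ratio)
    return shuffled[:cut], shuffled[cut:]


def save_json(records, path):
    """Step 5: write one split to disk as JSON."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
```

This sketch splits at the record level because that matches the order of the steps. Splitting by image instead would guarantee that no image contributes Q&A pairs to both the training and the test set.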

 3. fine_tune.py

   Main fine-tuning script that produces trained models and training logs.

Key Parameters:

# Path configurations
local_model_path = "./SmolVLM-256M-Instruct"  # Local model directory
train_json_path = "solar_panel_train_dataset.json"  # Training JSON
test_json_path = "solar_panel_test_dataset.json"  # Test JSON
output_dir = "./SmolVLM-256M-Instruct-finetuned"  # Output directory

# Training configuration
equivalent_epochs_to_train = 1
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 50
learning_rate = 3e-4
weight_decay = 0.01
logging_steps = 25
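To see how these settings interact, it can help to work out the effective batch size and the number of optimizer updates per epoch. The sketch below assumes single-GPU training, and the sample count is a made-up illustration, since the size of the generated training JSON is not stated here.

```python
import math

# Values from the key parameters above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 50
num_gpus = 1  # assumption: single-GPU training


def effective_batch_size(per_device, accumulation, gpus=1):
    """Samples consumed per optimizer update."""
    return per_device * accumulation * gpus


def optimizer_steps_per_epoch(num_samples, batch):
    """Optimizer updates needed to see every sample once."""
    return math.ceil(num_samples / batch)


batch = effective_batch_size(per_device_train_batch_size,
                             gradient_accumulation_steps, num_gpus)
print("effective batch size:", batch)

# Hypothetical sample count, for illustration only.
example_num_samples = 2000
steps = optimizer_steps_per_epoch(example_num_samples, batch)
print("optimizer steps per epoch:", steps)
print("warmup share of one epoch:", round(warmup_steps / steps, 2))
```

With the defaults, each optimizer update sees 4 x 4 = 16 samples. On a small dataset, 50 warmup steps can cover a sizeable part of a single epoch, which is worth keeping in mind when changing warmup_steps or the batch settings.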

 4. vlm_benchmark_test_dataset.py

   Evaluates VLM performance on the test dataset, reporting final accuracy metrics.

 5. vlm_test.py

   Evaluates a model on the full dataset. It loads ./SmolVLM-256M-Instruct by default; change the model_name variable to evaluate a fine-tuned model.

Directory Structure:

├── download_dataset.py
├── generate_train_json.py
├── fine_tune.py
├── vlm_benchmark_test_dataset.py
├── vlm_test.py
├── solar_panel_test_dataset.json
├── solar_panel_train_dataset.json
├── README.md
├── requirements.txt
├── SmolVLM-256M-Instruct/
│   └── model_files
├── SmolVLM-256M-Instruct-finetuned/
│   └── finetuned_model_files
└── Faulty_solar_panel/  # Dataset root
    └── Category/        # One subfolder per class
        └── Images.jpg

Usage Guide

Environment Setup

  1. Create virtual environment:
conda create -n vlm python=3.10
  2. Activate environment:
conda activate vlm

Project Installation

  1. Clone repository:
git clone https://github.com/stlin256/VLM4Classification.git
  2. Open project directory:
cd VLM4Classification
  3. Install dependencies:
python -m pip install --upgrade pip

Install PyTorch with the command from pytorch.org that matches your environment, for example:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

Install the other dependencies:

pip install -r requirements.txt

Data Preparation

  1. Download SmolVLM-256M-Instruct to ./SmolVLM-256M-Instruct
  2. Download dataset:
python download_dataset.py
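Step 1 does not say how to fetch the model. One possible way, sketched below, is the `snapshot_download` function from the `huggingface_hub` package. Both the use of that package and the Hub repository id `HuggingFaceTB/SmolVLM-256M-Instruct` are assumptions here, not something this repository prescribes; downloading the files manually works just as well.

```python
def download_model(repo_id="HuggingFaceTB/SmolVLM-256M-Instruct",
                   local_dir="./SmolVLM-256M-Instruct"):
    """Download a model snapshot from the Hugging Face Hub into local_dir.

    The default repo_id is an assumed Hub id; local_dir matches the
    local_model_path that fine_tune.py expects.
    """
    # Imported lazily so the function can be defined without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Uncomment to run; requires network access and `pip install huggingface_hub`.
# print("Model saved to", download_model())
```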

Dataset Processing

python generate_train_json.py

Training

Verify batch size settings (per_device_train_batch_size, per_device_eval_batch_size) before execution

# ~4.5GB VRAM
per_device_train_batch_size = 1
per_device_eval_batch_size = 1

# ~14GB VRAM
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
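The two reference points above can be turned into a small helper for picking settings. This is only a sketch: the thresholds are the two measurements quoted in this README, the function name is hypothetical, and raising gradient_accumulation_steps to keep the effective batch size at 16 is an optional suggestion rather than something the repository does.

```python
TARGET_EFFECTIVE_BATCH = 16  # 4 x 4 in the default configuration


def suggest_batch_settings(free_vram_gb):
    """Return (per-device batch size, gradient accumulation steps).

    Thresholds come from the two VRAM figures quoted above; anything
    between them falls back to the smaller batch size.
    """
    if free_vram_gb >= 14:
        batch_size = 4
    elif free_vram_gb >= 4.5:
        batch_size = 1
    else:
        raise ValueError("less than about 4.5 GB free: below the quoted minimum")
    return batch_size, TARGET_EFFECTIVE_BATCH // batch_size


for vram in (8, 16):
    batch_size, accumulation = suggest_batch_settings(vram)
    print(f"{vram} GB -> batch size {batch_size}, accumulation {accumulation}")
```

For example, an 8 GB card would use a batch size of 1 with 16 accumulation steps, while a 16 GB card can keep the defaults of 4 and 4.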

Run training:

python fine_tune.py

Evaluation

  1. Base model full evaluation:
python vlm_test.py
  2. Fine-tuned model test set evaluation:
python vlm_benchmark_test_dataset.py
  3. Full evaluation of fine-tuned model:
# First set model_name = "./SmolVLM-256M-Instruct-finetuned" in vlm_test.py
python vlm_test.py

Training Results Example

The following parameters were used for training:

#generate_train_json.py

    train_set_ratio = 0.8

#fine_tune.py

 #SFTConfig
    equivalent_epochs_to_train = 1  
    per_device_train_batch_size = 4
    per_device_eval_batch_size = 4
    gradient_accumulation_steps = 4
    warmup_steps = 50
    learning_rate = 3e-4
    weight_decay = 0.01

Full evaluation before training:

# vlm_test.py
--- Category Accuracy Results ---
type:Snow-Covered,correct__rate:0.8373983739837398
type:Electrical-damage,correct__rate:0.34951456310679613
type:Bird-drop,correct__rate:0.02617801047120419
type:Clean,correct__rate:0.09326424870466321
type:Dusty,correct__rate:0.08994708994708994
type:Physical-Damage,correct__rate:0.028985507246376812
--- Overall Accuracy ---
Total Correct: 181/868, Accuracy: 0.2085

Full evaluation after training:

# vlm_test.py 
--- Category Accuracy Results ---
type:Snow-Covered,correct__rate:1.0
type:Electrical-damage,correct__rate:0.9320388349514563
type:Bird-drop,correct__rate:0.9895287958115183
type:Clean,correct__rate:0.9948186528497409
type:Dusty,correct__rate:0.9788359788359788
type:Physical-Damage,correct__rate:0.9855072463768116
--- Overall Accuracy ---
Total Correct: 853/868, Accuracy: 0.9827
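To compare the two vlm_test.py logs above category by category, the `type:<name>,correct__rate:<value>` lines can be parsed into dictionaries. The parser below is a hypothetical helper, not part of the repository; the log lines are copied from the outputs above.

```python
def parse_category_rates(log_text):
    """Map category name -> accuracy from 'type:<name>,...rate:<x>' lines."""
    rates = {}
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("type:"):
            continue
        name_part, rate_part = line.split(",", 1)
        # Keep only the number; this also drops a trailing "(18/21)" count.
        value = rate_part.split(":", 1)[1].split()[0]
        rates[name_part[len("type:"):]] = float(value)
    return rates


before = parse_category_rates("""
type:Snow-Covered,correct__rate:0.8373983739837398
type:Electrical-damage,correct__rate:0.34951456310679613
type:Bird-drop,correct__rate:0.02617801047120419
type:Clean,correct__rate:0.09326424870466321
type:Dusty,correct__rate:0.08994708994708994
type:Physical-Damage,correct__rate:0.028985507246376812
""")
after = parse_category_rates("""
type:Snow-Covered,correct__rate:1.0
type:Electrical-damage,correct__rate:0.9320388349514563
type:Bird-drop,correct__rate:0.9895287958115183
type:Clean,correct__rate:0.9948186528497409
type:Dusty,correct__rate:0.9788359788359788
type:Physical-Damage,correct__rate:0.9855072463768116
""")

# Per-category accuracy gain, largest first.
gains = {name: after[name] - before[name] for name in before}
for name in sorted(gains, key=gains.get, reverse=True):
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f} (+{gains[name]:.3f})")
```

The base model was already fairly accurate on Snow-Covered, so most of the improvement comes from the other five categories.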

Evaluation on test dataset after training:

# vlm_benchmark_test_dataset.py

--- Category Accuracy Results ---
type:Electrical-damage,correct_rate:0.8571 (18/21)
type:Snow-Covered,correct_rate:1.0000 (27/27)
type:Dusty,correct_rate:0.9574 (45/47)
type:Bird-drop,correct_rate:1.0000 (46/46)
type:Clean,correct_rate:1.0000 (31/31)
type:Physical-Damage,correct_rate:1.0000 (15/15)

--- Overall Accuracy ---
Total Correct: 182/187, Accuracy: 0.9733
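The overall figure is the total number of correct predictions divided by the total number of test samples, so it can be recomputed from the per-category counts. The snippet below is only a check on the log above, with the counts copied from it.

```python
# (correct, total) per category, copied from the test-set log above.
test_counts = {
    "Electrical-damage": (18, 21),
    "Snow-Covered": (27, 27),
    "Dusty": (45, 47),
    "Bird-drop": (46, 46),
    "Clean": (31, 31),
    "Physical-Damage": (15, 15),
}

total_correct = sum(correct for correct, _ in test_counts.values())
total_samples = sum(total for _, total in test_counts.values())
overall_accuracy = total_correct / total_samples

# Categories with at least one misclassified sample.
errors = {name: total - correct
          for name, (correct, total) in test_counts.items()
          if total != correct}

print(f"Total Correct: {total_correct}/{total_samples}, "
      f"Accuracy: {overall_accuracy:.4f}")
print("Misclassified per category:", errors)
```

All five test-set errors fall in two categories: three in Electrical-damage and two in Dusty.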


VLM for Image Classification

Fine-tuning the SmolVLM model for image classification



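Introduction

  This repository provides end-to-end code for dataset creation, fine-tuning, and evaluation. It targets SmolVLM-256M-Instruct and the Solar panel clean and faulty images dataset on Kaggle, and it can also be used with other VLMs and adapted to other datasets.

Potential of small-parameter VLMs for classification in specific image domains:

  Fine-tuning raises the model's classification accuracy on this dataset from 0.2 to above 0.98.

[Figure: Accuracy_Improvement]

Small-parameter VLMs have low latency and modest hardware requirements, and are easy to fine-tune on consumer-grade GPUs:

  Fine-tuning a small-parameter VLM needs little GPU memory: the fine-tuning script in this repository uses as little as 4.5 GB of VRAM (with batch_size set to 1).

  You can tune VRAM usage by adjusting per_device_train_batch_size and per_device_eval_batch_size in fine_tune.py.

Small-parameter VLMs also need little compute and memory at inference time, which makes them well suited to on-device deployment. SmolVLM-256M-Instruct needs only 1.2 GB of VRAM for inference at bfloat16 precision, and only 0.6 GB with INT4 quantization.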


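Repository files:

 1. download_dataset.py

   Downloads the Solar panel clean and faulty images dataset from Kaggle.

    Run it directly; the dataset folder is downloaded to the project root directory.

 2. generate_train_json.py

   Creates the training JSON files from local files. It works as follows:

    1. Starting from the dataset root folder, finds the subfolders and uses each subfolder name as a class name.

    2. Iterates over the files in each subfolder.

    3. Combines a list of questions with a detailed description of each class to generate Q&A pairs, and always generates one additional Q&A pair whose answer is only the class name.

    4. Shuffles the data and splits it into training and test sets (80% and 20% by default).

    5. Saves the JSON files in the same directory as the script.

 3. fine_tune.py

   Fine-tunes the model.

    This script runs the fine-tuning and produces the fine-tuned model files and the recorded logs.

    Parameters:

  # Path settings
    local_model_path = "./SmolVLM-256M-Instruct" # Base model directory (loaded from a local directory for convenience)
    train_json_path = "solar_panel_train_dataset.json" # Training set JSON
    test_json_path = "solar_panel_test_dataset.json" # Test set JSON
    output_dir = "./SmolVLM-256M-Instruct-finetuned" # Fine-tuning output directory
  # SFTConfig (main settings)
    equivalent_epochs_to_train = 1  # Equivalent number of epochs
    per_device_train_batch_size = 4 # Training batch size
    per_device_eval_batch_size = 4 # Evaluation batch size
    gradient_accumulation_steps = 4 # Gradient accumulation steps
    warmup_steps = 50 # Warmup steps
    learning_rate = 3e-4 # Initial learning rate
    weight_decay = 0.01 # Weight decay
    logging_steps = 25 # Logging interval in steps

 4. vlm_benchmark_test_dataset.py

   Evaluates the VLM on the test set and reports the model's accuracy.

 5. vlm_test.py

   Evaluates the model on the entire dataset and reports the model's accuracy.

   By default this script uses the model in ./SmolVLM-256M-Instruct; to evaluate a fine-tuned model, change the model_name variable.

Full file tree: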

├── download_dataset.py
├── generate_train_json.py
├── fine_tune.py
├── vlm_benchmark_test_dataset.py
├── vlm_test.py
├── solar_panel_test_dataset.json
├── solar_panel_train_dataset.json
├── README.md
├── requirements.txt
├── SmolVLM-256M-Instruct
│   └── model_files
├── SmolVLM-256M-Instruct-finetuned
│   └── finetuned model_files
└── Faulty_solar_panel  #dataset root dir
     └── Category  #one subfolder per class
          └── Images.jpg

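Usage:

1. Set up the virtual environment

1) Create the virtual environment

conda create -n vlm python=3.10

2) Activate the virtual environment

conda activate vlm

2. Clone the project and install dependencies

1) Clone the project from GitHub

git clone https://github.com/stlin256/VLM4Classification.git

2) Open the project folder

cd VLM4Classification

3) Install dependencies

Upgrading pip first is recommended:

python -m pip install --upgrade pip

Copy the install command that matches your environment from the PyTorch website and run it, for example:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

Install the other dependencies:

pip install -r requirements.txt

3. Download the required files

1) Download the SmolVLM-256M-Instruct model files and save them to ./SmolVLM-256M-Instruct.

2) Download the dataset

Run the script:

python download_dataset.py

If you get a 404 error, check your internet connection or proxy settings.

4. Create the JSON files used for training

Run the script:

python generate_train_json.py

5. Train

Before running, check the batch size settings, that is, the per_device_train_batch_size and per_device_eval_batch_size variables.

# Uses about 4.5 GB of VRAM
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
# Uses about 14 GB of VRAM
per_device_train_batch_size = 4
per_device_eval_batch_size = 4

Run the script:

python fine_tune.py

While running, the script prints the training progress, remaining time, loss, gradient, learning rate, and other information.

See the parameter descriptions above for what each variable does.

6. Evaluate (based on accuracy)

1) Full evaluation of the original model

python vlm_test.py

2) Evaluate the fine-tuned model on the test set

python vlm_benchmark_test_dataset.py

3) Full evaluation of the fine-tuned model

Change the value of the model_name variable to "./SmolVLM-256M-Instruct-finetuned"

python vlm_test.py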

Training Results Example

The following parameters were used for training:

#generate_train_json.py

    train_set_ratio = 0.8

#fine_tune.py

 #SFTConfig
    equivalent_epochs_to_train = 1
    per_device_train_batch_size = 4
    per_device_eval_batch_size = 4
    gradient_accumulation_steps = 4
    warmup_steps = 50
    learning_rate = 3e-4
    weight_decay = 0.01

Full evaluation before training

#vlm_test.py
--- Category Accuracy Results ---
type:Snow-Covered,correct__rate:0.8373983739837398
type:Electrical-damage,correct__rate:0.34951456310679613
type:Bird-drop,correct__rate:0.02617801047120419
type:Clean,correct__rate:0.09326424870466321
type:Dusty,correct__rate:0.08994708994708994
type:Physical-Damage,correct__rate:0.028985507246376812
--- Overall Accuracy ---
Total Correct: 181/868, Accuracy: 0.2085

Full evaluation after training

#vlm_test.py
--- Category Accuracy Results ---
type:Snow-Covered,correct__rate:1.0
type:Electrical-damage,correct__rate:0.9320388349514563
type:Bird-drop,correct__rate:0.9895287958115183
type:Clean,correct__rate:0.9948186528497409
type:Dusty,correct__rate:0.9788359788359788
type:Physical-Damage,correct__rate:0.9855072463768116
--- Overall Accuracy ---
Total Correct: 853/868, Accuracy: 0.9827

Evaluation on the test set after training

#vlm_benchmark_test_dataset.py

--- Category Accuracy Results ---
type:Electrical-damage,correct_rate:0.8571 (18/21)
type:Snow-Covered,correct_rate:1.0000 (27/27)
type:Dusty,correct_rate:0.9574 (45/47)
type:Bird-drop,correct_rate:1.0000 (46/46)
type:Clean,correct_rate:1.0000 (31/31)
type:Physical-Damage,correct_rate:1.0000 (15/15)

--- Overall Accuracy ---
Total Correct: 182/187, Accuracy: 0.9733
