This repository provides the complete pipeline code for dataset creation, model fine-tuning, and evaluation. While built around SmolVLM-256M-Instruct, the framework supports other Vision Language Models (VLMs); it targets the Solar panel clean and faulty images dataset on Kaggle and can be adapted to other domains.
Fine-tuning improves classification accuracy on this dataset from roughly 0.21 to over 0.98.
Small-parameter VLMs demonstrate exceptional fine-tuning efficiency:
- Minimum VRAM consumption: 4.5 GB during fine-tuning (with `batch_size=1`)
- Scalable via the `per_device_train_batch_size` and `per_device_eval_batch_size` settings in `fine_tune.py`
Small-parameter VLMs also need little compute and memory at inference time, which makes them well suited to edge deployment:
- SmolVLM-256M-Instruct requires only 1.2 GB of GPU memory for inference at `bfloat16` precision
- Quantized to `INT4`, it needs just 0.6 GB of VRAM
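For reference, a minimal loading sketch with Hugging Face `transformers` (the 4-bit path additionally assumes `bitsandbytes` is installed; these snippets are illustrative and not part of this repository):

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig

model_path = "./SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(model_path)

# bfloat16 inference (~1.2 GB VRAM)
model_bf16 = AutoModelForVision2Seq.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# INT4 inference (~0.6 GB VRAM), assuming bitsandbytes is available
model_int4 = AutoModelForVision2Seq.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```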
Downloads the Solar panel clean and faulty images dataset from Kaggle directly into the project root directory; just run it.
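The same download can also be done by hand with the `kagglehub` package; the dataset slug below is an assumption for illustration, not taken from this repository:

```python
import kagglehub

# Hypothetical slug for the "Solar panel clean and faulty images" dataset
path = kagglehub.dataset_download("pythonafroz/solar-panel-images")
print("Downloaded to:", path)
```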
Generates the training JSON files from local data (sketched below):
1. Scans the subfolders of the dataset root, using each subfolder name as a class label
2. Iterates through the image files in each subfolder
3. Combines question templates with per-class descriptions to generate Q&A pairs, always including one pair whose answer is just the class name
4. Shuffles the data and splits it (default: 80% train / 20% test)
5. Saves the JSON files alongside the script
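A minimal sketch of that flow, under stated assumptions: `QUESTION_TEMPLATES`, `CLASS_DESCRIPTIONS`, the question strings, and the record fields are illustrative placeholders, not the actual identifiers used in `generate_train_json.py`.

```python
import json
import random
from pathlib import Path

DATASET_ROOT = Path("Faulty_solar_panel")
QUESTION_TEMPLATES = ["What condition is this solar panel in?"]  # illustrative
CLASS_DESCRIPTIONS = {"Dusty": "a panel covered in dust"}        # illustrative
train_set_ratio = 0.8

samples = []
for class_dir in sorted(DATASET_ROOT.iterdir()):  # each subfolder is a class label
    if not class_dir.is_dir():
        continue
    label = class_dir.name
    for image in class_dir.glob("*.jpg"):
        # template questions paired with a class description
        for question in QUESTION_TEMPLATES:
            answer = CLASS_DESCRIPTIONS.get(label, label)
            samples.append({"image": str(image), "question": question, "answer": answer})
        # one fixed pair whose answer is just the class name
        samples.append({"image": str(image),
                        "question": "Classify this solar panel.",
                        "answer": label})

random.shuffle(samples)
split = int(len(samples) * train_set_ratio)
with open("solar_panel_train_dataset.json", "w") as f:
    json.dump(samples[:split], f, indent=2)
with open("solar_panel_test_dataset.json", "w") as f:
    json.dump(samples[split:], f, indent=2)
```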
Main fine-tuning script that produces trained models and training logs.
# Path configurations
local_model_path = "./SmolVLM-256M-Instruct" # Base model directory (loaded from a local path for convenience)
train_json_path = "solar_panel_train_dataset.json" # Training JSON
test_json_path = "solar_panel_test_dataset.json" # Test JSON
output_dir = "./SmolVLM-256M-Instruct-finetuned" # Output directory
# Training configuration
equivalent_epochs_to_train = 1
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 50
learning_rate = 3e-4
weight_decay = 0.01
logging_steps = 25
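With these defaults, each optimizer step consumes 4 × 4 = 16 training examples. Assuming `equivalent_epochs_to_train` is converted into a step budget, the arithmetic would look roughly like this (a sketch, not the script's actual code; the sample count is illustrative):

```python
import math

num_train_samples = 800            # illustrative; the real count comes from the training JSON
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
equivalent_epochs_to_train = 1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 16
max_steps = math.ceil(num_train_samples * equivalent_epochs_to_train / effective_batch_size)
print(max_steps)  # 50 optimizer steps for one equivalent epoch
```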
Evaluates the VLM on the held-out test set, reporting per-category and overall accuracy.
Full-dataset evaluation script. It defaults to the model in `./SmolVLM-256M-Instruct`; modify the `model_name` variable to evaluate a fine-tuned model.
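The per-category figures in the results section come from a simple tally; a sketch of the idea (the substring-matching rule is an assumption, not the scripts' actual logic):

```python
from collections import defaultdict

correct = defaultdict(int)
total = defaultdict(int)

def record(label: str, prediction: str) -> None:
    """Tally one prediction against its ground-truth class label."""
    total[label] += 1
    if label.lower() in prediction.lower():  # illustrative matching rule
        correct[label] += 1

def report() -> None:
    print("--- Category Accuracy Results ---")
    for label in total:
        print(f"type:{label},correct_rate:{correct[label] / total[label]:.4f}")
    print("--- Overall Accuracy ---")
    c, t = sum(correct.values()), sum(total.values())
    print(f"Total Correct: {c}/{t}, Accuracy: {c / t:.4f}")
```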
├── download_dataset.py
├── generate_train_json.py
├── fine_tune.py
├── vlm_benchmark_test_dataset.py
├── vlm_test.py
├── solar_panel_test_dataset.json
├── solar_panel_train_dataset.json
├── README.md
├── requirements.txt
├── SmolVLM-256M-Instruct/
│ └── model_files
├── SmolVLM-256M-Instruct-finetuned/
│ └── finetuned_model_files
└── Faulty_solar_panel/ # Dataset root
├── Category/
└── Images.jpg
- Create virtual environment:
conda create -n vlm python=3.10
- Activate environment:
conda activate vlm
- Clone repository:
git clone https://github.com/stlin256/VLM4Classification.git
- Open project directory:
cd VLM4Classification
- Install dependencies (upgrading pip first is recommended):
python -m pip install --upgrade pip
Download PyTorch from pytorch.org, copying the install command that matches your environment, for example:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Install the remaining dependencies:
pip install -r requirements.txt
- Download SmolVLM-256M-Instruct to `./SmolVLM-256M-Instruct`
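For example, using the Hugging Face CLI (assuming the upstream model repository is HuggingFaceTB/SmolVLM-256M-Instruct):

```
huggingface-cli download HuggingFaceTB/SmolVLM-256M-Instruct --local-dir ./SmolVLM-256M-Instruct
```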
- Download dataset:
python download_dataset.py
If this fails with a 404 error, check your internet connection or proxy settings.
- Generate the training and test JSON files:
python generate_train_json.py
Verify the batch size settings (`per_device_train_batch_size`, `per_device_eval_batch_size`) before execution:
# ~4.5GB VRAM
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
# ~14GB VRAM
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
Run training:
python fine_tune.py
While running, the script reports training progress, estimated remaining time, loss, gradient, and learning rate.
- Base model full evaluation:
python vlm_test.py
- Fine-tuned model test set evaluation:
python vlm_benchmark_test_dataset.py
- Full evaluation of fine-tuned model:
# Set model_name = "./SmolVLM-256M-Instruct-finetuned" first
python vlm_test.py
The following parameters were used for training:
#generate_train_json.py
train_set_ratio = 0.8
#fine_tune.py
#SFTConfig
equivalent_epochs_to_train = 1
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 50
learning_rate = 3e-4
weight_decay = 0.01
Full evaluation before training:
# vlm_test.py
--- Category Accuracy Results ---
type:Snow-Covered,correct__rate:0.8373983739837398
type:Electrical-damage,correct__rate:0.34951456310679613
type:Bird-drop,correct__rate:0.02617801047120419
type:Clean,correct__rate:0.09326424870466321
type:Dusty,correct__rate:0.08994708994708994
type:Physical-Damage,correct__rate:0.028985507246376812
--- Overall Accuracy ---
Total Correct: 181/868, Accuracy: 0.2085
Full evaluation after training:
# vlm_test.py
--- Category Accuracy Results ---
type:Snow-Covered,correct__rate:1.0
type:Electrical-damage,correct__rate:0.9320388349514563
type:Bird-drop,correct__rate:0.9895287958115183
type:Clean,correct__rate:0.9948186528497409
type:Dusty,correct__rate:0.9788359788359788
type:Physical-Damage,correct__rate:0.9855072463768116
--- Overall Accuracy ---
Total Correct: 853/868, Accuracy: 0.9827
Evaluation on test dataset after training:
# vlm_benchmark_test_dataset.py
--- Category Accuracy Results ---
type:Electrical-damage,correct_rate:0.8571 (18/21)
type:Snow-Covered,correct_rate:1.0000 (27/27)
type:Dusty,correct_rate:0.9574 (45/47)
type:Bird-drop,correct_rate:1.0000 (46/46)
type:Clean,correct_rate:1.0000 (31/31)
type:Physical-Damage,correct_rate:1.0000 (15/15)
--- Overall Accuracy ---
Total Correct: 182/187, Accuracy: 0.9733