Description
If this is your first time, please read our contributor guidelines:
https://github.com/mindspore-lab/mindcv/blob/main/CONTRIBUTING.md
Describe the bug (Mandatory)
When training vgg16 and vgg19 on a 5-class flower dataset, on both GPU and NPU, the loss does not converge and the accuracy is wrong.
- Hardware Environment (Ascend/GPU/CPU):
Please delete the backend not involved:
/device ascend/GPU
- Software Environment (Mandatory):
-- MindSpore version (e.g., 2.2.11) :
-- Python version (e.g., Python 3.9.18) :
-- OS platform and distribution (e.g., Linux Ubuntu 22.04):
-- GCC/Compiler version (if compiled from source):
- Execute Mode (Mandatory) (PyNative/Graph):
Please delete the mode not involved:
/mode pynative PYNATIVE_MODE(1)
/mode graph
To Reproduce (Mandatory)
Steps to reproduce the behavior:
Train with the YAML config file.
Command: python train.py --config ./configs/vgg/vgg16_ascend.yaml
Expected behavior (Mandatory)
A clear and concise description of what you expected to happen.
Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.
Contents of the YAML file:
# system
mode: 1
distribute: False
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: './imageNet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875

# model
model: 'vgg16'
num_classes: 5
pretrained: True
ckpt_path: ''
keep_checkpoint_max: 1
ckpt_save_dir: './ckpt3'
epoch_size: 20
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.01
min_lr: 0.0001
decay_epochs: 198
warmup_epochs: 2

# optimizer
opt: 'momentum'
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False
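
To see roughly what learning rate this configuration yields over the 20 epochs that were actually run, here is a minimal sketch of a generic linear-warmup plus cosine-decay schedule, evaluated once per epoch. It is an illustration under assumptions, not MindCV's scheduler code: the function name `warmup_cosine_lr` is hypothetical, the warmup is assumed to ramp linearly from 0, and the real implementation updates per step and may differ in detail.

```python
import math


def warmup_cosine_lr(epoch, lr=0.01, min_lr=0.0001,
                     warmup_epochs=2, decay_epochs=198):
    """Hypothetical per-epoch schedule (epoch is 0-based); values taken from the config."""
    if epoch < warmup_epochs:
        # Assumed linear warmup that reaches the peak lr at the end of warmup.
        return lr * (epoch + 1) / warmup_epochs
    # Cosine decay from lr to min_lr spread over decay_epochs.
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# epoch_size is 20 in the config, so only the first 20 epochs are ever reached.
schedule = [warmup_cosine_lr(e) for e in range(20)]
for epoch, value in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {value:.5f}")
```

Under these assumptions the rate stays close to the 0.01 peak for the whole run, because `decay_epochs: 198` is far longer than `epoch_size: 20`.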
Training results:
Epoch TrainLoss Top_1_Accuracy Top_5_Accuracy TrainTime EvalTime TotalTime
1 1.659075 25.2044% 100.0000% 22.04 0.99 27.67
2 1.790772 19.0736% 100.0000% 6.21 0.84 10.10
3 1.747301 19.0736% 100.0000% 6.46 0.84 10.10
4 1.628069 19.0736% 100.0000% 6.18 0.78 9.68
5 1.661704 19.0736% 100.0000% 6.33 0.85 10.33
6 1.725484 19.0736% 100.0000% 6.19 0.85 10.06
7 1.674596 18.9373% 100.0000% 6.40 0.89 10.36
8 1.607921 19.0736% 100.0000% 6.25 0.75 10.25
9 1.670359 19.0736% 100.0000% 6.17 0.80 10.14
10 1.685464 19.0736% 100.0000% 6.22 0.87 10.75
11 1.688051 19.0736% 100.0000% 6.41 0.83 10.23
12 1.720397 19.0736% 100.0000% 6.22 0.78 10.54
13 1.750791 19.0736% 100.0000% 6.29 0.79 10.29
14 1.598438 19.0736% 100.0000% 6.18 0.83 9.85
15 1.609399 19.0736% 100.0000% 6.14 0.84 9.81
16 1.617299 19.0736% 100.0000% 6.17 0.95 10.13
17 1.744891 19.0736% 100.0000% 6.23 0.86 10.30
18 1.776682 19.0736% 100.0000% 6.18 0.83 9.81
19 1.670697 19.0736% 100.0000% 6.12 0.93 10.03
20 1.782085 19.0736% 100.0000% 6.36 0.83 10.14
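
As a reference point for reading the table, the sketch below computes the cross-entropy loss and top-1 accuracy that a model predicting a uniform distribution over the 5 classes would produce, using the `label_smoothing: 0.1` value from the config. It assumes balanced classes and is only a sanity check, not a diagnosis; the variable names are illustrative.

```python
import math

num_classes = 5
label_smoothing = 0.1

# Smoothed target distribution for a sample whose true class is index 0.
target = [label_smoothing / num_classes] * num_classes
target[0] += 1.0 - label_smoothing

# A model that assigns the same probability to every class.
uniform = [1.0 / num_classes] * num_classes

# Cross entropy between the smoothed target and the uniform prediction.
chance_loss = -sum(t * math.log(p) for t, p in zip(target, uniform))
chance_top1 = 100.0 / num_classes  # assumes balanced classes

print(f"chance-level loss:  {chance_loss:.4f}")
print(f"chance-level top-1: {chance_top1:.1f}%")
```

The reported losses (about 1.60 to 1.79) and top-1 accuracy (about 19%) sit near these chance-level values, and top-5 accuracy is trivially 100% when there are only 5 classes.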
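Additional context (Optional)
Add any other context about the problem here.
The loss does not converge and the accuracy is also wrong. Could you please take a look at what the problem is? Also, I have already downloaded the pretrained model; how do I tell the training script to use it? At the moment, setting pretrained: True downloads the weights automatically to a fixed location, and I would like to know how to specify my own path.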