The evaluation results of train.py and validate.py are inconsistent #157

qzwangUSTC · 2020-12-23T08:40:53Z

I use the following parameters to train efficientdet-d0 from scratch.

./distributed_train.sh 4 /mscoco --model efficientdet_d0 -b 11 --amp --lr 0.06 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.9999

The relevant log output fragments are as follows:

Train: 234 [2500/2689 ( 93%)] Loss: 0.593521 (0.5254) Time: 0.495s, 88.80/s (0.537s, 81.88/s) LR: 9.842e-03 Data: 0.028 (0.027)
Train: 234 [2550/2689 ( 95%)] Loss: 0.582619 (0.5265) Time: 0.516s, 85.24/s (0.538s, 81.79/s) LR: 9.842e-03 Data: 0.027 (0.027)
Train: 234 [2600/2689 ( 97%)] Loss: 0.556304 (0.5270) Time: 0.586s, 75.13/s (0.538s, 81.84/s) LR: 9.842e-03 Data: 0.030 (0.027)
Train: 234 [2650/2689 ( 99%)] Loss: 0.538032 (0.5272) Time: 0.465s, 94.72/s (0.538s, 81.76/s) LR: 9.842e-03 Data: 0.031 (0.027)
Train: 234 [2688/2689 (100%)] Loss: 0.609689 (0.5278) Time: 1.239s, 12.91/s (0.538s, 29.72/s) LR: 9.842e-03 Data: 0.781 (0.028)
Test (EMA): [ 0/113] Time: 1.555 (1.555) Loss: 0.4949 (0.4949)
Test (EMA): [ 50/113] Time: 0.215 (0.236) Loss: 0.5803 (0.5480)
Test (EMA): [ 100/113] Time: 0.204 (0.223) Loss: 0.5558 (0.5510)
Test (EMA): [ 113/113] Time: 0.425 (0.221) Loss: 0.5754 (0.5531)
Current checkpoints:
('./output/train/20201218-122406-efficientdet_d0/checkpoint-212.pth.tar', 0.3401196940773372)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-210.pth.tar', 0.34007189357458284)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-211.pth.tar', 0.34005645098502846)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-231.pth.tar', 0.3400186921308521)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-209.pth.tar', 0.3400044905741661)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-232.pth.tar', 0.33993297177380966)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-213.pth.tar', 0.339857531277273)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-230.pth.tar', 0.33980180631739004)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-234.pth.tar', 0.3397794123140815)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-208.pth.tar', 0.3397224963303912)

Train: 235 [ 0/2689 ( 0%)] Loss: 0.538377 (0.5384) Time: 1.825s, 24.10/s (1.825s, 24.10/s) LR: 5.600e-03 Data: 1.342 (1.342)
Train: 235 [ 50/2689 ( 2%)] Loss: 0.543684 (0.5410) Time: 0.508s, 86.54/s (0.547s, 80.50/s) LR: 5.600e-03 Data: 0.025 (0.055)
Train: 235 [ 100/2689 ( 4%)] Loss: 0.518846 (0.5336) Time: 0.500s, 88.01/s (0.555s, 79.32/s) LR: 5.600e-03 Data: 0.025 (0.042)
Train: 235 [ 150/2689 ( 6%)] Loss: 0.501451 (0.5256) Time: 0.555s, 79.21/s (0.547s, 80.43/s) LR: 5.600e-03 Data: 0.025 (0.038)
Train: 235 [ 200/2689 ( 7%)] Loss: 0.522107 (0.5249) Time: 0.484s, 90.91/s (0.541s, 81.26/s) LR: 5.600e-03 Data: 0.027 (0.035)
Train: 235 [ 250/2689 ( 9%)] Loss: 0.474666 (0.5165) Time: 0.551s, 79.81/s (0.548s, 80.29/s) LR: 5.600e-03 Data: 0.026 (0.034)

Based on how your code ranks saved checkpoints, './output/train/20201218-122406-efficientdet_d0/checkpoint-212.pth.tar' should be the best one.

But when I evaluate it with validate.py, the result differs from the one reported during training.

I use the following parameters to run validate.py:
python validate.py /location/of/mscoco/ --model efficientdet_d0 -b 10 --checkpoint ./output/train/20201218-122406-efficientdet_d0/checkpoint-212.pth.tar

Loading and preparing results...
DONE (t=3.91s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=106.94s).
Accumulating evaluation results...
DONE (t=18.99s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.330
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.510
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.347
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.381
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.285
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.463
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692

The result is 0.34011969 in train.py but 0.330 in validate.py, even though both use the same model weights. Why is that?

qzwangUSTC (Author) · 2020-12-23T09:15:06Z

Sorry, the description above may be a bit messy, so I have reformatted it here.

I use the following parameters to train efficientdet-d0 from scratch:

./distributed_train.sh 4 /mscoco --model efficientdet_d0 -b 11 --amp --lr 0.06 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.9999

The relevant log output fragments are as follows:

Train: 234 [2500/2689 ( 93%)] Loss: 0.593521 (0.5254) Time: 0.495s, 88.80/s (0.537s, 81.88/s) LR: 9.842e-03 Data: 0.028 (0.027)
Train: 234 [2550/2689 ( 95%)] Loss: 0.582619 (0.5265) Time: 0.516s, 85.24/s (0.538s, 81.79/s) LR: 9.842e-03 Data: 0.027 (0.027)
Train: 234 [2600/2689 ( 97%)] Loss: 0.556304 (0.5270) Time: 0.586s, 75.13/s (0.538s, 81.84/s) LR: 9.842e-03 Data: 0.030 (0.027)
Train: 234 [2650/2689 ( 99%)] Loss: 0.538032 (0.5272) Time: 0.465s, 94.72/s (0.538s, 81.76/s) LR: 9.842e-03 Data: 0.031 (0.027)
Train: 234 [2688/2689 (100%)] Loss: 0.609689 (0.5278) Time: 1.239s, 12.91/s (0.538s, 29.72/s) LR: 9.842e-03 Data: 0.781 (0.028)
Test (EMA): [ 0/113] Time: 1.555 (1.555) Loss: 0.4949 (0.4949)
Test (EMA): [ 50/113] Time: 0.215 (0.236) Loss: 0.5803 (0.5480)
Test (EMA): [ 100/113] Time: 0.204 (0.223) Loss: 0.5558 (0.5510)
Test (EMA): [ 113/113] Time: 0.425 (0.221) Loss: 0.5754 (0.5531)
Current checkpoints:
('./output/train/20201218-122406-efficientdet_d0/checkpoint-212.pth.tar', 0.3401196940773372)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-210.pth.tar', 0.34007189357458284)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-211.pth.tar', 0.34005645098502846)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-231.pth.tar', 0.3400186921308521)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-209.pth.tar', 0.3400044905741661)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-232.pth.tar', 0.33993297177380966)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-213.pth.tar', 0.339857531277273)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-230.pth.tar', 0.33980180631739004)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-234.pth.tar', 0.3397794123140815)
('./output/train/20201218-122406-efficientdet_d0/checkpoint-208.pth.tar', 0.3397224963303912)

Train: 235 [ 0/2689 ( 0%)] Loss: 0.538377 (0.5384) Time: 1.825s, 24.10/s (1.825s, 24.10/s) LR: 5.600e-03 Data: 1.342 (1.342)
Train: 235 [ 50/2689 ( 2%)] Loss: 0.543684 (0.5410) Time: 0.508s, 86.54/s (0.547s, 80.50/s) LR: 5.600e-03 Data: 0.025 (0.055)
Train: 235 [ 100/2689 ( 4%)] Loss: 0.518846 (0.5336) Time: 0.500s, 88.01/s (0.555s, 79.32/s) LR: 5.600e-03 Data: 0.025 (0.042)
Train: 235 [ 150/2689 ( 6%)] Loss: 0.501451 (0.5256) Time: 0.555s, 79.21/s (0.547s, 80.43/s) LR: 5.600e-03 Data: 0.025 (0.038)
Train: 235 [ 200/2689 ( 7%)] Loss: 0.522107 (0.5249) Time: 0.484s, 90.91/s (0.541s, 81.26/s) LR: 5.600e-03 Data: 0.027 (0.035)
Train: 235 [ 250/2689 ( 9%)] Loss: 0.474666 (0.5165) Time: 0.551s, 79.81/s (0.548s, 80.29/s) LR: 5.600e-03 Data: 0.026 (0.034)

Based on how your code ranks saved checkpoints, './output/train/20201218-122406-efficientdet_d0/checkpoint-212.pth.tar' should be the best one. But when I evaluate it with validate.py, the result differs from the one reported during training. I use the following parameters to run validate.py:

python validate.py /location/of/mscoco/ --model efficientdet_d0 -b 10 --checkpoint ./output/train/20201218-122406-efficientdet_d0/checkpoint-212.pth.tar

Loading and preparing results...
DONE (t=3.91s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=106.94s).
Accumulating evaluation results...
DONE (t=18.99s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.330
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.510
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.347
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.381
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.285
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.463
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692

The result is 0.34011969 in train.py but 0.330 in validate.py, even though both use the same model weights. Why is that?

rwightman (Maintainer) · 2020-12-23T16:46:36Z

@qzwangUSTC you likely need to tell the validation script to use the EMA weights by passing --use-ema when calling it. Alternatively, you can post-process the checkpoint with the clean_checkpoint.py script; by default it extracts the EMA weights (if they are there) and gives you a checkpoint without any of the training specifics (just one state_dict).
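
To illustrate the idea behind both suggestions, here is a minimal sketch of choosing between raw and EMA weights in a training checkpoint. It is not the actual code of validate.py or clean_checkpoint.py. The function name `select_eval_state_dict` is hypothetical, and the key names `state_dict` and `state_dict_ema` are assumptions about the checkpoint layout, so inspect the keys of your own checkpoint before relying on them. A toy dict with stand-in values is used instead of real tensors.

```python
from collections import OrderedDict


def select_eval_state_dict(checkpoint, use_ema=True):
    """Pick the weights to evaluate from a training checkpoint dict.

    Assumption: raw weights are stored under 'state_dict' and EMA weights,
    when they were saved, under 'state_dict_ema'.
    """
    has_ema = "state_dict_ema" in checkpoint
    key = "state_dict_ema" if (use_ema and has_ema) else "state_dict"
    cleaned = OrderedDict()
    for name, value in checkpoint[key].items():
        # Drop the 'module.' prefix that DistributedDataParallel adds to names.
        if name.startswith("module."):
            name = name[len("module."):]
        cleaned[name] = value
    return key, cleaned


# Toy checkpoint: plain floats stand in for tensors.
toy_checkpoint = {
    "epoch": 212,
    "state_dict": {"module.conv.weight": 1.0},
    "state_dict_ema": {"module.conv.weight": 0.9},
}

key, weights = select_eval_state_dict(toy_checkpoint)
print(key, dict(weights))  # state_dict_ema {'conv.weight': 0.9}
```

With a real checkpoint, the dict would come from something like `torch.load(path, map_location="cpu")`. If the training log reports "Test (EMA)" scores but evaluation loads the raw weights, the two numbers are measured on different sets of weights, which would explain a gap like 0.340 versus 0.330.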
