The Multi_Accuracy metric is not compatible with mxnet 1.6.0 #14

@suyz526

Hi,

I tried to train the network after changing only the batch size and the GPUs in the default settings, and got the following error. It occurs right after the first progress report is logged (`Batch [0-20]`).

```
[09:20:05] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[09:20:05] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[09:20:05] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[09:20:05] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
INFO:root:start with arguments Namespace(batch_size=2, benchmark=0, data_nthreads=128, disp_batches=20, dtype='float32', gpus='0', image_shape='3,512,512', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='100,200', max_random_aspect_ratio=0, max_random_h=0, max_random_l=0, max_random_rotate_angle=0, max_random_s=0, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix='./model/tasn', mom=0, monitor=0, network=None, num_classes=200, num_epochs=300, num_examples=5994, num_layers=None, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=5, wd=0)
[09:20:05] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: ./data/cub/train.rec, use 4 threads for decoding..
[09:20:08] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: ./data/cub/val.rec, use 4 threads for decoding..
learning rate from ``lr_scheduler`` has been overwritten by ``learning_rate`` in optimizer.
INFO:root:Epoch[0] Batch [0-20] Speed: 33.71 samples/sec att_net_accuracy=0.000000 part_net_accuracy=0.023810 master_net_accuracy=0.023810 part_net_aux_accuracy=0.023810 master_net_aux_accuracy=0.023810 distillation_loss=5.296982
Traceback (most recent call last):
  File "train.py", line 57, in <module>
    eval_metric = evaluate.Multi_Accuracy(num=6))
  File "/home/ysu/project/attention_net/tasn/tasn-mxnet/example/tasn/common/fit.py", line 195, in fit
    monitor = monitor
  File "/home/ysu/mxnet_attention/lib/python3.5/site-packages/mxnet/module/base_module.py", line 533, in fit
    self.update_metric(eval_metric, data_batch.label)
  File "/home/ysu/mxnet_attention/lib/python3.5/site-packages/mxnet/module/module.py", line 775, in update_metric
    self._exec_group.update_metric(eval_metric, labels, pre_sliced)
  File "/home/ysu/mxnet_attention/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 640, in update_metric
    eval_metric.update_dict(labels_, preds)
  File "/home/ysu/mxnet_attention/lib/python3.5/site-packages/mxnet/metric.py", line 133, in update_dict
    self.update(label, pred)
  File "/home/ysu/project/attention_net/tasn/tasn-mxnet/example/tasn/common/evaluate.py", line 32, in update
    self.sum_metric[i] += (pred_label.flat == label.flat).sum()
TypeError: 'float' object is not subscriptable
```

The cause is that in MXNet 1.6.0 the `EvalMetric` base class tracks not only `num_inst` and `sum_metric` but also `global_num_inst` and `global_sum_metric`.

In addition, the batch-end callback (here `Speedometer`, with its default `auto_reset=True`) now calls `reset_local()`, which resets only `num_inst` and `sum_metric`, instead of `reset()` as in older versions of MXNet.

However, the `Multi_Accuracy` class does not implement `reset_local()`, so the inherited `EvalMetric.reset_local()` sets `sum_metric` to the scalar `0.0`. On the next batch, `update()` tries to index `self.sum_metric[i]` on that float, which raises the `TypeError` above.
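
The failure can be reproduced without MXNet. The sketch below is pure Python and assumes a simplified, hypothetical `EvalMetricStub` that mimics only the `reset()`/`reset_local()` behavior described above (the real `mxnet.metric.EvalMetric` has more methods and arguments). `MultiAccuracyNoLocal` is likewise a hypothetical stand-in: like the class in the traceback, it keeps one counter per output but overrides only `reset()`.

```python
class EvalMetricStub:
    """Hypothetical, simplified stand-in for EvalMetric's reset semantics in MXNet 1.6.0."""

    def __init__(self, name):
        self.name = name
        self.reset()

    def reset(self):
        # Clears both the local (per-interval) and global (per-epoch) counters.
        self.num_inst = 0
        self.sum_metric = 0.0
        self.global_num_inst = 0
        self.global_sum_metric = 0.0

    def reset_local(self):
        # Clears only the local counters, back to *scalar* values.
        self.num_inst = 0
        self.sum_metric = 0.0


class MultiAccuracyNoLocal(EvalMetricStub):
    """Hypothetical per-output accuracy that overrides reset() but not reset_local()."""

    def __init__(self, num):
        self.num = num  # must be set before the base __init__ calls reset()
        super().__init__("multi-accuracy")

    def reset(self):
        self.num_inst = [0] * self.num
        self.sum_metric = [0.0] * self.num

    def update(self, labels, preds):
        # labels/preds: one list of class ids per output
        for i in range(self.num):
            correct = sum(p == l for p, l in zip(preds[i], labels[i]))
            self.sum_metric[i] += correct  # fails once sum_metric is a float
            self.num_inst[i] += len(labels[i])


def simulate():
    metric = MultiAccuracyNoLocal(num=2)
    labels = [[1, 2], [1, 2]]
    preds = [[1, 0], [1, 2]]
    metric.update(labels, preds)      # first logging interval works
    metric.reset_local()              # what the callback does after logging
    try:
        metric.update(labels, preds)  # next batch
    except TypeError as err:
        return str(err)
    return None


print(simulate())  # 'float' object is not subscriptable
```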

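A quick workaround is to pass `auto_reset=False` when constructing the `Speedometer` callback, so that `reset_local()` is never called during the epoch. The logged accuracies are then cumulative since the start of the epoch rather than per logging interval.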
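
A more durable fix, following the diagnosis above, is to give the multi-output metric its own `reset_local()` that restores list-valued counters. The sketch below is again pure Python under stated assumptions: `EvalMetricStub` is a hypothetical stand-in for the MXNet base class, and `MultiAccuracy`, `local_accuracy()` and `global_accuracy()` are illustrative names, not the repository's `Multi_Accuracy` class or MXNet's `get()`/`get_global()` methods.

```python
class EvalMetricStub:
    """Hypothetical, simplified stand-in for EvalMetric's reset semantics in MXNet 1.6.0."""

    def __init__(self, name):
        self.name = name
        self.reset()

    def reset(self):
        self.num_inst = 0
        self.sum_metric = 0.0
        self.global_num_inst = 0
        self.global_sum_metric = 0.0

    def reset_local(self):
        self.num_inst = 0
        self.sum_metric = 0.0


class MultiAccuracy(EvalMetricStub):
    """Hypothetical per-output accuracy keeping both local and global counters as lists."""

    def __init__(self, num):
        self.num = num  # must be set before the base __init__ calls reset()
        super().__init__("multi-accuracy")

    def reset(self):
        # Epoch-level reset: clear local and global per-output counters.
        self.reset_local()
        self.global_num_inst = [0] * self.num
        self.global_sum_metric = [0.0] * self.num

    def reset_local(self):
        # Interval-level reset (what the logging callback calls): keep the list shape.
        self.num_inst = [0] * self.num
        self.sum_metric = [0.0] * self.num

    def update(self, labels, preds):
        # labels/preds: one list of class ids per output
        for i in range(self.num):
            correct = sum(p == l for p, l in zip(preds[i], labels[i]))
            count = len(labels[i])
            self.sum_metric[i] += correct
            self.num_inst[i] += count
            self.global_sum_metric[i] += correct
            self.global_num_inst[i] += count

    def local_accuracy(self):
        # Per-output accuracy since the last reset_local(); NaN if nothing was seen.
        return [s / n if n else float("nan") for s, n in zip(self.sum_metric, self.num_inst)]

    def global_accuracy(self):
        # Per-output accuracy since the last reset(); NaN if nothing was seen.
        return [s / n if n else float("nan")
                for s, n in zip(self.global_sum_metric, self.global_num_inst)]


m = MultiAccuracy(num=2)
labels = [[1, 2], [1, 2]]
m.update(labels, [[1, 0], [1, 2]])  # first interval
m.reset_local()                     # no crash: counters stay lists
m.update(labels, [[1, 2], [0, 0]])  # second interval
print(m.local_accuracy())   # [1.0, 0.0]
print(m.global_accuracy())  # [0.75, 0.5]
```

In an actual MXNet subclass the same idea would apply: override `reset_local()` alongside `reset()` so that neither leaves `sum_metric` as a scalar, which keeps the default `auto_reset=True` usable.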