Training ResNet-20 with CIFAR-10 does not converge with MXNet1.3.0 #15863
Replies: 4 comments
-
@mxnet-label-bot add [training, question]
-
@apeforest: Would you be able to help, or suggest who could help?
-
How were you building ResNet and running the training? Can you share the script/code or a minimal example so that we can try to reproduce your results?
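A report like this is usually easier to reproduce when the environment is captured alongside the training script. Below is a minimal sketch of that, assuming Python 3.8+. The helper name `collect_env` is hypothetical and not part of MXNet, and the package names are assumptions: GPU builds are published on pip under names such as `mxnet-cu80`, so adjust the tuple to match what is actually installed.

```python
import platform
import sys
from importlib import metadata


def collect_env(packages=("mxnet", "mxnet-cu80", "numpy")):
    """Return a dict with the Python version, OS, and installed package versions.

    `packages` holds pip distribution names; the defaults are assumptions.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output for each MXNet version being compared, together with the exact training command, would make it easier to tell a framework regression from an environment difference.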
-
Thanks, zachgk. I did not build ResNet-20 myself; I used the ResNet-20 under the directory /mxnet/example/image-classification/symbol. No modification was made to the code. However, the problem does not exist in version 0.12.1 or 1.4.1.
-
We rented a cluster of 8 workers to run some experiments with MXNet 1.3.0. Each node of the cluster has cuda-8.0 and cudnn-6.0 installed. When training ResNet-50 with CIFAR-10, the validation accuracy is always around 10%, and when training ResNet-50 with ImageNet, the validation accuracy is around 57% after training for 70 epochs. However, when training with MXNet 0.12.1, there is no such problem. Does anyone have any suggestions?
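For context on the CIFAR-10 number: the dataset has 10 balanced classes, so a validation accuracy of about 10% matches random guessing, which suggests the network is not learning at all rather than converging slowly. The sketch below only illustrates that baseline with simulated labels. It does not use MXNet or the real dataset, and `chance_accuracy` is a hypothetical helper name.

```python
import random


def chance_accuracy(num_classes=10, num_samples=10000, seed=0):
    """Accuracy of uniformly random guesses against uniformly random labels."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        label = rng.randrange(num_classes)  # simulated ground-truth class
        guess = rng.randrange(num_classes)  # simulated random prediction
        hits += (label == guess)
    return hits / num_samples


if __name__ == "__main__":
    # The result should land close to 1 / num_classes.
    print(f"chance-level accuracy: {chance_accuracy():.3f}")
```

An accuracy stuck at this level from the first epoch onward usually points at the data pipeline, initialization, or learning rate rather than at the number of epochs.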