This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Problems related to use image_classification.py to train model #19539
Johnny-dai-git asked this question in General (unanswered; 2 comments)
-
@Johnny-dai-git a general note for image classification: you may benefit from GluonCV's training scripts (https://cv.gluon.ai/model_zoo/classification.html), which incorporate many tricks for training better models. Also, if you have the hardware available, multi-GPU data-parallel training is usually the most efficient mode for this problem size.
-
Thank you so much.
(Sent by email in reply to Sheng Zha's comment of Wed, Nov 18, 2020 at 12:54 PM.)
--
Yuanjun Dai (he/him)
Ph.D.
Department of Computer and Data Sciences
Case Western Reserve University
Phone: (216)-235-8330
Office: Glennan 505
-
Hi,
I am working on distributed training with the provided Python script image_classification.py, and I have run into several problems:
First, I am trying to train a model on Caltech101. The worker downloads and extracts the dataset, but the program then hangs at line 107 of data.py and never returns training_path and testing_path to the training loop. No error message appears; it just hangs forever.
Second, I am trying to train VGG11, VGG16, or AlexNet on the MNIST dataset, but I get the following error:
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 3333 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
terminate called after throwing an instance of 'dmlc::Error'
what(): [12:28:59] src/operator/nn/pooling.cc:190: Check failed: param.kernel[0] <= dshape_nchw[2] + 2 * param.pad[0]: kernel size (2) exceeds input (1 padded to 1)
Stack trace:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x307d3b) [0x7f225736dd3b]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0xb811eb) [0x7f2257be71eb]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f225a507aa7]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 3333 images, shuffle=1, shape=[32,1,28,28]
bash: line 1: 23110 Aborted python3 image_classification.py --dataset mnist --model vgg11 --epochs 1 --kvstore dist_async
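One plausible reading of the check failure above is that the 28x28 MNIST images are simply too small for VGG11. The sketch below traces the spatial size through the pooling stages under stated assumptions: the VGG convolutions are 3x3 with padding 1 (so they preserve height and width), and the network contains five 2x2 max-pooling layers with stride 2 and floor rounding. The helper `trace_pooling` is hypothetical and is not part of image_classification.py or MXNet.

```python
def trace_pooling(input_size, num_pools=5, kernel=2, stride=2, pad=0):
    """Return the spatial size seen by each pooling layer.

    Hypothetical helper for illustration only. It assumes the convolutions
    between pools keep the spatial size unchanged, as 3x3 convolutions with
    padding 1 do, and that pooling rounds down.
    """
    sizes = []
    size = input_size
    for index in range(1, num_pools + 1):
        padded = size + 2 * pad
        # Same condition as the failing check in the log: kernel <= input + 2 * pad.
        if kernel > padded:
            raise ValueError(
                f"pool {index}: kernel size ({kernel}) exceeds input "
                f"({size} padded to {padded})"
            )
        sizes.append(size)
        # Output size of a pooling layer with floor rounding.
        size = (padded - kernel) // stride + 1
    return sizes


if __name__ == "__main__":
    for side in (28, 32, 224):
        try:
            print(side, "->", trace_pooling(side))
        except ValueError as err:
            print(side, "fails:", err)
```

Under these assumptions a 28-pixel side shrinks to 14, 7, 3, and then 1, so the fifth pooling layer receives a 1x1 input and the 2x2 kernel no longer fits, which matches the message in the log. A 32-pixel or 224-pixel side passes all five pools, so resizing the MNIST images before feeding them to the network would at least satisfy this shape check.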
Third, I tried to train on ImageNet, but the script requires a parameter called --data-dir. What is it for?
From reading the source code, it seems that I need to download the ImageNet dataset myself and make it available to the workers. Is that correct?
Fourth, could you tell me which datasets work with which models in image_classification.py?
Best Regards,
Johnny