-
Notifications
You must be signed in to change notification settings - Fork 26
mnist: use mirror only when original server is not available #196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some tests have failed. I would like to fix them in a separate pull request. |
Because this may be used by subclasses. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
c5ef3b7
to
784a1d1
Compare
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Because `i` or `index` is clearer than `idx`. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Because `<` and `>` aren't part of the URL. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Because with no arguments, raises the exception in `$!` or raises a `RuntimeError` if `$!` is `nil`. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Thank you so much for a lot of suggestions! @kou It has been an exciting and enjoyable experience. I've resolved all your reviewed points, so could you review it again? |
lib/datasets/downloader.rb
Outdated
@@ -155,6 +160,16 @@ def download(output_path, &block) | |||
$stderr.puts "Redirect to #{url}" | |||
return start_http(url, headers, limit - 1, &block) | |||
else | |||
if response.is_a?(Net::HTTPForbidden) | |||
fallback_url = @fallback_urls.shift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid changing @fallback_urls
destructively?
If we change @fallback_urls
destructively, users can't call #download
multiple times for the same Downloader
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice users can't call #download
multiple times for the same Downloader
.
I will try to avoid changing @fallback_urls
destructively.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented in f92ec1a. But I have some concerns.
If the same URL is specified in fallback_urls
, it can result in an infinite loop:
diff --git a/lib/datasets/mnist.rb b/lib/datasets/mnist.rb
index 9129be9..a3e6230 100644
--- a/lib/datasets/mnist.rb
+++ b/lib/datasets/mnist.rb
@@ -54,6 +54,8 @@ module Datasets
private
def base_urls
[
+ "http://yann.lecun.com/exdb/mnist/",
+ "http://yann.lecun.com/exdb/mnist/",
"http://yann.lecun.com/exdb/mnist/",
"https://ossci-datasets.s3.amazonaws.com/mnist/",
]
❯ ruby -I lib test/run-test.rb -t MNISTTest
Loaded suite test
Started
|403: Forbidden: fallback: <http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz> -> <http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz>
403: Forbidden: fallback: <http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz> -> <http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz>
403: Forbidden: fallback: <http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz> -> <http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz>
(snip)
While I believe this is a rare case, do we need to consider this scenario?
If we do need to consider it, I think we might need to use an instance variable like @fallback_index
.
But, this could make the code less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using argument instead?
diff --git a/lib/datasets/downloader.rb b/lib/datasets/downloader.rb
index fac8a13..3449855 100644
--- a/lib/datasets/downloader.rb
+++ b/lib/datasets/downloader.rb
@@ -40,7 +40,7 @@ module Datasets
headers["Range"] = "bytes=#{start}-"
end
- start_http(@url, headers) do |response|
+ start_http(@url, @fallback_urls, headers) do |response|
if response.is_a?(Net::HTTPPartialContent)
mode = "ab"
else
@@ -140,7 +140,7 @@ module Datasets
end
end
- private def start_http(url, headers, limit = 10, &block)
+ private def start_http(url, fallback_urls, headers, limit = 10, &block)
if limit == 0
raise TooManyRedirects, "too many redirections: #{url}"
end
@@ -158,17 +158,15 @@ module Datasets
when Net::HTTPRedirection
url = URI.parse(response[:location])
$stderr.puts "Redirect to #{url}"
- return start_http(url, headers, limit - 1, &block)
+ return start_http(url, fallback_urls, headers, limit - 1, &block)
else
if response.is_a?(Net::HTTPForbidden)
- index = @fallback_urls.index(url)
- fallback_index = index.nil? ? 0 : index + 1
- fallback_url = @fallback_urls[fallback_index]
- unless fallback_url.nil?
+ next_url, *rest_fallback_urls = fallback_urls
+ if next_url
message = "#{response.code}: #{response.message}: " +
- "fallback: <#{url}> -> <#{fallback_url}>"
+ "fallback: <#{url}> -> <#{next_url}>"
$stderr.puts(message)
- return start_http(fallback_url, headers, limit, &block)
+ return start_http(next_url, rest_fallback_urls, headers, &block)
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! Using argument, we don't have an instance variable, and the code became to readable.
I will implement it. Thank you so much!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented in 7b78fc2. Could you review it again?
Because users can't call `#download` multiple times for the same `Downloader`. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
If the same URL is specified in `fallback_urls`. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Thanks! |
Thanks for your review! |
MNIST's original server is sometimes refusing to provide the requested
resource. So, we use the mirror server only when the original server is
refusing it.
Moreover this includes the following changes.
Fixes GH-195.