-
Notifications
You must be signed in to change notification settings - Fork 81
Create utility class URL builder and URL encoder. Fix a tricky URL path issue #844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create utility class URL builder and URL encoder. Fix a tricky URL path issue #844
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kingcrimsontianyu. The docs look good, and I confirmed that this fixes the issue we saw with =
earlier.
I think as we discussed earlier, we weren't able to reproduce this issue with our moto-based tests because it handles authentication differently. So I think the tests in test_url.cpp
are sufficient.
Hmm. The URL cpp test failed on ARM + conda. Looking into this ... |
The test failure was caused by a misuse of string view, that resulted in UB. This has been fixed. |
Redid the test that uses pylibcudf to read an exotically named S3 object |
/merge |
Background
Initial problem
There is currently an unsolved problem in libcurl, which somehow is mislabeled as merged/solved in curl/curl#13754. For AWS S3 that requires credentials, if an object key name contains
=
, libcurl will fail with an HTTP 403 response. This problem does not occur to public S3 objects. This can be reproduced using thecurl
program:Additional problem
It has been found that beyond
=
alone, other special characters such as!*'()
in a private S3 object will also cause libcurl error. In addition, some characters such as+
in a public S3 object will cause the same error.This PR
This PR addresses this problem by handling special characters listed in the AWS object key naming guidelines, for both private and public S3 object names. The KvikIO-specific object key naming guidelines are added to the remote file documentation.
Specifically, this PR introduces utility classes
UrlBuilder
(to complement the existingUrlParser
), which builds a URL according to the user-provided components, andUrlEncoder
which uses a compile-time, percent-encoding lookup table to encode selected characters.Closes #823