-
Technically, what you want is to treat the hash as raw bytes, not as Latin-1. Latin-1 is only relevant in that each Latin-1 character maps to the byte with the same value as its Unicode code point. Since all you're doing is constructing a URL, you can build it yourself rather than using the "form" generator; try something like this:
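As a minimal sketch of that idea, using core Perl only (not necessarily the snippet the reply had in mind): the infohash below is recovered from the escaped string quoted in the question, while the tracker address and the `escape_bytes` helper are hypothetical names introduced for illustration. It escapes the same 20 bytes twice, once as they are and once after a UTF-8 encoding step, to show where the extra `%C3`/`%C2` prefixes come from.

```perl
use strict;
use warnings;
use Encode qw(encode);

# 20-byte SHA-1 infohash, recovered from the "expected" string in the question.
my $info_hash = pack 'H*', '1bd088ee9166a062cf4af09cf99720fa6e1a3133';

# Hypothetical helper: percent-encode every byte outside the RFC 3986
# unreserved set, without any charset conversion.
sub escape_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $bytes;
}

# Raw bytes: one %XX per non-unreserved byte.
my $raw = escape_bytes($info_hash);

# Same bytes treated as characters and encoded to UTF-8 first:
# every byte >= 0x80 becomes two bytes.
my $utf8 = escape_bytes(encode('UTF-8', $info_hash));

# Assumption: placeholder tracker address, not a real endpoint.
my $url = "http://tracker.example:6969/scrape?info_hash=$raw";

print "$raw\n$utf8\n$url\n";
```

Here the space byte comes out as `%20` rather than the `+` used by form encoding; both are escapes of the same byte 0x20. Because the resulting URL is plain ASCII, it should pass through a later UTF-8 encoding step unchanged.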
-
I'm at the absolute very beginning of rewriting the very first module I ever pushed to CPAN to try and fall in love with Perl again and of course I'm bumping my head on something.
To scrape a BitTorrent tracker, you must include a SHA-1 hash as a query parameter. Trackers expect a properly escaped, ISO-8859-1/Latin-1 encoded string. Of course, Mojo::UserAgent, by default, encodes query params to UTF-8, which causes my escaped infohash to look like
`%1B%C3%90%C2%88%C3%AE%C2%91f%C2%A0b%C3%8FJ%C3%B0%C2%9C%C3%B9%C2%97+%C3%BAn%1A13`
rather than the expected
`%1B%D0%88%EE%91f%A0b%CFJ%F0%9C%F9%97+%FAn%1A13`.
This should be overridable but, for `HEAD` and `GET`, `Mojo::UserAgent::Transactor::_form(...)` silently ignores the user-defined `charset` when merging query parameters. I hesitate to report this as an issue so I'm posting it here; I'm sure there's a solid reason it functions this way (I'm not an HTTP standards expert), but I can't find a git blame or any other discussion around it, and no other HTTP client is making this choice by default. Am I missing something? Is setting the charset manually with `$tx->req->url->query->charset(undef);` my only/best option?
Code Example
Here's a minimal example if such a thing is needed. This scrapes the Debian project's tracker for debian-12.7.0-amd64-netinst.iso.torrent, so it shouldn't set off alarms at most rational ISPs, but you could always just comment out the `start(...)` call.
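For illustration, a rough sketch of the `charset(undef)` workaround described in the question (not the original snippet). It assumes Mojolicious is installed, uses a placeholder tracker address rather than the real Debian tracker, and takes the infohash from the expected string quoted above. The transaction is only built, never started, so no request is sent.

```perl
use strict;
use warnings;
use Mojo::UserAgent;

# 20-byte SHA-1 infohash, recovered from the expected string in the question.
my $info_hash = pack 'H*', '1bd088ee9166a062cf4af09cf99720fa6e1a3133';

my $ua = Mojo::UserAgent->new;

# Assumption: placeholder tracker address, not a real endpoint.
# For GET, the "form" generator merges the parameters into the URL query.
my $tx = $ua->build_tx(
    GET  => 'http://tracker.example:6969/scrape',
    form => {info_hash => $info_hash},
);

my $query = $tx->req->url->query;

# Default behaviour: values are encoded to UTF-8 before escaping.
my $default = $query->to_string;

# Workaround: disable encoding so the bytes are escaped as they are.
my $raw = $query->charset(undef)->to_string;

print "$default\n$raw\n";

# Uncomment to actually send the request:
# $tx = $ua->start($tx);
```

Serializing the query before and after the `charset(undef)` call makes the difference visible without touching the network.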