OverrideEncoding does not help when the server sends an unsupported Content-Encoding #326

@y-code

Description

When I tried to load a page from https://www.jamieoliver.com/ with the HtmlWeb.Load method, it failed with an ArgumentException.

It turned out that the site's response headers include content-encoding: identity. Per HTTP RFC 2616, identity "is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header", so it is no surprise that the Encoding class does not support identity as an encoding name.
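
The root cause can be checked without HtmlAgilityPack at all. The following is a minimal sketch; the class and method names (IdentityEncodingCheck, IsSupportedEncodingName) are made up for illustration, and only Encoding.GetEncoding is a real framework call.

```csharp
using System;
using System.Text;

public static class IdentityEncodingCheck
{
    // Hypothetical helper: returns true when the runtime can resolve
    // the given name to a System.Text.Encoding, false otherwise.
    public static bool IsSupportedEncodingName(string name)
    {
        try
        {
            Encoding.GetEncoding(name);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown for names that are not registered text encodings,
            // which is what the exception in this report says about "identity".
            return false;
        }
    }
}
```

Calling IsSupportedEncodingName("utf-8") should succeed, while IsSupportedEncodingName("identity") should hit the catch branch, because identity is an HTTP content-coding and not a character encoding.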

Next, I set the OverrideEncoding property to Encoding.UTF8 and called the HtmlDocument.Load method. However, it made no difference and I got the same ArgumentException.

I expected the OverrideEncoding property to make the HtmlWeb class ignore the Content-Encoding in the server's response headers and decode the content with the encoding specified in OverrideEncoding, but that was not the case.

It does allow overriding the encoding specified by the server when that encoding name is valid, but ideally it would also work when the server-specified encoding name is invalid.
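
To make the expected behavior concrete, here is a rough sketch of the resolution order I have in mind. This is not HtmlAgilityPack's actual code. EncodingResolution.Resolve and its parameters are hypothetical names, and the fallback argument is my own assumption about what should happen when there is no override and the server name is unusable.

```csharp
using System;
using System.Text;

public static class EncodingResolution
{
    // Hypothetical helper, not part of HtmlAgilityPack.
    // An explicit override wins before the server-supplied name is ever parsed,
    // so an invalid name such as "identity" cannot throw in that case.
    public static Encoding Resolve(string serverEncodingName, Encoding overrideEncoding, Encoding fallback)
    {
        if (overrideEncoding != null)
        {
            return overrideEncoding;
        }

        if (string.IsNullOrEmpty(serverEncodingName))
        {
            return fallback;
        }

        try
        {
            return Encoding.GetEncoding(serverEncodingName);
        }
        catch (ArgumentException)
        {
            // Unknown name from the server: use the fallback instead of failing the load.
            return fallback;
        }
    }
}
```

With this ordering, Resolve("identity", Encoding.UTF8, null) returns Encoding.UTF8 without ever calling Encoding.GetEncoding, which is the behavior I expected from OverrideEncoding.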

Exception

Exception message:
System.ArgumentException : 'identity' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name

Stack trace:
   at System.Text.EncodingTable.GetCodePageFromName(String name)
   at System.Text.Encoding.GetEncoding(String name)
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1680
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 2068
   at HtmlAgilityPack.HtmlWeb.Load(Uri uri, String method) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1290
   at HtmlAgilityPack.HtmlWeb.Load(Uri uri) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1189
   at HappyFL.Services.WebSeekers.RecipeSeeker.Scan() in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekers/RecipeSeeker.cs:line 34
   at HappyFL.Services.WebSeekerService.FindRecipes(Uri url, Nullable`1 cancel, Encoding encode) in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekerService.cs:line 159
   at HappyFL.Test.WebSeekerServiceTest.TestFindRecipe(String url, ExpectedResultForTestFindRecipe expected) in /Users/yas/Projects/happyfl/HappyFLTest/WebSeekerServiceTest.cs:line 167
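
Since the trace shows the failure inside HtmlWeb.Get, one possible workaround is to bypass HtmlWeb, download the bytes yourself, and hand them to HtmlDocument with an explicit encoding. The sketch below assumes the body arrives uncompressed and that UTF-8 (or whatever encoding you pass) is correct for the page; ManualLoadWorkaround, ParseBytes and LoadAsync are names invented for this example.

```csharp
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class ManualLoadWorkaround
{
    private static readonly HttpClient Client = new HttpClient();

    // Parses an already-downloaded response body with the caller's encoding.
    // The Content-Encoding response header is never consulted here.
    public static HtmlDocument ParseBytes(byte[] body, Encoding encoding)
    {
        var doc = new HtmlDocument();
        using (var stream = new MemoryStream(body))
        {
            doc.Load(stream, encoding);
        }
        return doc;
    }

    // Downloads the page as raw bytes, then parses it with ParseBytes.
    public static async Task<HtmlDocument> LoadAsync(string url, Encoding encoding)
    {
        byte[] body = await Client.GetByteArrayAsync(url);
        return ParseBytes(body, encoding);
    }
}
```

Usage would look like `var doc = await ManualLoadWorkaround.LoadAsync("https://www.jamieoliver.com/", Encoding.UTF8);`. This gives up the other conveniences of HtmlWeb, so it is a stopgap and not a fix for the behavior reported here.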

Project to reproduce issue

https://github.com/y-code/repro-bug-in-html-agility-pack

Further technical details

  • HAP version: 1.11.12
  • NET version (net472, netcore, etc.): .NET Core 2.2.300
