Description
When I tried to load a page from https://www.jamieoliver.com/ with the HtmlWeb.Load method, it failed with an ArgumentException.
It turned out that this happens because the site's response headers include content-encoding: identity. According to RFC 2616 (HTTP/1.1), identity is used only in the Accept-Encoding header and SHOULD NOT be used in the Content-Encoding header. Since identity is a content coding rather than a character encoding name, it is no surprise that the Encoding class does not support it.
Next, I set the OverrideEncoding property to Encoding.UTF8 and called HtmlWeb.Load again. However, this made no difference, and I got the same ArgumentException.
I expected the OverrideEncoding property to make HtmlWeb ignore the Content-Encoding response header and decode the content with the encoding specified in OverrideEncoding, but that was not the case.
The override currently works when the server-specified encoding name is valid; ideally, it would also work when that name is invalid.
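To make the expected behavior concrete, here is a minimal sketch of the encoding selection I had in mind. ResponseEncodingResolver and Resolve are hypothetical names, not part of Html Agility Pack, and this is not how HtmlWeb is implemented today; it only assumes that the server-supplied encoding name and the OverrideEncoding value are both available at the point where HtmlWeb currently calls Encoding.GetEncoding.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Html Agility Pack) sketching the
// behavior requested in this issue.
public static class ResponseEncodingResolver
{
    // serverEncodingName: the encoding name taken from the response
    //   (e.g. "identity" in the failing case above).
    // overrideEncoding: the value the caller assigned to
    //   HtmlWeb.OverrideEncoding, or null if none was set.
    // Returns the encoding to decode with, or null so the caller can fall
    // back to its default handling.
    public static Encoding Resolve(string serverEncodingName, Encoding overrideEncoding)
    {
        // An explicit override wins, so the server value is never parsed
        // and an invalid name cannot throw.
        if (overrideEncoding != null)
            return overrideEncoding;

        if (string.IsNullOrWhiteSpace(serverEncodingName))
            return null;

        try
        {
            return Encoding.GetEncoding(serverEncodingName.Trim());
        }
        catch (ArgumentException)
        {
            // Content codings such as "identity" are not character
            // encodings, so Encoding.GetEncoding rejects them.
            return null;
        }
    }
}
```

With this logic, Resolve("identity", Encoding.UTF8) returns UTF-8 without touching the header value, and Resolve("identity", null) returns null instead of throwing. Giving the override unconditional precedence is the simplest option; an alternative would be to consult it only after the server-supplied name fails to resolve.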
Exception
Exception message:
System.ArgumentException : 'identity' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name
Stack trace:
at System.Text.EncodingTable.GetCodePageFromName(String name)
at System.Text.Encoding.GetEncoding(String name)
at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1680
at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 2068
at HtmlAgilityPack.HtmlWeb.Load(Uri uri, String method) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1290
at HtmlAgilityPack.HtmlWeb.Load(Uri uri) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1189
at HappyFL.Services.WebSeekers.RecipeSeeker.Scan() in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekers/RecipeSeeker.cs:line 34
at HappyFL.Services.WebSeekerService.FindRecipes(Uri url, Nullable`1 cancel, Encoding encode) in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekerService.cs:line 159
at HappyFL.Test.WebSeekerServiceTest.TestFindRecipe(String url, ExpectedResultForTestFindRecipe expected) in /Users/yas/Projects/happyfl/HappyFLTest/WebSeekerServiceTest.cs:line 167
Project to reproduce issue
https://github.com/y-code/repro-bug-in-html-agility-pack
Further technical details
- HAP version: 1.11.12
- .NET version (net472, netcore, etc.): .NET Core 2.2.300