Description
Describe the bug
As mentioned in #114 (comment), URLs that contain a # character get cut off at the #.
I traced this issue to DirectoryParser.cs, specifically the CleanFragments function.
CleanFragments treats everything after the # as a URI fragment rather than as part of a legitimate file name, so the URL gets cut off there. This is most likely caused by URL decoding and manipulation further up, before the URL reaches CleanFragments. The fix would probably be to make sure any # that is not a fragment delimiter is encoded as %23 before it reaches this function.
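To illustrate the difference, here is a minimal sketch using only System.Uri. It is not the project's code: the FragmentDemo class, the EncodeSegment helper, and the example.com URL and file name are all hypothetical. It shows how a literal # is parsed as a fragment delimiter, and how encoding it as %23 keeps it inside the path.

```csharp
using System;

public static class FragmentDemo
{
    // Hypothetical helper (not part of the project): percent-encodes a single
    // path segment, so a literal '#' becomes %23.
    public static string EncodeSegment(string segment) => Uri.EscapeDataString(segment);

    public static void Main()
    {
        // Hypothetical file name that contains a literal '#'.
        const string fileName = "He-Man-#1.mp4";

        // Unencoded: everything from '#' onward is parsed as the fragment.
        var raw = new Uri("http://example.com/Movies/" + fileName);
        Console.WriteLine(raw.AbsolutePath); // /Movies/He-Man-
        Console.WriteLine(raw.Fragment);     // #1.mp4

        // Encoded: %23 stays in the path and there is no fragment to strip.
        var encoded = new Uri("http://example.com/Movies/" + EncodeSegment(fileName));
        Console.WriteLine(encoded.AbsolutePath); // /Movies/He-Man-%231.mp4
        Console.WriteLine(encoded.Fragment);     // (empty string)
    }
}
```

If the file name reaches CleanFragments in the encoded form, removing the fragment should leave the URL unchanged. Where in the parser the encoding should happen is not established here.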
Workaround for anyone else hit by this issue: if you are unlikely to encounter real URI fragments when scraping, you can comment out the CleanFragments call in the CheckParsedResults function in DirectoryParser.cs:
if (webDirectory.Uri.Scheme != Constants.UriScheme.Ftp && webDirectory.Uri.Scheme != Constants.UriScheme.Ftps)
{
    //CleanFragments(webDirectory);
}
To Reproduce
Steps to reproduce the behavior:
See #114 (comment) for example URLs.
Expected behavior
Instead of http://mrclancy.ca/Film%20and%20TV/Movies/MST%20Clips/He-Man%20and%20the%20Masters%20of%20the%20Universe%20-%20
Desktop (please complete the following information):
- OS: macOS
- Version: Latest Master Build (v3.5.0.0 + 2 commits)