Relative URL handling seems to be different from browser #3031
Hey all! We are currently running a test suite covering different crawling scenarios for our app. For this we use a Cheerio crawler and call `enqueueLinks` (with strategy ALL) to extract the URLs on the website.
This HTML dummy is served from a local server that runs on […]. The request object in the `enqueueLinks` function gives back the URLs as: […]
For a link with a leading slash, that behaviour matches the browser. Without the leading slash, it does not; it should point to […]. Am I mistaken here? I am kinda stumped as to whether the issue lies with our implementation.
Replies: 1 comment
Never mind, the reason is actually described in the Mozilla docs on URL handling in browsers:
https://developer.mozilla.org/en-US/docs/Web/API/URL_API/Resolving_relative_references#current_directory_relative
The issue arises because we didn't call `/demo/`, but `/demo` instead. The latter makes the base directory identical to the root, because `/demo` is treated like a file, whereas we need to call `/demo/` so that the base directory is `demo`.
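
For anyone who wants to reproduce this outside the crawler, here is a minimal sketch using the standard `URL` constructor, which follows the same resolution rules as the browser. The host, port, and `page.html` link are made-up placeholders, not the values from our test setup.

```typescript
// Hypothetical relative link, as it might appear in an <a href="..."> on the page.
const href = "page.html";

// Base without a trailing slash: "demo" is treated as a file,
// so its last path segment is dropped before the href is appended.
const withoutSlash = new URL(href, "http://localhost:3000/demo").href;

// Base with a trailing slash: "demo/" is treated as a directory,
// so the href is resolved inside it.
const withSlash = new URL(href, "http://localhost:3000/demo/").href;

// A link with a leading slash always resolves against the origin root,
// regardless of the trailing slash on the base.
const rootRelative = new URL("/" + href, "http://localhost:3000/demo/").href;

console.log(withoutSlash); // http://localhost:3000/page.html
console.log(withSlash);    // http://localhost:3000/demo/page.html
console.log(rootRelative); // http://localhost:3000/page.html
```

So the crawler output was consistent with the browser all along; only the URL we started from was missing the trailing slash.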