A simple web crawler that, given a URL, outputs a sitemap.
Run `npm install` to install the package dependencies.

Run `npm run build` to build the package.

Run `npm test` to build the package and run the tests.

Run `npm run setup` to clean the repo, install dependencies, lint, build, and run the tests.
Start the console application with `npm start`. It will prompt for three inputs (their effect on the crawl is sketched in the code below):
- URL (required)
- Maximum level (depth) to crawl from the starting page; the default is 4
- Maximum number of times to reprocess a URL if a request is not successful; the default is 3
Notes:
- Only the HTTP and HTTPS protocols are accepted; URLs with other protocols are rejected (see the validation sketch below)
- The URL cannot be protocol relative
- The crawler cannot prevent being blocked by websites when too many requests are sent; this mainly affects sites with heavier content, such as https://www.npmjs.com/
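
The snippet below is a minimal sketch of how the maximum level and retry inputs could drive a crawl. It is illustrative only: the names (`crawl`, `CrawlOptions`, `fetchWithRetry`), the use of the global `fetch` API, and the naive link extraction are assumptions, not the package's actual implementation.

```ts
// Minimal sketch only: crawl, CrawlOptions and fetchWithRetry are hypothetical
// names, not the package's real API.
type CrawlOptions = {
  maxDepth: number;   // maximum level to crawl from the start URL (default 4)
  maxRetries: number; // how many times to reprocess a URL that fails (default 3)
};

// Fetch a page, reprocessing the URL up to maxRetries extra times before giving up.
async function fetchWithRetry(url: string, maxRetries: number): Promise<string | null> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await fetch(url); // assumes Node 18+ global fetch
      if (res.ok) return await res.text();
    } catch {
      // network error: fall through and try again
    }
  }
  return null;
}

// Level-limited crawl: follow links up to maxDepth levels from the start URL.
async function crawl(startUrl: string, opts: CrawlOptions): Promise<string[]> {
  const visited = new Set<string>();
  let frontier = [startUrl];

  for (let level = 0; level < opts.maxDepth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (visited.has(url)) continue;
      visited.add(url);

      const html = await fetchWithRetry(url, opts.maxRetries);
      if (html === null) continue; // retries exhausted, skip this URL

      // Naive link extraction; a real crawler would use a proper HTML parser.
      for (const match of html.matchAll(/href="(https?:\/\/[^"]+)"/g)) {
        next.push(match[1]);
      }
    }
    frontier = next;
  }
  return [...visited]; // the visited URLs are the raw material for the sitemap
}

// Example: crawl('https://example.com', { maxDepth: 4, maxRetries: 3 });
```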
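
For the URL rules above, a validation check could look roughly like the following. `isCrawlableUrl` is a hypothetical helper written for illustration and is not part of the package.

```ts
// Hypothetical helper, written only to illustrate the URL rules above.
function isCrawlableUrl(input: string): boolean {
  // Protocol-relative URLs ("//example.com/page") are rejected.
  if (input.startsWith('//')) return false;

  try {
    const url = new URL(input);
    // Only HTTP and HTTPS are accepted; ftp:, file:, mailto:, etc. are rejected.
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false; // not a valid absolute URL
  }
}

// isCrawlableUrl('https://www.npmjs.com/') -> true
// isCrawlableUrl('ftp://example.com/file') -> false
// isCrawlableUrl('//example.com/page')     -> false
```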