
web-crawler🕷

A simple web crawler that, given a URL, outputs a sitemap.

Install

Prerequisites

Run npm install from the command line to install the package dependencies.

Build

Run npm run build to build the package.

Test

Run npm test to build the package and run the tests.

Setup

Run npm run setup to clean the repo, install dependencies, lint, build, and run the tests.

Usage

Start the console application with npm start. It will prompt for three values:

  • URL (required)
  • Maximum depth to crawl, default is 4
  • Maximum number of times to reprocess a URL that fails, default is 3

Limitations

  • Only the HTTP and HTTPS protocols are supported; URLs with other protocols are rejected
  • The URL cannot be protocol-relative (e.g. //example.com/path)
  • It cannot prevent being blocked by websites when too many requests are sent; this mainly affects content-heavy sites such as https://www.npmjs.com/
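The first two limitations above amount to a URL check along these lines. This is a hedged sketch, assuming Node's built-in WHATWG URL parser; the function name isCrawlableUrl is made up here and the repository's actual validation may differ.

```javascript
// Accept only absolute http(s) URLs; reject other protocols and
// protocol-relative URLs such as "//example.com/path".
function isCrawlableUrl(input) {
  if (input.startsWith('//')) return false; // protocol-relative: no scheme given
  try {
    const { protocol } = new URL(input); // throws on non-absolute input
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // not a parseable absolute URL
  }
}
```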
