
web-crawler🕷

A simple web crawler that, given a URL, outputs a sitemap.

Install

Prerequisites

Run npm install from the command line to install the package dependencies.

Build

Run npm run build to build the package.

Test

Run npm test to build the package and run the tests.

Setup

Run npm run setup to clean the repo, install dependencies, lint, build, and run the tests.

Usage

Start the console application with npm start. It will prompt for three values:

  • URL (required)
  • Maximum depth to crawl, default is 4
  • Maximum number of times to reprocess a URL that fails, default is 3

Limitations

  • Only the HTTP and HTTPS protocols are supported; URLs with other protocols are rejected
  • The URL cannot be protocol-relative (e.g. //example.com/path)
  • It cannot prevent being blocked by websites when too many requests are sent; this mainly affects content-heavy sites such as https://www.npmjs.com/
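The first two limitations above amount to a URL check along these lines. This is a hedged sketch, assuming Node's built-in WHATWG URL parser; the function name isCrawlableUrl is made up here and the repository's actual validation may differ.

```javascript
// Accept only absolute http(s) URLs; reject other protocols and
// protocol-relative URLs such as "//example.com/path".
function isCrawlableUrl(input) {
  if (input.startsWith('//')) return false; // protocol-relative: no scheme given
  try {
    const { protocol } = new URL(input); // throws on non-absolute input
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // not a parseable absolute URL
  }
}
```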
