Web crawler

The app crawls the page at a provided URL for image links and, if needed, follows the links found on that page up to a provided depth. The results are written to a results.json file in the root folder of the project. Each image is saved as an object in an array, in the format { "imageUrl": "string", "sourceUrl": "string // the URL of the page this image was found on", "depth": "number // the depth of the page this image was found on" }
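The depth-limited traversal described above can be illustrated with a minimal sketch. It is not the repository's actual implementation: the names ImageRecord, PageContent, PageLoader and crawl are hypothetical, page loading is abstracted behind an injected function (the real app would presumably back it with Puppeteer), and it assumes that the start page counts as depth 0 and that each page is visited at most once.

```typescript
// Shape of one entry in results.json, following the format described above.
interface ImageRecord {
  imageUrl: string;
  sourceUrl: string; // URL of the page the image was found on
  depth: number; // depth of that page relative to the start URL
}

// Hypothetical result of loading one page: the image URLs and links found on it.
interface PageContent {
  images: string[];
  links: string[];
}

// Hypothetical loader signature; an assumption made for illustration only.
type PageLoader = (url: string) => Promise<PageContent>;

// Breadth-first crawl. Assumes the start page is depth 0 and that
// a page already seen is never loaded a second time.
async function crawl(
  startUrl: string,
  maxDepth: number,
  loadPage: PageLoader
): Promise<ImageRecord[]> {
  const results: ImageRecord[] = [];
  const visited = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const sourceUrl of frontier) {
      const page = await loadPage(sourceUrl);
      // Record every image together with the page and depth it was found at.
      for (const imageUrl of page.images) {
        results.push({ imageUrl, sourceUrl, depth });
      }
      // Queue unseen links for the next depth level.
      for (const link of page.links) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return results;
}
```

The returned array can then be serialized with JSON.stringify(results, null, 2) and written to results.json. Injecting the loader keeps the traversal logic testable without launching a browser.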

To run the app:

Clone this repository and run npm install to install the dependencies;
Run the script with npx ts-node crawler.ts <start_url: string> <depth: number> or npm run dev <start_url: string> <depth: number>.
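As a rough illustration of what the two command-line arguments are expected to look like, here is a dependency-free sketch of argument validation. The project itself validates arguments with Zod; this hand-written check is only a stand-in, and the names CrawlerArgs and parseArgs, the error messages, and the rule that depth must be a non-negative integer are all assumptions.

```typescript
// Hypothetical container for the two validated arguments.
interface CrawlerArgs {
  startUrl: string;
  depth: number;
}

// Validates [start_url, depth] and throws a readable error on bad input.
// Stand-in for the Zod-based validation the project uses.
function parseArgs(argv: string[]): CrawlerArgs {
  const [rawUrl, rawDepth] = argv;
  if (rawUrl === undefined || rawDepth === undefined) {
    throw new Error("Usage: crawler.ts <start_url> <depth>");
  }

  let startUrl: string;
  try {
    // The URL constructor throws on strings that are not absolute URLs.
    startUrl = new URL(rawUrl).href;
  } catch {
    throw new Error(`Invalid start_url: ${rawUrl}`);
  }

  // Assumption: depth must be a non-negative integer.
  const depth = Number(rawDepth);
  if (rawDepth.trim() === "" || !Number.isInteger(depth) || depth < 0) {
    throw new Error(`Invalid depth: ${rawDepth}`);
  }

  return { startUrl, depth };
}
```

In a script, this would typically be called as parseArgs(process.argv.slice(2)), so that only the user-supplied arguments are inspected.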

What is used

Node.js;
TypeScript;
Zod (to validate command-line arguments);
Puppeteer (for crawling).

killthecreator/web-crawler
