Web Scraper | 2025 Activity

A Node.js script that scrapes metadata and social links from public webpages.

Technology

Node
- Supported Node versions: v14, v18, and v20.10.0 (default)
- Node Version Manager (nvm)
Puppeteer
- A Node library that provides a high-level API to control Chrome
TypeScript
- TypeScript is JavaScript with syntax for types.
- Node.js with TypeScript

Dependencies

puppeteer
puppeteer-extra
puppeteer-extra-plugin-stealth
sharp
fs-extra
temp
rimraf (dev)
nodemon (dev)
ts-node (dev)
typescript (dev)

Install all dependencies with:

npm install

Structure

  build
    ├── index.js
    └── ...
  config
    └── config.json
  src
    ├── pages
    │   ├── index.ts
    │   └── identifiers.ts
    ├── environment
    │   └── config.ts
    ├── utils
    │   └── index.ts
    └── index.ts
  types
    └── index.d.ts
  outputs
    └── *.json
  screenshots
    └── *.jpg

build: The most recently compiled JavaScript output.
config: Deployment and proxy configuration.
src: The scraper's main source code, written in TypeScript.
types: Type and interface definitions.
outputs: Scraped data in JSON format.
screenshots: Compressed screenshots in JPG format.

Environment Variables

DOMAINS (required): Comma-separated list of domains to scrape, e.g. github.com,ranbot.online
HEADLESS (optional): Set to true or false to control browser mode (default from config).
ENV (optional): Used in Docker, default is production.
CONCURRENCY (optional): Used in Docker, default is 8.
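
As an illustration of how the variables above could be read, here is a minimal sketch. The `ScraperEnv` type and `parseEnv` function are hypothetical names, not taken from the project source, and the sketch assumes the documented Docker defaults (production, 8) apply whenever a variable is unset.

```typescript
// Hypothetical shape for the parsed environment; the project's real types may differ.
interface ScraperEnv {
  domains: string[];
  headless: boolean | undefined; // undefined means "use the value from config"
  env: string;
  concurrency: number;
}

// Parses the documented variables from an environment-like object.
function parseEnv(vars: Record<string, string | undefined>): ScraperEnv {
  // DOMAINS is a comma-separated list; drop surrounding spaces and empty entries.
  const domains = (vars.DOMAINS ?? "")
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
  if (domains.length === 0) {
    throw new Error("DOMAINS is required, e.g. DOMAINS=github.com,ranbot.online");
  }

  // Assumption: fall back to the documented default of 8 when unset.
  const concurrency = Number.parseInt(vars.CONCURRENCY ?? "8", 10);
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error("CONCURRENCY must be a positive integer");
  }

  return {
    domains,
    headless: vars.HEADLESS === undefined ? undefined : vars.HEADLESS === "true",
    env: vars.ENV ?? "production", // assumption: documented default
    concurrency,
  };
}
```

In a Node process this could be called as `parseEnv(process.env)`. Failing early on a missing DOMAINS value avoids launching a browser with nothing to scrape.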

Scripts Overview

npm run start:dev

Starts the application in development mode using nodemon and ts-node, restarting automatically when source files change.

npm run build

Compiles the app into the build folder, cleaning the folder first.

npm run start

Starts the app in production by first building the project with npm run build, and then executing the compiled JavaScript at build/index.js.

Usage Examples

env DOMAINS=github.com node build/index.js

Or with multiple domains:

env DOMAINS=github.com,ranbot.online node build/index.js

Output

Screenshots: ./screenshots/<domain>.jpg (compressed, 1024px wide)
Data: ./outputs/<domain>_<timestamp>.json (pretty-printed)
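
The data file name pattern above can be sketched as follows. The timestamp format is inferred from the sample log in this README, where an ISO 8601 timestamp appears with its colons replaced by hyphens. The function names and the two-space indent are assumptions, not the project's actual code.

```typescript
// Builds ./outputs/<domain>_<timestamp>.json.
// Colons in the ISO timestamp are replaced so the name is valid on all file systems.
function outputPath(domain: string, when: Date = new Date(), dir = "./outputs"): string {
  const timestamp = when.toISOString().replace(/:/g, "-");
  return `${dir}/${domain}_${timestamp}.json`;
}

// Pretty-prints scraped data; the indent width of 2 is an assumption.
function serialize(data: unknown): string {
  return JSON.stringify(data, null, 2);
}

const example = outputPath("github.com", new Date("2025-05-25T08:08:41.791Z"));
console.log(example);
// prints "./outputs/github.com_2025-05-25T08-08-41.791Z.json"
```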

Docker Usage

Build and run the scraper in Docker:

docker build -t web-scraper .
docker run -e DOMAINS=github.com,ranbot.online web-scraper

Response Example

➜  web-scraper git:(main) ✗ env DOMAINS=github.com,ranbot.online node build/index.js
[2025-05-25T08:08:26.742Z] >> Starting Web Scraper ......
[2025-05-25T08:08:26.974Z] ┌─────────┬───────┬────────────────────────────────────────┐
│ (index) │ tries │               identifier               │
├─────────┼───────┼────────────────────────────────────────┤
│    0    │   0   │  { id: 0, identifier: 'github.com' }   │
│    1    │   0   │ { id: 1, identifier: 'ranbot.online' } │
└─────────┴───────┴────────────────────────────────────────┘
[2025-05-25T08:08:26.974Z] >> Queue Size: 2
[2025-05-25T08:08:26.974Z] { tries: 0, identifier: { id: 0, identifier: 'github.com' } }
[2025-05-25T08:08:27.029Z] [github.com] -> visiting: https://github.com
[2025-05-25T08:08:35.876Z] [github.com] -> page loaded
[2025-05-25T08:08:41.790Z] [github.com] -> screenshot written to ./screenshots/github.jpg
[2025-05-25T08:08:41.792Z] [github.com] -> data written to ./outputs/github.com_2025-05-25T08-08-41.791Z.json
[2025-05-25T08:08:41.794Z] { tries: 0, identifier: { id: 1, identifier: 'ranbot.online' } }
[2025-05-25T08:08:41.864Z] [ranbot.online] -> visiting: https://ranbot.online
[2025-05-25T08:08:49.567Z] [ranbot.online] -> page loaded
[2025-05-25T08:08:52.675Z] [ranbot.online] -> screenshot written to ./screenshots/ranbot.jpg
[2025-05-25T08:08:52.675Z] [ranbot.online] -> data written to ./outputs/ranbot.online_2025-05-25T08-08-52.675Z.json
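
The queue entries in the log above have the shape `{ tries, identifier: { id, identifier } }`. A minimal sketch of building such a queue from a domain list is shown below. The type and function names are hypothetical, and the retry rule in `recordFailure` is only an assumption about what the `tries` counter is for; the log itself shows no retries.

```typescript
// Shapes inferred from the logged queue entries.
interface Identifier {
  id: number;
  identifier: string;
}

interface QueueItem {
  tries: number;
  identifier: Identifier;
}

// Creates one queue entry per domain, numbered in input order, with zero tries.
function buildQueue(domains: string[]): QueueItem[] {
  return domains.map((domain, id) => ({
    tries: 0,
    identifier: { id, identifier: domain },
  }));
}

// Assumed retry policy: re-queue a failed item until it has been tried maxTries times.
// Returns true if the item was re-queued, false if it was dropped.
function recordFailure(queue: QueueItem[], item: QueueItem, maxTries = 3): boolean {
  const next: QueueItem = { ...item, tries: item.tries + 1 };
  if (next.tries >= maxTries) {
    return false;
  }
  queue.push(next);
  return true;
}

const queue = buildQueue(["github.com", "ranbot.online"]);
console.table(queue);
console.log(`>> Queue Size: ${queue.length}`);
```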

Contributors

Encore
