
deep-assistant/web-capture


web-capture


A microservice to fetch URLs and render them as:

  • HTML: GET /html?url=<URL>
  • Markdown: GET /markdown?url=<URL>
  • PNG screenshot: GET /image?url=<URL>

Installation

npm install
# or
yarn install

Available Commands

Development

  • yarn dev - Start the development server with nodemon, which restarts it automatically when files change
  • yarn start - Start the service using Docker Compose

Testing

  • yarn test - Run all unit tests
  • yarn test:watch - Run tests in watch mode
  • yarn test:e2e - Run end-to-end tests
  • yarn test:e2e:docker - Run end-to-end tests against Docker container
  • yarn test:all - Run all tests including build and e2e tests

Building

  • yarn build - Build the Docker image and start the container

Examples

  • yarn examples:python - Run Python example scripts
  • yarn examples:javascript - Run JavaScript example scripts
  • yarn examples - Run all examples (requires build)

Usage

Local Development

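# Start the development server
yarn dev

# In a second terminal (quote the URL so the shell does not expand "?")
curl "http://localhost:3000/html?url=https://example.com"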

Docker

# Build and run using Docker Compose
yarn start

# Or manually
docker build -t web-capture .
docker run -p 3000:3000 web-capture

API Endpoints

HTML Endpoint

GET /html?url=<URL>

Returns the raw HTML content of the specified URL.
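A minimal Node.js client sketch for this endpoint, assuming Node 18+ (for the built-in `fetch`) and the service listening on `http://localhost:3000`, as in the local-development example. The helper names `captureUrl` and `fetchHtml` are hypothetical, not part of the service. Encoding the target matters when it has its own query string: unencoded, `/html?url=https://example.com/?a=1&b=2` would be parsed as a `url` parameter ending at `a=1` plus a separate `b` parameter.

```javascript
// Hypothetical client helpers for web-capture (Node 18+, built-in fetch).
// Assumes the service runs locally on port 3000, as with `yarn dev`.
const DEFAULT_BASE_URL = "http://localhost:3000";

// Build e.g. http://localhost:3000/html?url=<percent-encoded target>.
// searchParams.set() encodes "?", "&" and "=" inside the target, so they
// stay part of the single `url` parameter.
function captureUrl(endpoint, targetUrl, baseUrl = DEFAULT_BASE_URL) {
  const requestUrl = new URL(`/${endpoint}`, baseUrl);
  requestUrl.searchParams.set("url", targetUrl);
  return requestUrl.toString();
}

// Fetch the body returned by GET /html?url=<URL>.
async function fetchHtml(targetUrl, baseUrl = DEFAULT_BASE_URL) {
  const response = await fetch(captureUrl("html", targetUrl, baseUrl));
  if (!response.ok) {
    throw new Error(`web-capture returned HTTP ${response.status}`);
  }
  return response.text();
}
```

With `yarn dev` running, `fetchHtml("https://example.com/?a=1&b=2").then(console.log)` should print the captured HTML for that exact URL, query string included.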

Markdown Endpoint

GET /markdown?url=<URL>

Converts the HTML content of the specified URL to Markdown format.
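A sketch that saves the Markdown rendering of a page to disk, assuming Node 18+ (built-in `fetch`) and the service on `http://localhost:3000`; `saveMarkdown` and the output path are hypothetical. It treats a non-2xx status as an error instead of writing the error body into the `.md` file.

```javascript
// Hypothetical helper: fetch GET /markdown?url=<URL> and write the result to a file.
// Assumes Node 18+ (built-in fetch) and the service on http://localhost:3000.
const { writeFile } = require("node:fs/promises");

async function saveMarkdown(targetUrl, outputPath, baseUrl = "http://localhost:3000") {
  const requestUrl = new URL("/markdown", baseUrl);
  // Percent-encode the target so its own query string survives intact.
  requestUrl.searchParams.set("url", targetUrl);

  const response = await fetch(requestUrl);
  if (!response.ok) {
    // Don't save an error page to disk as if it were Markdown.
    throw new Error(`GET ${requestUrl} failed with HTTP ${response.status}`);
  }
  const markdown = await response.text();
  await writeFile(outputPath, markdown, "utf8");
  return markdown.length; // number of characters written
}
```

For example, `saveMarkdown("https://example.com", "example.md")` writes the converted page next to the script.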

Image Endpoint

GET /image?url=<URL>

Returns a PNG screenshot of the specified URL.
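A sketch for downloading the screenshot, assuming Node 18+ (built-in `fetch`) and the service on `http://localhost:3000`; `isPng` and `saveScreenshot` are hypothetical names. Before writing the file it checks the eight-byte PNG signature (`89 50 4E 47 0D 0A 1A 0A`), which catches the case where an HTML or text error body comes back instead of an image.

```javascript
// Hypothetical helpers for GET /image?url=<URL> (Node 18+, built-in fetch).
const { writeFile } = require("node:fs/promises");

// Every PNG file starts with these eight bytes.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function isPng(bytes) {
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.equals(bytes.subarray(0, PNG_SIGNATURE.length))
  );
}

async function saveScreenshot(targetUrl, outputPath, baseUrl = "http://localhost:3000") {
  const requestUrl = new URL("/image", baseUrl);
  requestUrl.searchParams.set("url", targetUrl); // percent-encoded target
  const response = await fetch(requestUrl);
  if (!response.ok) {
    throw new Error(`GET ${requestUrl} failed with HTTP ${response.status}`);
  }
  const bytes = Buffer.from(await response.arrayBuffer());
  if (!isPng(bytes)) {
    throw new Error("response body is not a PNG image");
  }
  await writeFile(outputPath, bytes);
  return bytes.length; // size of the saved file in bytes
}
```

Checking the signature rather than trusting the file extension keeps a failed capture from silently producing an unreadable `.png`.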

Tech Stack

The service is built with:

  • Express.js for the web server
  • Puppeteer for headless browser automation and screenshots
  • Turndown for HTML to Markdown conversion
  • Jest for testing

License

UNLICENSED

