This repository was archived by the owner on Aug 10, 2020. It is now read-only.

trandoshan-io/crawler


crawler


Crawler is a program written in Go, designed to crawl websites.

features

  • uses the Tor SOCKS proxy to crawl hidden services
  • fast, built on valyala/fasthttp (which its authors report as up to 10x faster than net/http)
  • extracts both absolute and relative URLs
  • uses a scalable messaging system (NATS)
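Extracting both absolute and relative URLs amounts to resolving every link against the URL of the page it was found on. The following is a minimal sketch using only Go's standard library (`net/url` and `regexp`); the helper name `extractURLs` and the `href` regular expression are hypothetical and not taken from the crawler's source, which may parse pages differently.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefPattern is a deliberately simple, hypothetical pattern capturing the value
// of href="..." attributes; a more robust crawler might use an HTML parser.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// extractURLs returns every link found in body. Relative links such as "/b"
// or "c.html" are resolved against pageURL; absolute links are kept unchanged.
func extractURLs(pageURL string, body string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed links
		}
		// ResolveReference returns absolute refs as-is and joins relative ones with base.
		found = append(found, base.ResolveReference(ref).String())
	}
	return found, nil
}

func main() {
	body := `<a href="http://example.onion/a">A</a> <a href="/b">B</a> <a href="c.html">C</a>`
	urls, err := extractURLs("http://site.onion/dir/index.html", body)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Here `/b` resolves to `http://site.onion/b` and `c.html` to `http://site.onion/dir/c.html`, while the absolute `.onion` link passes through untouched.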

how it works

  • The crawler process connects to a NATS server (specified by the environment variable NATS_URI) and sets up a subscriber for messages with the subject todoSubject
  • When a URL is received, the crawler starts crawling it
  • When crawling is done, the crawler publishes the page content to the NATS server with the subject contentSubject, and the URLs it found with the subject doneSubject
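The message flow described above can be sketched roughly as follows. To keep the example runnable without a NATS server, a small in-memory `broker` (hypothetical, not part of the project) stands in for the NATS connection; with the nats.go client, the same wiring would use `nats.Connect(os.Getenv("NATS_URI"))` followed by `nc.Subscribe` and `nc.Publish`. The plain-string payloads and the choice to publish one `doneSubject` message per found URL are assumptions, since the crawler's actual message format is not described here.

```go
package main

import (
	"fmt"
	"sync"
)

// Subject names as written in the README; the real crawler's values may differ.
const (
	todoSubject    = "todoSubject"
	contentSubject = "contentSubject"
	doneSubject    = "doneSubject"
)

// broker is a hypothetical in-memory stand-in for a NATS connection: handlers
// are registered per subject and Publish invokes them synchronously.
type broker struct {
	mu       sync.Mutex
	handlers map[string][]func(data []byte)
}

func newBroker() *broker {
	return &broker{handlers: make(map[string][]func([]byte))}
}

func (b *broker) Subscribe(subject string, h func(data []byte)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[subject] = append(b.handlers[subject], h)
}

func (b *broker) Publish(subject string, data []byte) {
	// Copy the handler list so handlers may publish without deadlocking.
	b.mu.Lock()
	hs := append([]func([]byte){}, b.handlers[subject]...)
	b.mu.Unlock()
	for _, h := range hs {
		h(data)
	}
}

// crawlFunc fetches a page and returns its content plus the URLs found in it.
type crawlFunc func(url string) (content string, found []string)

// startCrawler wires the flow: receive a URL on todoSubject, crawl it, then
// publish the content on contentSubject and each found URL on doneSubject.
func startCrawler(b *broker, crawl crawlFunc) {
	b.Subscribe(todoSubject, func(data []byte) {
		content, found := crawl(string(data))
		b.Publish(contentSubject, []byte(content))
		for _, u := range found {
			b.Publish(doneSubject, []byte(u))
		}
	})
}

func main() {
	b := newBroker()
	b.Subscribe(contentSubject, func(d []byte) { fmt.Println("content:", string(d)) })
	b.Subscribe(doneSubject, func(d []byte) { fmt.Println("found:", string(d)) })

	// A fake crawl function so the example runs without network access.
	startCrawler(b, func(u string) (string, []string) {
		return "<html>...</html>", []string{u + "/a", u + "/b"}
	})
	b.Publish(todoSubject, []byte("http://example.onion"))
}
```

Because the crawler only talks to subjects, other processes can feed it URLs and consume its results without knowing about each other, which is what lets the NATS-based design scale out to several crawler instances.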

About

Go process used to crawl websites
