A blazing fast, concurrent web crawler built in Go for extracting product information from e-commerce websites. It detects sitemaps, crawls product pages, and extracts structured product data. Key features:
- Sitemap detection and parsing
- Smart URL filtering and pattern matching
- Product schema extraction (supports JSON-LD)
- Fallback to OpenGraph meta tags
- CSV export functionality
- Concurrent crawling with rate limiting
- Price normalization (handles IRR currency)
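The price normalization mentioned above isn't spelled out in this README; as a rough illustration of what IRR handling could look like, here is a minimal sketch (the helper name `normalizePrice` and its exact behavior are assumptions, not taken from the repo):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalizePrice is a hypothetical helper: it strips the currency label and
// thousands separators from a raw price string and parses the digits that remain.
// The real logic in pkg/utils may differ.
func normalizePrice(raw string) (int64, error) {
	cleaned := strings.NewReplacer("IRR", "", "ریال", "", ",", "", " ", "").Replace(raw)
	return strconv.ParseInt(strings.TrimSpace(cleaned), 10, 64)
}

func main() {
	price, err := normalizePrice("1,250,000 IRR")
	if err != nil {
		fmt.Println("could not parse price:", err)
		return
	}
	fmt.Println(price) // 1250000
}
```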
Built with:
- Colly - the workhorse powering the crawling
- httpx - robust HTTP interactions
- goflags - CLI flag parsing
- gologger - clean, leveled logging
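None of the project's crawler code is shown here, but for orientation, a minimal Colly collector with the kind of rate limiting listed in the features might look roughly like this (the Colly v2 import path, the domain, delays, and log messages are placeholders, not the project's actual configuration):

```go
package main

import (
	"time"

	"github.com/gocolly/colly/v2"
	"github.com/projectdiscovery/gologger"
)

func main() {
	// Placeholder domain; the real crawler targets a configurable base URL.
	c := colly.NewCollector(colly.AllowedDomains("example.com"))

	// Throttle requests so the target site is not hammered.
	if err := c.Limit(&colly.LimitRule{
		DomainGlob:  "*",
		Parallelism: 2,
		Delay:       500 * time.Millisecond,
	}); err != nil {
		gologger.Fatal().Msgf("could not set limit rule: %s", err)
	}

	c.OnRequest(func(r *colly.Request) {
		gologger.Info().Msgf("visiting %s", r.URL)
	})

	if err := c.Visit("https://example.com/"); err != nil {
		gologger.Error().Msgf("crawl failed: %s", err)
	}
}
```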
To get it running:
- Clone this repo: `git clone https://github.com/yourusername/HyperHunt-GO-web-crawler.git`, then `cd HyperHunt-GO-web-crawler`
- Install dependencies: `go mod download`
- Run it: `go run main.go`
Here's how it works:
- First, it checks for a sitemap at common locations (`/sitemap.xml` or `/sitemap_index.xml`)
- If found, it parses the sitemap to extract all product URLs (a rough sketch follows this list)
- For each URL, it:
  - Attempts to extract product data from JSON-LD schema (also sketched below)
  - Falls back to OpenGraph meta tags if needed
  - Normalizes prices and data formats
- Exports results to CSV files (also sketched below):
  - `raw_links.csv`: All discovered URLs
  - `proper_urls.csv`: Filtered URLs matching product patterns
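Fetching and parsing a sitemap can be done with the standard library alone; a hedged sketch of that step (the URL and struct are illustrative, the real parsing lives in the project's packages):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"net/http"
)

// urlset mirrors the minimal structure of a standard sitemap.xml.
type urlset struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

func main() {
	// Placeholder location; the crawler probes /sitemap.xml and /sitemap_index.xml.
	resp, err := http.Get("https://example.com/sitemap.xml")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	defer resp.Body.Close()

	var s urlset
	if err := xml.NewDecoder(resp.Body).Decode(&s); err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, u := range s.URLs {
		fmt.Println(u.Loc)
	}
}
```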
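The schema-first, OpenGraph-fallback extraction described above could be sketched with Colly callbacks along these lines (the selectors, struct, and URL are illustrative; the actual extraction lives in `pkg/crawler` and may look different):

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/gocolly/colly/v2"
)

// productSchema holds just the JSON-LD fields this sketch cares about.
type productSchema struct {
	Type string `json:"@type"`
	Name string `json:"name"`
}

func main() {
	c := colly.NewCollector()

	// Preferred source: JSON-LD <script> blocks describing a Product.
	c.OnHTML(`script[type="application/ld+json"]`, func(e *colly.HTMLElement) {
		var p productSchema
		if err := json.Unmarshal([]byte(e.Text), &p); err == nil && p.Type == "Product" {
			fmt.Println("JSON-LD product:", p.Name)
		}
	})

	// Fallback: OpenGraph meta tags for pages without a usable Product schema.
	c.OnHTML(`meta[property="og:title"]`, func(e *colly.HTMLElement) {
		fmt.Println("OpenGraph title:", e.Attr("content"))
	})

	// Placeholder URL; in the real crawler these come from the sitemap.
	if err := c.Visit("https://example.com/product/123"); err != nil {
		fmt.Println("visit failed:", err)
	}
}
```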
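Similarly, the CSV export boils down to writing the collected URLs out with the standard `encoding/csv` package; something along these lines (only the `raw_links.csv` file name comes from the description above, the rest is illustrative):

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
)

// writeURLs dumps a slice of URLs into a single-column CSV file.
func writeURLs(path string, urls []string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	if err := w.Write([]string{"url"}); err != nil {
		return err
	}
	for _, u := range urls {
		if err := w.Write([]string{u}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	urls := []string{"https://example.com/product/1", "https://example.com/product/2"}
	if err := writeURLs("raw_links.csv", urls); err != nil {
		log.Fatal(err)
	}
}
```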
Project layout:

```
.
├── main.go        # Entry point
└── pkg/
    ├── crawler/   # Core crawling logic
    ├── fileops/   # File operations (CSV handling)
    ├── models/    # Data models
    └── utils/     # Helper functions
```
The crawler is configured to work with specific e-commerce sites out of the box. You can modify the base URL in `main.go`:

```go
baseURL := "https://your-target-site.com/"
```
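If you change the base URL, you will likely also want the collector to stay on that host; a hedged sketch of that wiring (the real `main.go` may handle this differently):

```go
package main

import (
	"net/url"

	"github.com/gocolly/colly/v2"
	"github.com/projectdiscovery/gologger"
)

func main() {
	baseURL := "https://your-target-site.com/"

	// Derive the host from the base URL so the crawler does not wander off-site.
	u, err := url.Parse(baseURL)
	if err != nil {
		gologger.Fatal().Msgf("invalid base URL: %s", err)
	}

	c := colly.NewCollector(colly.AllowedDomains(u.Host))
	if err := c.Visit(baseURL); err != nil {
		gologger.Error().Msgf("crawl failed: %s", err)
	}
}
```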
PRs are welcome! Just:
- Fork it
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a PR