A Node.js-based job scraping tool that fetches job listings from job websites, with optional proxy support to avoid rate limiting and IP blocking. Created for learning purposes.
## Features

- Job Scraping: Extract job listings from any job site, filtered by job title
- Proxy Support: Optional proxy integration to avoid IP blocking and rate limiting
- Interactive CLI: User-friendly command-line interface
- Structured Data: Extracts job title, company, location, time posted, and URL
## Prerequisites

- Node.js (version 14 or higher)
- npm or yarn package manager
## Installation

Install dependencies:

```bash
npm install
```
Create a `.env` file in the root directory with your configuration:

```env
# Proxy Configuration (optional)
proxuser=your_proxy_username
proxpassword=your_proxy_password
proxip=your_proxy_ip
proxport=your_proxy_port

# Proxy List (JSON, kept on a single line so dotenv reads it as one value)
proxies=[{"username": "user1", "password": "pass1", "ip": "proxy1.example.com", "port": "8080"}, {"username": "user2", "password": "pass2", "ip": "proxy2.example.com", "port": "8080"}]

# Target URL
url_1=website
jURL=website
```
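Because the `proxies` value arrives as a raw JSON string, parsing it defensively keeps malformed input from crashing the tool. A minimal sketch of that idea (the `parseProxies` helper is illustrative, not part of the project):

```javascript
// Illustrative helper: safely parse the `proxies` env value into an array.
function parseProxies(raw) {
  try {
    const parsed = JSON.parse(raw ?? '[]');
    // Only accept an actual array; anything else means "no proxies configured".
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // Invalid JSON degrades gracefully to a direct connection.
  }
}
```

An empty result here corresponds to the "Invalid JSON" fallback behavior described under error handling below.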
## Usage

Run the interactive CLI tool:

```bash
node jobsearch.mjs
```
The tool will prompt you for:
- Job title: The position you're searching for (e.g., "developer", "frontend", "react")
- Use proxy: Whether to route requests through a proxy (y/n)
You can also use the scraper directly in your code:

```javascript
import { scrapeJobs } from './scraper.mjs';

// Search without proxy
await scrapeJobs('developer', false);

// Search with proxy
await scrapeJobs('frontend developer', true);
```
## Project Structure

```
proxy-scraper/
├── config.js       # Configuration and environment variables
├── jobsearch.mjs   # Interactive CLI interface
├── proxies.mjs     # Proxy management and selection
├── scraper.mjs     # Main scraping logic
├── package.json    # Dependencies and scripts
└── README.md       # This file
```
## Environment Variables

- `proxuser`: Proxy username
- `proxpassword`: Proxy password
- `proxip`: Proxy server IP address
- `proxport`: Proxy server port
- `proxies`: JSON array of proxy configurations
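The four single-proxy variables combine into a standard proxy URL of the form `http://user:pass@host:port`, which is the shape `https-proxy-agent` accepts. A sketch of that assembly (the helper name is illustrative):

```javascript
// Illustrative helper: build a proxy URL from the individual env variables.
function buildProxyUrl({ username, password, ip, port }) {
  // Credentials are URL-encoded in case they contain special characters.
  return `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${ip}:${port}`;
}
```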
The tool supports multiple proxy configurations. You can provide them in two ways:

- Single Proxy: Use the individual environment variables
- Multiple Proxies: Use the `proxies` JSON array for automatic proxy rotation
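Rotation over the `proxies` array can be as simple as a round-robin cursor. A sketch of the idea (the names are illustrative; the project's actual selection logic lives in proxies.mjs):

```javascript
// Illustrative round-robin rotation over the configured proxy list.
function makeProxyRotator(proxies) {
  let index = 0;
  return function nextProxy() {
    if (proxies.length === 0) return null; // No proxies -> direct connection.
    const proxy = proxies[index % proxies.length];
    index += 1;
    return proxy;
  };
}
```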
## Output Format

The scraper returns job listings in the following format:

```javascript
{
  title: "Job Title",
  company: "Company Name",
  location: "Job Location",
  timeAgo: "Posted time ago",
  url: "Full job URL"
}
```
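For logging or console display, a record in this shape flattens naturally to one line. A hypothetical formatter (not part of the project):

```javascript
// Hypothetical formatter for a single scraped job record (fields as above).
function formatJob(job) {
  return `${job.title} at ${job.company} (${job.location}), ${job.timeAgo}: ${job.url}`;
}
```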
## Error Handling

The tool handles various error scenarios:
- 403/429 Status Codes: Automatically detected as blocking responses
- Network Errors: Graceful handling of connection issues
- Invalid Proxy Configuration: Fallback to direct connection
- Invalid JSON: Safe parsing of proxy configurations
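The 403/429 detection amounts to a status-code check before the response body is parsed. A sketch of the idea (the function name is illustrative):

```javascript
// Treat HTTP 403 (Forbidden) and 429 (Too Many Requests) as blocking responses.
function isBlockedResponse(status) {
  return status === 403 || status === 429;
}
```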
## Dependencies

- `cheerio`: HTML parsing and DOM manipulation
- `node-fetch`: HTTP client for making requests
- `https-proxy-agent`: Proxy support for HTTPS requests
- `dotenv`: Environment variable management
## Disclaimer

This tool is for educational purposes only. Please ensure you comply with:
- The target website's Terms of Service
- Rate limiting policies
- Applicable laws and regulations
Always use this tool responsibly and respect website policies.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Troubleshooting

- 403/429 Errors: Try using a proxy or wait before making more requests
- No jobs found: Check that the job title is valid or try different keywords
- Proxy connection failed: Verify your proxy credentials and configuration
- Environment variables not loading: Ensure your `.env` file is in the root directory
If you encounter any issues:
- Check the console output for error messages
- Verify your environment configuration
- Ensure all dependencies are installed
- Check your internet connection and proxy settings
Happy Job Hunting!