
A Node.js-based job scraping tool that fetches job listings from job websites, with optional proxy support to avoid rate limiting and IP blocking. **Created for learning purposes**.


rasamie3/Proxy-Scraper-Job-Search-Tool

Proxy Scraper - Job Search Tool

Features

  • Job Scraping: Search a configured job site by job title and extract the matching listings
  • Proxy Support: Optional proxy integration to avoid IP blocking and rate limiting
  • Interactive CLI: User-friendly command-line interface
  • Structured Data: Extracts job title, company, location, time posted, and URL

Prerequisites

  • Node.js (version 14 or higher)
  • npm or yarn package manager

Installation

  1. Install dependencies:
npm install
  2. Create a .env file in the project root directory with your configuration:
# Proxy Configuration (optional)
proxuser=your_proxy_username
proxpassword=your_proxy_password
proxip=your_proxy_ip
proxport=your_proxy_port

# Proxy List (JSON array in single quotes; multi-line values need dotenv 15 or later)
proxies='[
  {
    "username": "user1",
    "password": "pass1",
    "ip": "proxy1.example.com",
    "port": "8080"
  },
  {
    "username": "user2",
    "password": "pass2",
    "ip": "proxy2.example.com",
    "port": "8080"
  }
]'

# Target URL
url_1=website
jURL=website
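Once dotenv has loaded the file (for example with `import 'dotenv/config'`), every value is a plain string on `process.env`, so the proxy list has to be parsed before use. The sketch below shows one defensive way to do that; it is not the project's config.js, and the function name `loadProxyList` and its return shape are hypothetical. It takes the environment object as a parameter so it can be tried without a .env file.

```javascript
// Hypothetical sketch of reading proxy settings from the environment.
// Assumes dotenv has already populated process.env (e.g. `import 'dotenv/config'`).
function loadProxyList(env = process.env) {
  // Multiple proxies: parse the JSON array defensively.
  if (env.proxies) {
    try {
      const parsed = JSON.parse(env.proxies);
      if (Array.isArray(parsed)) {
        // Keep only entries that have the fields needed to reach a proxy.
        return parsed.filter((p) => p && p.ip && p.port);
      }
    } catch {
      // Invalid JSON: fall through to the single-proxy variables.
    }
  }

  // Single proxy: use the individual variables if host and port are set.
  if (env.proxip && env.proxport) {
    return [
      {
        username: env.proxuser,
        password: env.proxpassword,
        ip: env.proxip,
        port: env.proxport,
      },
    ];
  }

  // No usable proxy configuration: the caller should connect directly.
  return [];
}
```

An empty array means no proxy is configured, which corresponds to the documented fallback to a direct connection.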

Usage

Interactive Mode

Run the interactive CLI tool:

node jobsearch.mjs

The tool will prompt you for:

  1. Job title: The position you're searching for (e.g., "developer", "frontend", "react")
  2. Use proxy: Whether to route requests through a proxy (y/n)

Programmatic Usage

You can also use the scraper directly in your code:

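// Run this in an ES module (e.g. a .mjs file); top-level await needs Node.js 14.8 or later
import { scrapeJobs } from './scraper.mjs';

// Search without proxy (the second argument toggles proxy use)
await scrapeJobs('developer', false);

// Search with proxy
await scrapeJobs('frontend developer', true);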

Project Structure

proxy-scraper/
├── config.js          # Configuration and environment variables
├── jobsearch.mjs      # Interactive CLI interface
├── proxies.mjs        # Proxy management and selection
├── scraper.mjs        # Main scraping logic
├── package.json       # Dependencies and scripts
└── README.md          # This file

Configuration

Environment Variables

  • proxuser: Proxy username
  • proxpassword: Proxy password
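  • proxip: Proxy server IP address or hostname
  • proxport: Proxy server port
  • proxies: JSON array of proxy configurations (objects with username, password, ip, and port)
  • url_1, jURL: Target website URLs (the "Target URL" entries in .env)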

Proxy Configuration

Proxies can be configured in two ways:

  1. Single Proxy: Set the individual proxuser, proxpassword, proxip, and proxport variables
  2. Multiple Proxies: Set the proxies JSON array to rotate between several proxies automatically
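Rotation between several proxies can be as simple as cycling through the list. The following is a minimal round-robin sketch, assuming proxy objects shaped like the entries in the proxies JSON array; `createProxyRotator` is a hypothetical name, and the actual selection logic in proxies.mjs may differ (for example, it may pick at random). It also assumes plain HTTP proxies, hence the `http://` scheme in the generated URL.

```javascript
// Hypothetical round-robin rotator; the real proxies.mjs may select differently.
// Each proxy is assumed to look like { username, password, ip, port }.
function createProxyRotator(proxies) {
  let next = 0;

  return function nextProxyUrl() {
    if (proxies.length === 0) return null; // no proxies: connect directly

    const p = proxies[next];
    next = (next + 1) % proxies.length; // wrap around to the first proxy

    // Percent-encode credentials so characters like "@" or ":" don't break the URL.
    const auth = p.username
      ? `${encodeURIComponent(p.username)}:${encodeURIComponent(p.password ?? "")}@`
      : "";

    return `http://${auth}${p.ip}:${p.port}`;
  };
}
```

A URL produced this way is the kind of value that can be passed to `HttpsProxyAgent` from https-proxy-agent, whose agent is then handed to node-fetch through the `agent` request option.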

Output Format

The scraper returns job listings in the following format:

{
  title: "Job Title",
  company: "Company Name", 
  location: "Job Location",
  timeAgo: "Posted time ago",
  url: "Full job URL"
}
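Scraped text often carries stray whitespace, and job links on listing pages are frequently relative paths. A hypothetical helper like `toJobRecord` below could turn raw extracted fields into the shape above; the `raw` field names (including `href`) and the `pageUrl` parameter are assumptions, not the scraper's actual internals.

```javascript
// Hypothetical helper: normalize raw scraped fields into the documented output shape.
// Assumes the scraper has read a possibly relative link ("href") from the listing page.
function toJobRecord(raw, pageUrl) {
  // Collapse runs of whitespace (including newlines) and trim the ends.
  const clean = (s) => (s ?? "").replace(/\s+/g, " ").trim();

  return {
    title: clean(raw.title),
    company: clean(raw.company),
    location: clean(raw.location),
    timeAgo: clean(raw.timeAgo),
    // Resolve relative links such as "/jobs/123" against the page they came from.
    url: new URL(raw.href, pageUrl).href,
  };
}
```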

Error Handling

The tool handles various error scenarios:

  • 403/429 Status Codes: Automatically detected as blocking responses
  • Network Errors: Graceful handling of connection issues
  • Invalid Proxy Configuration: Fallback to direct connection
  • Invalid JSON: Safe parsing of proxy configurations
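The 403/429 detection pairs naturally with the Troubleshooting advice to wait before making more requests. The sketch below treats those two status codes as blocking and retries with exponential backoff. It illustrates the idea under stated assumptions rather than describing what the tool currently does: `isBlocked`, `backoffDelayMs`, `fetchWithBackoff`, and `doRequest` are hypothetical names, and `doRequest` is assumed to return a fetch-style response with a `status` property.

```javascript
// Hypothetical sketch: 403 and 429 are treated as "blocked", as described above.
const BLOCKING_STATUSES = new Set([403, 429]);

function isBlocked(status) {
  return BLOCKING_STATUSES.has(status);
}

// Exponential backoff: double the wait after each consecutive block, up to a cap.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry a request while the site keeps blocking it.
// doRequest receives the attempt number, so a caller could switch proxies per attempt.
async function fetchWithBackoff(doRequest, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await doRequest(attempt);
    if (!isBlocked(res.status)) return res;

    // Wait before the next attempt, but not after the last one.
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
  throw new Error(`Still blocked after ${maxAttempts} attempts`);
}
```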

Dependencies

  • cheerio: HTML parsing and DOM manipulation
  • node-fetch: HTTP client for making requests
  • https-proxy-agent: Proxy support for HTTPS requests
  • dotenv: Environment variable management

⚠️ Legal Notice

This tool is for educational purposes only. Please ensure you comply with:

  • The target website's Terms of Service
  • Rate limiting policies
  • Applicable laws and regulations

Always use this tool responsibly and respect website policies.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Troubleshooting

Common Issues

  1. 403/429 Errors: Try using a proxy or wait before making more requests
  2. No jobs found: Check if the job title is valid or try different keywords
  3. Proxy connection failed: Verify your proxy credentials and configuration
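  4. Environment variables not loading: Ensure your .env file is in the project root and run the tool from that directory (dotenv looks for .env in the current working directory by default)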

Getting Help

If you encounter any issues:

  1. Check the console output for error messages
  2. Verify your environment configuration
  3. Ensure all dependencies are installed
  4. Check your internet connection and proxy settings

Happy Job Hunting!
