A Node.js-based job scraping tool that fetches job listings from job websites, with optional proxy support to avoid rate limiting and IP blocking. Created for learning purposes.
## Features

- Job Scraping: Extract job listings from any job site, filtered by job title
- Proxy Support: Optional proxy integration to avoid IP blocking and rate limiting
- Interactive CLI: User-friendly command-line interface
- Structured Data: Extracts job title, company, location, time posted, and URL
## Prerequisites

- Node.js (version 14 or higher)
- npm or yarn package manager
## Installation

Install dependencies:

```bash
npm install
```
Create a `.env` file in the root directory with your configuration:

```env
# Proxy Configuration (optional)
proxuser=your_proxy_username
proxpassword=your_proxy_password
proxip=your_proxy_ip
proxport=your_proxy_port

# Proxy List (JSON, kept on a single line so dotenv reads it as one value)
proxies=[{"username": "user1", "password": "pass1", "ip": "proxy1.example.com", "port": "8080"}, {"username": "user2", "password": "pass2", "ip": "proxy2.example.com", "port": "8080"}]

# Target URL
url_1=website
jURL=website
```
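Because the `proxies` value arrives as a raw JSON string, parsing it defensively keeps malformed input from crashing the tool. A minimal sketch of that idea (the `parseProxies` helper is illustrative, not part of the project):

```javascript
// Illustrative helper: safely parse the `proxies` env value into an array.
function parseProxies(raw) {
  try {
    const parsed = JSON.parse(raw ?? '[]');
    // Only accept an actual array; anything else means "no proxies configured".
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // Invalid JSON degrades gracefully to a direct connection.
  }
}
```

An empty result here corresponds to the "Invalid JSON" fallback behavior described under error handling below.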
## Usage

Run the interactive CLI tool:

```bash
node jobsearch.mjs
```
The tool will prompt you for:
- Job title: The position you're searching for (e.g., "developer", "frontend", "react")
- Use proxy: Whether to route requests through a proxy (y/n)
You can also use the scraper directly in your code:

```javascript
import { scrapeJobs } from './scraper.mjs';

// Search without proxy
await scrapeJobs('developer', false);

// Search with proxy
await scrapeJobs('frontend developer', true);
```
## Project Structure

```
proxy-scraper/
├── config.js       # Configuration and environment variables
├── jobsearch.mjs   # Interactive CLI interface
├── proxies.mjs     # Proxy management and selection
├── scraper.mjs     # Main scraping logic
├── package.json    # Dependencies and scripts
└── README.md       # This file
```
## Environment Variables

- `proxuser`: Proxy username
- `proxpassword`: Proxy password
- `proxip`: Proxy server IP address
- `proxport`: Proxy server port
- `proxies`: JSON array of proxy configurations
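The four single-proxy variables combine into a standard proxy URL of the form `http://user:pass@host:port`, which is the shape `https-proxy-agent` accepts. A sketch of that assembly (the helper name is illustrative):

```javascript
// Illustrative helper: build a proxy URL from the individual env variables.
function buildProxyUrl({ username, password, ip, port }) {
  // Credentials are URL-encoded in case they contain special characters.
  return `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${ip}:${port}`;
}
```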
The tool supports multiple proxy configurations. You can provide them in two ways:

- Single Proxy: Use the individual environment variables
- Multiple Proxies: Use the `proxies` JSON array for automatic proxy rotation
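Rotation over the `proxies` array can be as simple as a round-robin cursor. A sketch of the idea (the names are illustrative; the project's actual selection logic lives in proxies.mjs):

```javascript
// Illustrative round-robin rotation over the configured proxy list.
function makeProxyRotator(proxies) {
  let index = 0;
  return function nextProxy() {
    if (proxies.length === 0) return null; // No proxies -> direct connection.
    const proxy = proxies[index % proxies.length];
    index += 1;
    return proxy;
  };
}
```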
## Output Format

The scraper returns job listings in the following format:

```javascript
{
  title: "Job Title",
  company: "Company Name",
  location: "Job Location",
  timeAgo: "Posted time ago",
  url: "Full job URL"
}
```
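For logging or console display, a record in this shape flattens naturally to one line. A hypothetical formatter (not part of the project):

```javascript
// Hypothetical formatter for a single scraped job record (fields as above).
function formatJob(job) {
  return `${job.title} at ${job.company} (${job.location}), ${job.timeAgo}: ${job.url}`;
}
```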
## Error Handling

The tool handles various error scenarios:
- 403/429 Status Codes: Automatically detected as blocking responses
- Network Errors: Graceful handling of connection issues
- Invalid Proxy Configuration: Fallback to direct connection
- Invalid JSON: Safe parsing of proxy configurations
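The 403/429 detection amounts to a status-code check before the response body is parsed. A sketch of the idea (the function name is illustrative):

```javascript
// Treat HTTP 403 (Forbidden) and 429 (Too Many Requests) as blocking responses.
function isBlockedResponse(status) {
  return status === 403 || status === 429;
}
```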
## Dependencies

- `cheerio`: HTML parsing and DOM manipulation
- `node-fetch`: HTTP client for making requests
- `https-proxy-agent`: Proxy support for HTTPS requests
- `dotenv`: Environment variable management
## Disclaimer

This tool is for educational purposes only. Please ensure you comply with:
- The target website's Terms of Service
- Rate limiting policies
- Applicable laws and regulations
Always use this tool responsibly and respect website policies.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Troubleshooting

- 403/429 Errors: Try using a proxy or wait before making more requests
- No jobs found: Check that the job title is valid or try different keywords
- Proxy connection failed: Verify your proxy credentials and configuration
- Environment variables not loading: Ensure your `.env` file is in the root directory
If you encounter any issues:
- Check the console output for error messages
- Verify your environment configuration
- Ensure all dependencies are installed
- Check your internet connection and proxy settings
Happy Job Hunting!