A machine learning-powered job application tool that helps job seekers find positions where their qualifications best match the requirements. ResumeRadar analyzes resume content against job listings to provide relevance scores and targeted application recommendations.
This open source project aims to counter the "resume black hole" problem by helping candidates focus on the opportunities where they are most competitive. The core algorithm uses NLP to compare resume skills and experience against job posting requirements, calculating match percentages and identifying areas of strength.
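The scoring details are internal to the project; as a rough illustration of the match-percentage idea, here is a minimal Python sketch based on simple skill overlap. The function name, inputs, and formula are assumptions for illustration, not the actual algorithm, which uses NLP:

```python
# Illustrative sketch only: the real matching uses NLP, not raw keyword overlap.
def match_score(resume_skills: set, job_requirements: set) -> float:
    """Fraction of a job's requirements covered by the resume."""
    if not job_requirements:
        return 0.0
    return len(resume_skills & job_requirements) / len(job_requirements)

resume = {"python", "scrapy", "nlp", "docker"}
job = {"python", "nlp", "kubernetes"}
print(f"Match: {match_score(resume, job):.0%}")  # Match: 67%
```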
The job crawler system is a modular, scalable solution for scraping job listings from remote job boards. It uses two different technologies based on the nature of the target website:
- Crawlee (Node.js) for JavaScript-heavy sites with dynamic content
- Scrapy (Python) for static sites with simple HTML structures
Key features:

- Targets remote-friendly job boards
- Handles both static and dynamic websites
- Implements anti-blocking measures: proxy rotation, user agent rotation, and request delays (see the sketch after this list)
- Respects robots.txt for ethical scraping
- Standardized JSON output format
- Comprehensive error handling and logging
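The anti-blocking utilities themselves live in `crawlers/utils/` and are written in JavaScript; the following Python sketch only mirrors the concept, with user agent strings and delay bounds chosen arbitrarily:

```python
import random
import time
import urllib.request

# Illustrative only: mirrors the idea behind utils/user_agent_rotator.js
# and the request-delay logic. The agent list and delays are assumptions.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def polite_get(url: str) -> bytes:
    """Fetch a URL with a rotated user agent and a randomized delay."""
    time.sleep(random.uniform(1.0, 3.0))  # delay between requests
    req = urllib.request.Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```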
The crawler code is laid out as follows:

```
/crawlers/
├── crawlee_sites/                 # JS-heavy sites (Node.js)
│   ├── base_crawler.js            # Base crawler class
│   ├── remoteok.js                # RemoteOK crawler
│   └── index.js                   # Entry point for all Crawlee crawlers
├── scrapy_sites/                  # Static sites (Python)
│   └── job_crawler/               # Scrapy project
│       ├── spiders/               # Spider implementations
│       │   ├── base_spider.py     # Base spider class
│       │   └── weworkremotely.py  # WeWorkRemotely spider
│       ├── items.py               # Item definitions
│       ├── pipelines.py           # Item processing pipelines
│       └── settings.py            # Scrapy settings
└── utils/                         # Shared utilities
    ├── logger.js                  # Logging utility
    ├── proxy_rotator.js           # Proxy rotation utility
    └── user_agent_rotator.js      # User agent rotation utility
```
Prerequisites:

- Node.js 14+ and npm
- Python 3.8+ and pip

To set up the project:

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/ResumeRadar.git
  cd ResumeRadar
  ```

- Install Node.js dependencies:

  ```bash
  npm install
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To run all crawlers (both Crawlee and Scrapy):

```bash
node index.js
```

To run only the JavaScript-based crawlers:

```bash
npm run crawl:all-js
```

To run a specific Crawlee crawler:

```bash
npm run crawl:remoteok
```

To run a specific Scrapy spider:

```bash
cd crawlers/scrapy_sites
python -m scrapy crawl weworkremotely
```
All crawled job listings are saved as JSON files in the `data/output` directory. Each job listing contains the following fields:

```json
{
  "job_title": "Senior Software Engineer",
  "company": "Tech Corp",
  "url": "https://remoteok.com/job/123",
  "description": "Build scalable systems...",
  "location": "Remote (Global)",
  "salary": "$80k–$100k",
  "source": "RemoteOK",
  "crawled_at": "2023-05-14T12:34:56.789Z"
}
```
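For downstream processing, the output files can be loaded with a few lines of Python. This is a sketch only; it assumes one JSON object per file, which the project does not specify:

```python
import json
from pathlib import Path

# Load every crawled listing from data/output.
# Assumption: each file holds a single JSON object; adjust if a
# crawler writes an array of listings per file instead.
output_dir = Path("data/output")
jobs = [json.loads(p.read_text(encoding="utf-8")) for p in output_dir.glob("*.json")]

for job in jobs:
    print(f'{job["job_title"]} at {job["company"]} ({job["source"]})')
```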
To add a new Crawlee crawler for a JavaScript-heavy site:

- Add the site configuration to `data/job_boards.json` (see the hypothetical example after this list)
- Create a new crawler file in `crawlers/crawlee_sites/`
- Extend the `BaseCrawler` class and implement the `requestHandler` method
- Add the new crawler to `crawlers/crawlee_sites/index.js`
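The schema of `data/job_boards.json` is defined by the project; an entry presumably records at least the board's name and URL. A purely hypothetical example, with all field names assumed:

```json
{
  "name": "ExampleBoard",
  "url": "https://example-board.com/remote-jobs",
  "crawler": "crawlee"
}
```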
To add a new Scrapy spider for a static site:

- Add the site configuration to `data/job_boards.json`
- Create a new spider file in `crawlers/scrapy_sites/job_crawler/spiders/`
- Extend the `BaseJobSpider` class and implement the `parse` method (a sketch follows this list)
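A minimal sketch of such a spider. `BaseJobSpider`'s actual interface is project-specific, so this example extends `scrapy.Spider` directly with the same `parse` contract; the site URL and CSS selectors are illustrative:

```python
import scrapy

# Sketch only: the project's BaseJobSpider presumably wraps scrapy.Spider
# with shared settings. All URLs and selectors here are assumptions.
class ExampleBoardSpider(scrapy.Spider):
    name = "exampleboard"
    start_urls = ["https://example-board.com/remote-jobs"]

    def parse(self, response):
        # Yield one item per posting, matching the standard output fields.
        for posting in response.css("li.job"):
            yield {
                "job_title": posting.css("h2::text").get(),
                "company": posting.css(".company::text").get(),
                "url": response.urljoin(posting.css("a::attr(href)").get()),
                "source": "ExampleBoard",
            }
```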
Beyond the crawlers, the broader ResumeRadar pipeline covers:

- Resume parsing and skill extraction
- Job listing data collection and analysis
- Match scoring based on qualification alignment
- Basic recommendation engine for top-matching positions (sketch below)
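The recommendation step can then be a simple ranking over scored listings. A sketch reusing the hypothetical `match_score` function from above:

```python
def top_matches(resume_skills, jobs, n=5):
    """Rank job listings by match score, highest first.

    Assumes each job dict carries a 'required_skills' set, a
    hypothetical field produced by an upstream extraction step.
    """
    scored = [(match_score(resume_skills, job["required_skills"]), job) for job in jobs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # key-only sort avoids comparing dicts
    return scored[:n]
```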