abhiabhi94/job-board

Introduction

A job board, initially built out of the boredom of job hunting. It scans different portals and, based on preferences such as keywords and salary, surfaces the jobs that match.

Portals Integrated

Configurations

Most configurations can be set through a .env file. All configurations are defined in the job_board/config.py file.
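As a rough sketch of how such settings are usually read (a .env file is typically loaded into environment variables first), here is a minimal example; the setting name PORTAL_TIMEOUT is purely illustrative and not taken from job_board/config.py:

```python
import os

def get_setting(name: str, default: str) -> str:
    """Return an environment-backed setting, falling back to a default."""
    return os.environ.get(name, default)

# Simulate a value that a .env loader would have put into the environment.
os.environ["PORTAL_TIMEOUT"] = "30"

print(get_setting("PORTAL_TIMEOUT", "10"))  # picked up from the environment
print(get_setting("MISSING_KEY", "10"))     # falls back to the default
```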

API

The API endpoint is /.json. All filters available on the UI are also available through the JSON API.
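A minimal sketch of building a request URL for this endpoint; the host and the filter parameter names (keyword, min_salary) are assumptions for illustration, so check the UI's query string for the real ones:

```python
from urllib.parse import urlencode

def build_json_url(base: str, **filters) -> str:
    """Build a /.json URL with any non-empty filters as query parameters."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base}/.json?{query}" if query else f"{base}/.json"

url = build_json_url("http://localhost:8000", keyword="python", min_salary=100000)
print(url)  # http://localhost:8000/.json?keyword=python&min_salary=100000
```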

CLI

  • Most options should be available using the --help flag.
job-board --help
  • Running the webserver in debug mode.
job-board runserver -d
  • Fetching the jobs immediately
job-board fetch
  • Run it only for specific portals (include only these portals)
job-board fetch -I weworkremotely -I python_dot_org
  • Run it for all portals, but exclude some (e.g. when a portal is down)
job-board fetch -E wellfound -E work_at_a_startup
  • Start the job scheduler (runs jobs according to their cron schedules)
job-board scheduler start
  • List all registered scheduled jobs
job-board scheduler list-jobs
  • Run a specific job manually
job-board scheduler run-job fetch_jobs_daily
  • Remove all scheduled jobs (useful before deployment)
job-board scheduler remove-jobs
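The -I/-E selection described above can be sketched as follows; this is an illustrative reimplementation of the semantics, not the project's actual code, with portal names taken from the CLI examples:

```python
def select_portals(all_portals, include=(), exclude=()):
    """Include flags narrow the run to just those portals;
    exclude flags remove portals from the full set."""
    selected = list(include) if include else list(all_portals)
    return [p for p in selected if p not in set(exclude)]

portals = ["weworkremotely", "python_dot_org", "wellfound", "work_at_a_startup"]

# Equivalent of: job-board fetch -I weworkremotely -I python_dot_org
print(select_portals(portals, include=["weworkremotely", "python_dot_org"]))

# Equivalent of: job-board fetch -E wellfound -E work_at_a_startup
print(select_portals(portals, exclude=["wellfound", "work_at_a_startup"]))
```

Both calls yield the same two portals here, which is the point: include and exclude are two routes to the same subset.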

Tests

Python Tests

ENV=test pytest

JavaScript Tests

# Install JavaScript dependencies first
npm install

# Run JavaScript tests
npm run test:run

# Run with coverage
npm run test:coverage

Contributing

Installing development version

pip install -e ".[dev]"
pre-commit install

CSS Development

  • Development: Uses Tailwind CDN (set ENV=dev)
  • Production: Uses optimized local CSS
  • Pre-commit hooks auto-build CSS when templates change
  • Manual build: bash scripts/build-tailwind.sh

Integrations Per Portal

The text below is mostly written as a note to future me, in the hope that it helps with debugging if an issue comes up.

  • Although they have a public RSS feed, for some reason they seem to use some sort of Cloudflare protection that blocks HTTP requests from scripts.

  • So Scrapfly is used to bypass it.

  • Although they have an API for fetching jobs, the data is fairly unstructured.
  • They have a public RSS feed, so the integration is mostly straightforward.
  • They have a public API, so the integration is straightforward.
  • They have special mechanisms set up to stop scripts from scraping their website.
  • So Scrapfly, along with its ASP (Anti-Scraping Protection) feature, is used to bypass them.
    • Although this works, it makes the whole integration very slow, since it takes close to 50-200 seconds to scrape a single page.
    • The total number of pages to scrape may be around 20-40.
    • So yeah, a better alternative that reliably works faster is welcome.
  • They don't show all jobs unless you're logged in to a profile.
  • For now, browser cookies (captured after logging in) are used to make requests and scrape.
    • These cookies seem to be long-lasting (they haven't needed to be changed even once since this was implemented).
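The cookie approach above can be sketched roughly like this; the cookie names, placeholder values, and URL are all hypothetical, and no request is actually sent here:

```python
from urllib.request import Request

# Placeholder values: in practice these would be copied from the browser's
# dev tools after logging in to the portal.
COOKIES = {"sessionid": "<copied-from-browser>", "csrftoken": "<copied-from-browser>"}

def build_authenticated_request(url: str) -> Request:
    """Attach the saved browser cookies as a Cookie header."""
    cookie_header = "; ".join(f"{k}={v}" for k, v in COOKIES.items())
    return Request(url, headers={"Cookie": cookie_header})

req = build_authenticated_request("https://example.com/jobs")
print(req.get_header("Cookie"))
```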

TODO

  • Add filtering by location; nowadays "remote" doesn't always mean remote. Some job descriptions say remote India, remote USA, etc.

About

A Python-based web app that fetches jobs from different portals and allows users to filter them.
