A job board, initially built out of the boredom of job hunting. It scans different portals and, based on preferences like keywords, salary, etc., surfaces the jobs that match them.
Most configuration can be set through a `.env` file. All configuration options can be found in `job_board/config.py`.
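As a rough, hypothetical illustration of how the two relate (the variable names below are assumptions, not the project's actual keys), `job_board/config.py` might read values that `.env` provides:

```python
# Hypothetical illustration of config loading; the variable names are
# assumptions, not the project's actual settings.
import os

from dotenv import load_dotenv  # from the python-dotenv package

load_dotenv()  # copies KEY=value pairs from .env into os.environ

ENV = os.environ.get("ENV", "dev")  # e.g. dev / test
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///jobs.db")
KEYWORDS = os.environ.get("KEYWORDS", "python,remote").split(",")
```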
The API endpoint is `/.json`. All filters that are available on the UI are also available via the JSON API.
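As a sketch, querying it from Python might look like this; the host and the filter parameter names are assumptions, not confirmed names:

```python
# Sketch of querying the JSON API on a local dev server. The filter
# parameter names (keyword, min_salary) are assumptions.
import requests

resp = requests.get(
    "http://localhost:8000/.json",
    params={"keyword": "python", "min_salary": 100000},  # hypothetical filters
    timeout=30,
)
resp.raise_for_status()
for job in resp.json():  # assumes the endpoint returns a JSON list of jobs
    print(job)
```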
- Most options should be available using the `--help` flag:

```bash
job-board --help
```

- Running the webserver in debug mode:

```bash
job-board runserver -d
```

- Fetching the jobs immediately:

```bash
job-board fetch
```

- Run it for only specific portals (include these portals):

```bash
job-board fetch -I weworkremotely -I python_dot_org
```

- Run it for all portals, but exclude some (maybe the portal is down, etc.):

```bash
job-board fetch -E wellfound -E work_at_a_startup
```

- Start the job scheduler (runs jobs according to their cron schedules; a generic sketch of a cron-scheduled job follows this list):

```bash
job-board scheduler start
```

- List all registered scheduled jobs:

```bash
job-board scheduler list-jobs
```

- Run a specific job manually:

```bash
job-board scheduler run-job fetch_jobs_daily
```

- Remove all scheduled jobs (useful before deployment):

```bash
job-board scheduler remove-jobs
```
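The scheduler internals aren't documented here, so as a generic, hypothetical illustration only (APScheduler is an assumption, not necessarily what job-board uses), a cron-scheduled job in Python can look like this:

```python
# Hypothetical illustration of a cron-scheduled job. APScheduler is an
# assumption here; it is not necessarily what job-board actually uses.
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

scheduler = BlockingScheduler()

# "0 6 * * *" = every day at 06:00; the job id mirrors the CLI example above.
@scheduler.scheduled_job(CronTrigger.from_crontab("0 6 * * *"), id="fetch_jobs_daily")
def fetch_jobs_daily():
    print("fetching jobs from all portals...")

scheduler.start()  # blocks and runs jobs on schedule
```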
```bash
ENV=test pytest
```
```bash
# Install JavaScript dependencies first
npm install

# Run JavaScript tests
npm run test:run

# Run with coverage
npm run test:coverage
```
- Please use a global `gitignore`, rather than adding a `.gitignore` to the repository. A writeup illustrating the reasoning behind this decision: https://sebastiandedeyne.com/setting-up-a-global-gitignore-file/
```bash
pip install -e ".[dev]"
pre-commit install
```
- Development: uses Tailwind CDN (set `ENV=dev`)
- Production: uses optimized local CSS
- Pre-commit hooks auto-build CSS when templates change
- Manual build:

```bash
bash scripts/build-tailwind.sh
```
The text below is mostly written as a note to future me, in the hope that it helps with debugging in case of an issue.
- Although they have a public RSS feed, for some reason they seem to be using some sort of Cloudflare protection that blocks HTTP requests from scripts.
- So scrapfly is used to bypass it (see the scrapfly sketch further below).
- Although they have an API for fetching jobs, the data is pretty unstructured.
- They have a public RSS feed, so the integration is mostly straightforward.
- They have a public API, so the integration is straightforward.
- They have special mechanisms set up to stop scripts from scraping their website.
- So scrapfly, along with its ASP (Anti Scraping Protection) feature, is used to bypass them (a sketch follows this list).
- Although this works, it makes the whole integration very slow, since it takes close to 50-200 seconds to scrape a single page.
- The total number of pages to scrape is around 20-40.
- So yeah, a better alternative that reliably works faster is welcome.
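For reference, a minimal sketch of what the scrapfly call with ASP enabled can look like, based on the scrapfly-sdk's documented usage; the API key and the portal URL are placeholders:

```python
# Sketch of fetching a protected page through scrapfly with ASP enabled.
# The key and URL are placeholders, not real values.
from scrapfly import ScrapeConfig, ScrapflyClient

client = ScrapflyClient(key="YOUR-SCRAPFLY-KEY")
result = client.scrape(
    ScrapeConfig(
        url="https://portal.example.com/jobs?page=1",  # placeholder portal URL
        asp=True,  # Anti Scraping Protection; this is what makes requests slow
    )
)
html = result.content  # scraped page HTML, ready for parsing
```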
- They don't show all jobs unless you're logged in.
- For now, the browser cookies (captured after logging in) are used to make requests and scrape (see the sketch below).
- These cookies seem to be long-lasting (they haven't needed to be changed even once since this was implemented).
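A minimal sketch of that cookie-based approach; the cookie name and portal URL are placeholders, and the value would be copied from the browser's dev tools after logging in:

```python
# Sketch: reuse logged-in browser cookies so the portal returns all jobs.
# Cookie name, value, and URL are placeholders.
import requests

session = requests.Session()
session.cookies.update({"sessionid": "<value copied from browser dev tools>"})

resp = session.get("https://portal.example.com/jobs", timeout=30)
resp.raise_for_status()
print(resp.text[:200])  # logged-in page HTML, ready for scraping
```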
- Add filtering by location; nowadays "remote" doesn't actually mean remote, since some job descriptions say remote India, remote USA, etc. (a hypothetical sketch of the idea follows).
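A hypothetical sketch of that filter: treat "remote &lt;region&gt;" as region-restricted, and only "remote" on its own (or with a global qualifier) as truly remote. The heuristic is naive and purely illustrative:

```python
# Hypothetical sketch of the location-filter idea: "remote India" or
# "remote USA" should not count as globally remote. Naive heuristic.
import re

GLOBAL_QUALIFIERS = {"worldwide", "anywhere", "global"}

def is_globally_remote(description: str) -> bool:
    """True only when 'remote' isn't qualified by a specific region."""
    match = re.search(r"\bremote\b[\s,:-]*([A-Za-z ]+)?", description, re.IGNORECASE)
    if not match:
        return False  # not a remote job at all
    qualifier = (match.group(1) or "").strip().lower()
    return qualifier == "" or qualifier in GLOBAL_QUALIFIERS

print(is_globally_remote("Remote, worldwide"))  # True
print(is_globally_remote("Remote India"))       # False
```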