This Node.js application uses the Google Custom Search API to retrieve search results based on specified criteria such as keywords, domains (including subdomains), locations, and publication date. The script filters out results containing undesirable phrases and continues paginating until it accumulates the number of unique new results defined by your configuration. These unique results are then appended to a CSV file, and their URLs are stored in a separate file so that duplicates are skipped in future searches.
Note: The API credentials and sensitive configuration details are stored in a
config.json
file, which is excluded from version control via.gitignore
to protect your credentials. Your search results, stored in thegoogle_search_results.csv
file, and your list of processed URLs, stored in theprocessed_urls.txt
file, are also excluded from version control via.gitignore
to protect the privacy of your searches.
- Advanced Query Building:
Combines two sets of search terms:- intitleKeywords: Keywords that must appear in the page title (using the
intitle:
operator). - keywords: General search terms that can appear anywhere in the page.
- intitleKeywords: Keywords that must appear in the page title (using the
- Domain Filtering:
Uses thesite:
operator to restrict results to specified domains (and all of their subdomains). - Location and Date Filtering:
Limits search results based on location criteria and (optionally) a minimum publication date. - Avoid Keywords:
Excludes results containing designated phrases (e.g., "government clearance"). - Dynamic Pagination:
Continues fetching additional pages beyond duplicates until the number of new unique results reaches the configuredmaxResults
value. - Duplicate Tracking:
Stores processed URLs in a file (processed_urls.txt
) to avoid reprocessing duplicates on subsequent searches. - CSV Output:
Appends new unique results (Title and URL) togoogle_search_results.csv
instead of overwriting them. - User Configurable:
All search settings are managed through a single configuration file.
- Node.js (version 10 or later recommended)
- A Google Custom Search API key and a Custom Search Engine (CSE) ID
Clone the repository to your local machine:
git clone https://github.com/thejessicafelts/job-seeker.git
cd job-seeker
Install the required Node.js dependencies in your project directory:
npm install node-fetch@2
Create a file named config.json
in the root of your repository. This file should contain your search criteria and API credentials. A sample configuration file is provided in the repository as sampleConfig.json
; you should rename it to config.json
and update the values as needed. A sample file for processed URLs is also provided in the repository as sampleProcessedUrls.txt
; you should rename it to processed_urls.txt
.
Below is an example configuration:
{
"intitleKeywords": ["frontend developer"],
"keywords": ["experienced", "senior"],
"avoidKeywords": ["government clearance"],
"minDate": ["2025-01-01"],
"domains": ["workday.com", "icims.com"],
"locations": ["Remote", "USA"],
"maxResults": 10,
"apiKey": "YOUR_API_KEY",
"cx": "YOUR_CSE_ID"
}
IMPORTANT:
- DO NOT commit
config.json
to your repository. The.gitignore
file in this repository already excludesconfig.json
to protect your sensitive information. - Replace
"YOUR_API_KEY"
and"YOUR_CSE_ID
with your actual credientials (see next section for instructions on how to set these up).
Getting Your API Key:
- Visit the Google Developers Console
- Create a new project or select an existing one.
- Navigate to APIs & Services > Library.
- Search for Custom Search API and enable it.
- Go to APIs & Services > Credentials and click Create Credentials > API Key.
- Copy your new API key and paste it into the
config.json
file under the"apiKey"
field.
Creating a Custom Search Engine (CSE) and Obtaining the CSE ID:
- Go to the Google Programmable Search Engine.
- Click on "Add" or "New Search Engine".
- In the "Sites to search" select "Search the entire web", since this script allows you to provide your own list of URLs.
- Provide a name for your search engine and click "Create".
- Once created, go to the Control Panel of your new search engine.
- Look for the "Search engine ID" (also known as the
cx
parameter). - Copy the CSE ID and paste it into the
config.json
file under the"cx"
field.
Once the configuration is complete, run the script with:
node googleSearch.js
The script will:
- Build the search query based on your critera.
- Use the Google Custom Search API to fetch results, skipping duplicates, until
"maxResults"
new unique results are accumulated. - Append new unique results to the
google_search_results.csv
file. - Store the URLs of processed pages in
processed_urls.txt
so that subsequent runs do not process duplicates.
Feel free to adjust the parameters in config.json
to meet your specific search criteria. The main script (googleSearch.js
) is modular and uses clearly defined functions, making it straightforward to modify the query logic or output processing as needed.