Python script to monitor websites
In its current form, Monitor counts the number of occurrences of a chosen set of keywords on a set of sites and compares these counts to those from the last time the program was run. If the number of occurrences of a given keyword has increased since last time, this is reported in the log. The program also checks whether new external links have appeared on the sites and reports any in the log.
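The keyword comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in monitor.py: the function names `count_keywords` and `increased_keywords` are hypothetical, and it assumes whole-word, case-insensitive matching on plain text, which may differ from how Monitor actually counts.

```python
import re
from collections import Counter


def count_keywords(text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword (assumed matching rule)."""
    words = Counter(re.findall(r"\w+", text.lower()))
    return {key.lower(): words[key.lower()] for key in keywords}


def increased_keywords(previous, current):
    """Return {keyword: (old_count, new_count)} for keywords whose count grew."""
    return {
        key: (previous.get(key, 0), count)
        for key, count in current.items()
        if count > previous.get(key, 0)
    }


keywords = ["application", "job", "news"]
old = count_keywords("Job news: one new job posted.", keywords)
new = count_keywords("Job news: two new job openings, send your application.", keywords)

# Only "application" went up (from 0 to 1), so only it would be reported.
print(increased_keywords(old, new))
```

Keywords whose count stayed the same or dropped are left out, matching the behaviour of only reporting increases.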
To monitor a bunch of sites, one needs to create a workspace: a folder henceforth denoted workspace. All cached data is stored in workspace. Within workspace, one needs to create a folder named "inputFolder" containing two plain text files:
- A file named "urlFile" containing a list of urls to monitor: A single url on each line.
  E.g. the content of urlFile might look something like
      https://github.com
      https://www.worldometers.info
  if these were the sites one wishes to monitor.
- A file named "keyFile" containing a list of keywords to look for on the sites: A single word on each line. The case of the words doesn't matter.
  E.g. the content of keyFile might look something like
      application
      job
      news
  if these were the keywords one would like to look for on the sites.
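The workspace layout described above can also be created from a short script. The following is a sketch only: `create_workspace` is a hypothetical helper that is not part of monitor.py, and the demo writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path


def create_workspace(workspace, urls, keywords):
    """Create <workspace>/inputFolder with urlFile and keyFile, one entry per line."""
    input_folder = Path(workspace) / "inputFolder"
    input_folder.mkdir(parents=True, exist_ok=True)
    (input_folder / "urlFile").write_text("\n".join(urls) + "\n", encoding="utf-8")
    (input_folder / "keyFile").write_text("\n".join(keywords) + "\n", encoding="utf-8")
    return input_folder


# Demo in a throwaway directory; replace with a real path for actual use.
with tempfile.TemporaryDirectory() as tmp:
    folder = create_workspace(
        Path(tmp) / "workspace",
        ["https://github.com", "https://www.worldometers.info"],
        ["application", "job", "news"],
    )
    print((folder / "urlFile").read_text(encoding="utf-8"))
    print((folder / "keyFile").read_text(encoding="utf-8"))
```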
Once a proper workspace has been set up, run
python monitor.py <path to workspace>
to cache the needed data. Whenever one would like to compare the current sites with the cached sites, the same command is run again. This also updates the cached data.
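The cache, compare, update cycle can be pictured with the sketch below. It is an illustration under assumptions, not Monitor's actual storage format: the helper `compare_and_update` and the JSON cache file name `keyword_counts.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def compare_and_update(cache_path, current_counts):
    """Compare counts with the cached ones, then overwrite the cache.

    Returns the keywords whose count increased. On the first run there is
    no cache yet, so nothing is reported and the counts are only stored.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        previous = json.loads(cache_path.read_text(encoding="utf-8"))
        increased = {
            key: count
            for key, count in current_counts.items()
            if count > previous.get(key, 0)
        }
    else:
        increased = {}
    cache_path.write_text(json.dumps(current_counts), encoding="utf-8")
    return increased


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "keyword_counts.json"  # hypothetical cache file name
    print(compare_and_update(cache, {"job": 1, "news": 4}))  # first run: only caches
    print(compare_and_update(cache, {"job": 3, "news": 4}))  # "job" increased
```

Because the cache is overwritten on every run, each comparison is always against the most recent run rather than the first one.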
Run
pip install -r requirements.txt
to install all dependencies.