Skip to content

diogo464/portugal-running-data

Repository files navigation

portugal-running-data

repo with scraper for the portugal running calendar data

Filename Source Script Optional Description
lastmod setup-directories no last modification time extracted from the sitemap file
page.html fetch-page no event page from portugalrunning.com
id extract-id no event numeric id from wordpress
data.json fetch-data no json file with some event data
ics fetch-ics no calendar file with location, date and other event information
location fetch-location yes location data for the event
image fetch-image yes cover image for the event
date extract-date no event date extracted from the ics file
oneline-description fetch-oneline-description yes ai generated one line description
categories extract-categories no event categories
circuits extract-circuits no event circuits

fetch-sitemap

this script fetches the sitemap that contains a list of event page urls and the last modification date

fetch-pages

this script will fetch any missing pages or outdated pages by looking at the lastmod file.

extract-ids

this script will extract the event ids from the page.html file. this id can be used to later fetch other data related to this event.

fetch-ics

this script uses the event id and fetches its ics file.

fetch-data

this script uses the event id to fetch some event data in json format.

fetch-images

some events have a main image in the json data file, this script will fetch that image.

extract-organizer

this script extracts the organizer from the class list in the json data file, if one exists.

extract-categories

this script extracts a list of categories from the class list in the json data file.

Releases

No releases published

Packages

No packages published

Languages