Finding aids "v2" #1967
Replies: 11 comments 6 replies
-
Update: All the
-
Admin data and postal data (CSV archives):
-
A working implementation of a finding aid "reader":
Under the hood, the code uses a URI template to construct an HTTP "reader" from the corresponding repository's absolute URL:
https://github.com/whosonfirst/go-reader-findingaid/blob/main/findingaid.go#L86-L150
Because URI templates are used to define new go-reader instances, the finding aid reader is not limited to HTTP retrievals but can use any reader packages that have been imported:
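As a rough illustration of the URI template idea (this is not the package's actual code), here is a minimal Python sketch. The template string, the `{repo}` placeholder name, and the helper names are assumptions made for the example. The ID-to-path rule follows the usual WOF convention of splitting an ID into groups of up to three digits.

```python
# Hypothetical template; the real one lives in go-reader-findingaid and may differ.
HTTP_TEMPLATE = "https://raw.githubusercontent.com/whosonfirst-data/{repo}/master/data/"


def id_to_rel_path(wof_id: int) -> str:
    """Turn a WOF ID into its relative path, e.g. 85922583 -> 859/225/83/85922583.geojson."""
    s = str(wof_id)
    parts = [s[i:i + 3] for i in range(0, len(s), 3)]
    return "/".join(parts + [f"{s}.geojson"])


def reader_uri(template: str, repo: str) -> str:
    """Expand the (assumed) {repo} placeholder to get a reader URI for one repository."""
    return template.replace("{repo}", repo)


def record_url(template: str, repo: str, wof_id: int) -> str:
    """Full location of a record once the finding aid has told us which repo holds it."""
    return reader_uri(template, repo) + id_to_rel_path(wof_id)
```

Because only the template changes, the same lookup could point at a different kind of reader, for example a hypothetical local checkout such as `fs:///usr/local/data/{repo}/data/`, without touching the rest of the code.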
-
Admin and postal code finding aid data is now available here:
-
How big would they be? Either way I think
-
This is all still in branches but:
-
Ballpark costs for DynamoDB setup assuming:
-
This is a thing that works:
And when I visit
-
I am uncertain why this is being terminated by Docker, locally, otherwise it works:
I will wire this into the WOF ECS tasks to run every (n) hours shortly.
-
This was performed by an ECS task. Still need to sort out GH credentials for automated tasks.
-
This was performed by an ECS task pulling its GH credentials from an AWS parameter store and saving work to GH as a dedicated
This only updates the CSV files in the
-
Paging @nvkelso @stepps00 @missinglink @tomtaylor @vicchi for comments. Not required, but welcome.
I am working through a fresh take on the WOF "finding aid" model. To recap, briefly: a WOF finding aid is meant to map a given ID to its corresponding whosonfirst-data repository.
The use case is something like the go-whosonfirst-browser, which doesn't have a database of IDs but instead uses one or more go-reader instances to retrieve records. That is: the go-whosonfirst-browser doesn't actually know anything about where the data is coming from. It lets the "reader" handle all those details.
(Remember: the go-whosonfirst-browser does not have the search functionality of something like the Spelunker; it is primarily a tool for rendering any given known ID in a number of formats.)
One goal with the finding aids has been to create a "finding aid reader" that, when given an ID, would look up its corresponding repository and fetch the data over the wire from GitHub. That way the go-whosonfirst-browser could run with a minimal footprint (read: no database with a bazillion WOF records).
Version "1" of the finding aid code stored finding aids as blobs of JSON in an S3 bucket using URI/naming conventions similar to those of WOF records.
Version "2" of the finding aid code aims to move away from this model and instead publish pre-compiled indices that can be stored in a whosonfirst-data repository. These files would then be downloaded and indexed according to application-specific rules.
The source code (WIP) is here, but keep in mind it is lacking proper documentation right now:
https://github.com/whosonfirst/go-whosonfirst-findingaid/tree/v2
For example, this is me creating a CSV finding aid for the sfomuseum-data-maps repo, fetching the data directly from GitHub:
Data is processed using the whosonfirst/go-whosonfirst-iterate package, which means that it has the ability to filter records to be included (or excluded) using property filters. By default finding aids are assumed to contain "all the pointers" but this allows purpose-fit finding aids to be created. For example, a finding aid for only records of a given placetype.
Iterators are separate from source "providers". The former iterate over records in a given source; the latter generates a list of sources to iterate over.
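To make the provider/iterator split and the property filter concrete, here is a small Python sketch. It is not the go-whosonfirst-iterate API; the function names are made up for the example, and the only things taken from WOF itself are the `wof:id`, `wof:repo` and `wof:placetype` property names.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Feature = dict  # a parsed GeoJSON Feature
Filter = Callable[[Feature], bool]


def by_placetype(placetype: str) -> Filter:
    """Build a property filter that keeps only records of one placetype."""
    def _include(feature: Feature) -> bool:
        return feature["properties"].get("wof:placetype") == placetype
    return _include


def iterate(source: Iterable[Feature], include: Optional[Filter] = None) -> Iterator[Feature]:
    """The "iterator" side: walk the records in a single source, applying the filter."""
    for feature in source:
        if include is None or include(feature):
            yield feature


def catalog_rows(sources: Iterable[Iterable[Feature]],
                 include: Optional[Filter] = None) -> Iterator[Tuple[int, str]]:
    """The "provider" side hands us a list of sources; emit (ID, repo name) pointers."""
    for source in sources:
        for feature in iterate(source, include):
            props = feature["properties"]
            yield props["wof:id"], props["wof:repo"]
```

With no filter the result contains "all the pointers"; passing `by_placetype("locality")` would yield a purpose-fit finding aid for localities only.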
The finding aid model has two "tables". One is to store the WOF ID lookup and looks like this:
And one to store the repo ID and its corresponding name:
The idea being that storing string repo names for every record is a waste of space and processing time. Although it will probably be the case that any given finding aid maps to a single WOF repo, it is possible for a finding aid to contain pointers to records from multiple repositories.
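A minimal Python sketch of the two-table lookup, assuming hypothetical column names (`id`, `repo_id`, `name`) and made-up sample rows; the real CSV layout may differ. It shows why the indirection is cheap: each record carries only a small integer, and the repo name is stored once.

```python
import csv
import io
from typing import Dict, TextIO

# Made-up sample data standing in for the two "tables".
CATALOG_CSV = "id,repo_id\n1001,1\n1002,1\n1003,2\n"
SOURCES_CSV = "id,name\n1,example-repo-a\n2,example-repo-b\n"


def load_catalog(fh: TextIO) -> Dict[int, int]:
    """WOF ID -> repo ID."""
    return {int(row["id"]): int(row["repo_id"]) for row in csv.DictReader(fh)}


def load_sources(fh: TextIO) -> Dict[int, str]:
    """Repo ID -> repo name."""
    return {int(row["id"]): row["name"] for row in csv.DictReader(fh)}


def find_repo(catalog: Dict[int, int], sources: Dict[int, str], wof_id: int) -> str:
    """Resolve a WOF ID to a repo name via the two tables; raises KeyError if unknown."""
    return sources[catalog[wof_id]]


catalog = load_catalog(io.StringIO(CATALOG_CSV))
sources = load_sources(io.StringIO(SOURCES_CSV))
```

Calling `find_repo(catalog, sources, 1003)` resolves the ID through both tables, and a single finding aid can point at more than one repository.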
As of this writing there are three different pre-compiled indices:
whosonfirst-data-admin- repos is, uncompressed, 77MB.
catalog.csv and sources.csv. These are faster to create than the SQLite databases and much smaller. I am still doing an initial run of CSV archives for the whosonfirst-data-admin repositories, but if the SQLite database for China is 10MB the CSV archive is only 1.8MB.
Right now, I am inclined to:
csv2sql tool for populating a local SQLite database from (n) CSV archives, but I haven't done timings yet.
It would be easy enough to create a whosonfirst-data/findingaid repo, but I am wondering whether it makes sense to store them in this repo (whosonfirst-data/whosonfirst-data)?
Thoughts?
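For the csv2sql idea, here is a hedged Python sketch of loading (n) CSV archives into one local SQLite database. It is not the actual tool. The schema and column names are assumptions, as is the guess that each archive numbers its repos independently, which is why repo IDs are remapped by name on import.

```python
import csv
import sqlite3
from typing import Optional, TextIO

# Assumed schema: one row per repo name, one row per WOF ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sources (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS catalog (id INTEGER PRIMARY KEY, repo_id INTEGER NOT NULL);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def import_archive(db: sqlite3.Connection, catalog_fh: TextIO, sources_fh: TextIO) -> None:
    """Merge one archive (catalog.csv + sources.csv) into the shared database."""
    remap = {}  # archive-local repo ID -> database-wide repo ID
    for row in csv.DictReader(sources_fh):
        db.execute("INSERT OR IGNORE INTO sources (name) VALUES (?)", (row["name"],))
        (db_id,) = db.execute("SELECT id FROM sources WHERE name = ?", (row["name"],)).fetchone()
        remap[row["id"]] = db_id
    db.executemany(
        "INSERT OR REPLACE INTO catalog (id, repo_id) VALUES (?, ?)",
        ((int(row["id"]), remap[row["repo_id"]]) for row in csv.DictReader(catalog_fh)),
    )
    db.commit()


def find_repo(db: sqlite3.Connection, wof_id: int) -> Optional[str]:
    """Return the repo name for a WOF ID, or None if the ID is not indexed."""
    row = db.execute(
        "SELECT s.name FROM catalog c JOIN sources s ON s.id = c.repo_id WHERE c.id = ?",
        (wof_id,),
    ).fetchone()
    return row[0] if row else None
```

Wrapping each archive in a single commit, as above, is usually the first thing to try when timing bulk inserts into SQLite.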