Description
A typical success factor in the world of web crawling is a good URN strategy.
Crawlers ingest a resource and identify it with some URN/URL.
In theory the URN can be http://{ogc-proxy}/?url={ogc-request-as-get}, but it would be beneficial to simplify that structure, because a lot can go wrong with such a URL (special characters, its dynamic nature, etc.).
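To illustrate why that structure is fragile, a quick sketch (the hostnames and the inner WFS request are made up) of what happens when a full OGC GET request is embedded as a single query parameter:

```python
from urllib.parse import quote, urlencode

# Hypothetical proxy host; a placeholder, not a real deployment.
PROXY = "http://ogc-proxy.example.org/"

# A typical WFS GetFeature request expressed as a GET URL.
wfs_request = (
    "https://wfs.example.org/service?"
    + urlencode({
        "service": "WFS",
        "request": "GetFeature",
        "typeNames": "topp:states",
        "count": "10",
    })
)

# Embedding it in the proxy URL requires percent-encoding everything,
# including the '?', '&' and '=' of the inner query string, so the
# resulting identifier is long, noisy, and easy to mangle.
proxied = PROXY + "?url=" + quote(wfs_request, safe="")
print(proxied)
```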
A structure like this would be much better:
http://{wfs-proxy}/wfs/{server-id}/ (to get a list of featuretypes/getcapabilities)
http://{wfs-proxy}/wfs/{server-id}/{featuretype}/{page} (to get a paginated list of features)
http://{wfs-proxy}/wfs/{server-id}/{featuretype}/feature/{recordid} (to get a feature)
Which means the proxy should have some persistence of server IDs, which may require the WFS proxy to advertise some methods to register a WFS server (and/or retrieve it from a coupled catalog).
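As a sketch of that registration side, assuming a simple JSON file as persistence and a hash-derived server-id (a real proxy might instead take its IDs from a coupled catalog):

```python
import hashlib
import json
import os

REGISTRY_FILE = "wfs_servers.json"  # hypothetical persistence location


def register_server(wfs_url, registry_file=REGISTRY_FILE):
    """Register a WFS endpoint and return a short, stable server-id.

    The id is derived from the URL, so re-registering the same server
    is idempotent and always yields the same id.
    """
    servers = {}
    if os.path.exists(registry_file):
        with open(registry_file) as f:
            servers = json.load(f)
    server_id = hashlib.sha1(wfs_url.encode()).hexdigest()[:8]
    servers[server_id] = wfs_url
    with open(registry_file, "w") as f:
        json.dump(servers, f)
    return server_id
```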
Optional parameters on the URL (query string) are:
- outputformat (json, gml, kml, shape, geopackage, rdf/xml, jsonld, turtle), or this can be managed with an Accept header
- projection (relevant for gml/shape/geopackage)
- filter by a geographic area or any of the attribute fields
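A sketch of how these optional query-string parameters could map onto standard WFS GetFeature parameters (`outputFormat`, `srsName`, `bbox`); the short-name-to-MIME-type table is an assumption, and the formats a given server actually accepts vary per implementation:

```python
# Hypothetical mapping from the proxy's short format names to the
# output formats a WFS server might advertise.
FORMATS = {
    "json": "application/json",
    "gml": "application/gml+xml; version=3.2",
    "kml": "application/vnd.google-earth.kml+xml",
}


def extra_wfs_params(outputformat=None, projection=None, bbox=None):
    """Map optional proxy query parameters onto WFS GetFeature parameters."""
    params = {}
    if outputformat:
        # Fall back to passing the value through unchanged if unknown.
        params["outputFormat"] = FORMATS.get(outputformat, outputformat)
    if projection:
        params["srsName"] = projection          # e.g. "EPSG:28992"
    if bbox:
        params["bbox"] = ",".join(map(str, bbox))  # geographic area filter
    return params
```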
URN vs URL: from a search engine point of view the URN should be the same as the URL. In the semantic web a URN does not need to resolve to a resource, but a search engine will never crawl a resource if the URN is not a resolvable URL.
On the other hand it is awkward that we facilitate WFS content to be crawled, but assign it a URL outside the domain where the resource is stored. Maybe this can be resolved over time by having those domains install their own WFS proxy.