URN/URL strategy #4

@pvgenuchten

Description

A typical success factor in the world of web crawling is a good URN strategy.
Crawlers ingest a resource and identify it by some URN/URL.
In theory the URN could be http://{ogc-proxy}/?url={ogc-request-as-get}, but it would be beneficial to simplify that structure, because a lot can go wrong with such a URL (special characters, its dynamic nature, etc.).

A structure like this would be much better:

http://{wfs-proxy}/wfs/{server-id}/ (to get a list of featuretypes/getcapabilities)
http://{wfs-proxy}/wfs/{server-id}/{featuretype}/{page} (to get a paginated list of features)
http://{wfs-proxy}/wfs/{server-id}/{featuretype}/feature/{recordid} (to get a feature)
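The three proxy paths above would each translate into a plain WFS GET request against the registered server. A minimal sketch of that translation, assuming an in-memory registry and WFS 2.0.0 key-value-pair parameters (the `SERVERS` mapping, `PAGE_SIZE`, and the function names are all illustrative):

```python
from urllib.parse import urlencode

# Hypothetical registry mapping short server IDs to WFS endpoints.
SERVERS = {"demo": "https://example.org/geoserver/wfs"}

PAGE_SIZE = 100  # features per proxy page (illustrative choice)

def capabilities_url(server_id):
    """Proxy path /wfs/{server-id}/ -> WFS GetCapabilities request."""
    return SERVERS[server_id] + "?" + urlencode({
        "service": "WFS", "version": "2.0.0",
        "request": "GetCapabilities"})

def page_url(server_id, featuretype, page):
    """Proxy path /wfs/{server-id}/{featuretype}/{page} -> paged GetFeature,
    using WFS 2.0 count/startIndex paging (page numbers start at 1)."""
    return SERVERS[server_id] + "?" + urlencode({
        "service": "WFS", "version": "2.0.0", "request": "GetFeature",
        "typeNames": featuretype,
        "count": PAGE_SIZE, "startIndex": (page - 1) * PAGE_SIZE})

def feature_url(server_id, featuretype, record_id):
    """Proxy path .../feature/{recordid} -> single-feature GetFeature.
    Many servers accept the WFS 1.x featureID parameter; WFS 2.0
    formally names it resourceID, so a real proxy may need both."""
    return SERVERS[server_id] + "?" + urlencode({
        "service": "WFS", "version": "2.0.0", "request": "GetFeature",
        "typeNames": featuretype, "featureID": record_id})
```

The point of the indirection is that the crawler only ever sees the short, stable proxy URLs; the dynamic OGC request with its query-string complexity stays behind the proxy.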

This means the proxy needs some persistence of server IDs, which may require the WFS proxy to advertise methods for registering a WFS server (and/or retrieving it from a coupled catalogue).
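Such a registration method could be as small as a mapping from a short, stable ID to the server's endpoint URL. A sketch under the assumption that the ID is derived from the endpoint itself, so re-registering the same server is idempotent (class and method names are hypothetical; a real proxy might instead reuse an identifier from the coupled catalogue):

```python
import hashlib

class ServerRegistry:
    """Minimal persistence sketch: short stable IDs -> WFS endpoints.
    A real proxy would back this with a database or a catalogue."""

    def __init__(self):
        self._servers = {}

    def register(self, endpoint_url):
        # Derive an 8-character ID from the endpoint URL so the same
        # server always gets the same {server-id} in proxy URLs.
        server_id = hashlib.sha1(endpoint_url.encode("utf-8")).hexdigest()[:8]
        self._servers[server_id] = endpoint_url
        return server_id

    def resolve(self, server_id):
        """Look up the WFS endpoint behind a proxy {server-id}."""
        return self._servers[server_id]
```

Deriving the ID from the endpoint keeps the proxy URLs stable across restarts, which matters for crawlers: if the {server-id} changes, every previously indexed URL breaks.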

Optional parameters on the url (query string) are:

  • outputformat (json, gml, kml, shape, geopackage, rdf/xml, jsonld, turtle); alternatively this can be managed with an Accept header
  • projection (relevant for gml/shape/geopackage)
  • a filter on a geographic area or any of the attribute fields
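These optional proxy parameters would need to be mapped onto the corresponding WFS GetFeature parameters. A sketch of that mapping, assuming WFS 2.0.0 names (`outputFormat`, `srsName`, `bbox`); the friendly-name-to-MIME-type table is illustrative, since valid outputFormat values vary per server:

```python
def wfs_query(outputformat=None, projection=None, bbox=None):
    """Translate the proxy's optional query-string parameters into
    WFS GetFeature key-value-pair parameters."""
    params = {"service": "WFS", "version": "2.0.0", "request": "GetFeature"}

    # Hypothetical mapping of friendly format names to outputFormat
    # values; unknown names are passed through unchanged.
    formats = {
        "json": "application/json",
        "gml": "application/gml+xml; version=3.2",
    }
    if outputformat:
        params["outputFormat"] = formats.get(outputformat, outputformat)
    if projection:
        params["srsName"] = projection  # e.g. "EPSG:4326"
    if bbox:
        # Geographic filter as minx,miny,maxx,maxy
        params["bbox"] = ",".join(str(c) for c in bbox)
    return params
```

Attribute filters are harder to express as simple query-string parameters (WFS uses Filter Encoding XML or CQL where supported), so a proxy might start with bbox only. The Accept-header alternative for outputformat would map the request's media type onto the same `outputFormat` value.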

URN vs URL: from a search-engine point of view the URN should be the same as the URL. In the semantic web a URN does not need to resolve to a resource, but a search engine will never crawl a resource whose URN is not a resolvable URL.

On the other hand, it is awkward that we facilitate WFS content being crawled but assign it a URL outside the domain where the resource is stored. Perhaps this can be resolved over time by those domains installing their own WFS proxy.
