OCD Division IDs in Wikidata #213

@epaulson


This is more of an FYI issue. I posted this to the Google Group, but James McKinney recommended also creating an issue and tagging @opencivicdata/division-id-curators. (@jpmckinney, I don't seem to be able to tag that team; maybe you can only do that if you're on the team?) At any rate, I am reproducing that post here.


As you might have seen in an earlier note to the list, there is now a property in Wikidata that connects entities to an Open Civic Data Division ID. I was the one who wrote the property proposal and fielded questions there, so I wanted to do some introductions here now that it’s been enabled.

Wikidata, if you don’t know it, is a sibling project to Wikipedia that aims to create a free and open structured knowledge base that’s both machine-readable and easy for humans to read. Part of it aims to put data that’s found in the Wikipedia infoboxes (facts like city populations, area, etc) into a structured format so that it can be reused across wikis, but it goes well beyond that. Wikidata includes data about many more “things” than Wikipedia does - there are about 90 million entities in Wikidata right now, and that number is growing.

Wikidata is a knowledge graph, built on a MediaWiki extension called ‘Wikibase’. For purposes of this discussion, you can treat Wikibase as a triplestore graph database that captures facts in the form:

subject predicate object

i.e. each fact makes a statement about how “subject” and “object” are related via “predicate”. For example:

“madison” isA “City”
“madison” isCapitalOf “Wisconsin”
“madison” hasPopulation 223209
“madison” hasOpenCivicDataDivisionID ocd-division/country:us/state:wi/place:madison

(In actuality Wikidata uses its own set of identifiers for everything in its facts. So, instead of saying “madison” isCapitalOf “Wisconsin”, Wikidata writes

Q43788 P1376 Q1537

and for OCD-ID, it says

Q43788 P8651 ‘ocd-division/country:us/state:wi/place:madison’ )
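To make the triple idea concrete, here is a minimal Python sketch that holds the two Wikidata-style facts above in a plain list and looks them up by pattern. It only illustrates the subject/predicate/object shape and is not how Wikibase stores data; `match`, `ocd_id_for`, and `item_for_ocd_id` are made-up helper names.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# The two facts quoted above, written with Wikidata identifiers.
TRIPLES: List[Triple] = [
    ("Q43788", "P1376", "Q1537"),  # Madison, capital of, Wisconsin
    ("Q43788", "P8651", "ocd-division/country:us/state:wi/place:madison"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given parts; None is a wildcard."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


def ocd_id_for(triples: List[Triple], item: str) -> Optional[str]:
    """Return the first OCD-ID (P8651 value) recorded for an item, if any."""
    for _, _, o in match(triples, subject=item, predicate="P8651"):
        return o
    return None


def item_for_ocd_id(triples: List[Triple], ocd_id: str) -> Optional[str]:
    """Return the first item whose P8651 value equals the given OCD-ID, if any."""
    for s, _, _ in match(triples, predicate="P8651", obj=ocd_id):
        return s
    return None
```

Looking up in both directions (item to OCD-ID, OCD-ID to item) is the same pattern match with a different part left as the wildcard.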

The interesting change is that property P8651 ( https://www.wikidata.org/wiki/Property:P8651 ) was recently created, which now lets folks add OCD-IDs to entities in Wikidata. (They’re added as literals.)

Wikidata can be queried with SPARQL, either through a nice UI or over an HTTP endpoint ( https://query.wikidata.org/ ). Here’s how to look up the Wikidata entity for Madison:

SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P8651 "ocd-division/country:us/state:wi/place:madison" .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

(See it in action here - click on the blue run triangle to execute it - https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP8651%20%22ocd-division%2Fcountry%3Aus%2Fstate%3Awi%2Fplace%3Amadison%22%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D )
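For anyone who wants to script the lookup rather than paste it into the UI, here is a minimal Python sketch that builds the same query for an arbitrary OCD-ID and turns it into both a query-UI link and a request URL. The `/sparql` path and the `format=json` parameter are assumptions about the public query service and should be checked against its documentation; the function names are made up for this example. Nothing is fetched, so the sketch runs offline.

```python
from urllib.parse import quote, urlencode

QUERY_UI = "https://query.wikidata.org/"
# Assumed endpoint path for programmatic access; verify before relying on it.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def lookup_query(ocd_id: str) -> str:
    """Return SPARQL text that finds the item carrying this OCD-ID."""
    # Refuse characters that would break out of the quoted string literal.
    if any(ch in ocd_id for ch in ('"', "\\", "\n", "\r")):
        raise ValueError("OCD-ID contains characters that are unsafe in a literal")
    return (
        "SELECT ?item ?itemLabel\n"
        "WHERE\n"
        "{\n"
        f'  ?item wdt:P8651 "{ocd_id}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}"
    )


def ui_link(query: str) -> str:
    """Link that opens the query in the web UI (query goes in the URL fragment)."""
    return QUERY_UI + "#" + quote(query, safe="")


def endpoint_url(query: str) -> str:
    """GET URL for the HTTP endpoint, asking for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

The URL from `endpoint_url` can be passed to any HTTP client. Automated clients are generally expected to identify themselves with a descriptive User-Agent header.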

I’ve labeled just a handful of US congressional districts with OCD-IDs. You can see them with this query, which lists every item that has an OCD-ID:

SELECT ?item ?itemLabel ?ocd 
WHERE 
{
  ?item wdt:P8651 ?ocd.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Focd%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP8651%20%3Focd.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
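If a query like this is run against the HTTP endpoint with JSON output, the result should arrive in the standard SPARQL JSON results layout (`head.vars` plus `results.bindings`). Here is a minimal Python sketch of flattening that into rows. The sample payload is hand-written to show the shape and is not real query output, and the helper names are hypothetical.

```python
import json
from typing import Dict, List

ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Hand-written sample in the SPARQL JSON results shape (not real output).
SAMPLE = """
{
  "head": {"vars": ["item", "itemLabel", "ocd"]},
  "results": {"bindings": [
    {
      "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q43788"},
      "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Madison"},
      "ocd": {"type": "literal", "value": "ocd-division/country:us/state:wi/place:madison"}
    }
  ]}
}
"""


def rows_from_sparql_json(payload: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so skip them.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def qid(entity_uri: str) -> str:
    """Strip the entity URI prefix, leaving an identifier such as 'Q43788'."""
    if not entity_uri.startswith(ENTITY_PREFIX):
        raise ValueError(f"not a Wikidata entity URI: {entity_uri}")
    return entity_uri[len(ENTITY_PREFIX):]
```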

So, a couple of things:

First, I think this is really exciting because Wikidata can be the database that OCD lacks. Virtually everything that gets an OCD Division Identifier is notable enough to include in Wikidata, so it can be a database to find out more data about any OCD-ID.

Second, what makes OCD-IDs cool is that they’re shared keys across multiple databases, so now you can join through Wikidata to many more databases and facts. The Google Civic Info API can tell you who the current representatives are for a district, but you can combine that with Wikidata and get the shapefile for the congressional district, follow the link to the subreddit for that district, find the population of the state, etc.

Third, Wikidata can be the source of additional entities that probably need OCD-IDs. It’s also possible that there are OCD-IDs for things that aren’t in Wikidata, so both projects can complete each other.
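The join and gap-finding ideas above can be sketched in a few lines of Python. Everything here is illustrative: the two dictionaries stand in for data pulled from Wikidata and from some civic data source, the official names are placeholders, and `join_on_ocd_id` and `missing_from` are made-up helpers.

```python
from typing import Dict, List

Record = Dict[str, object]

# Stand-in for rows pulled from Wikidata, keyed by OCD-ID.
WIKIDATA: Dict[str, Record] = {
    "ocd-division/country:us/state:wi/place:madison": {"wikidata_item": "Q43788"},
}

# Stand-in for rows from another source that also keys on OCD-ID.
CIVIC: Dict[str, Record] = {
    "ocd-division/country:us/state:wi/place:madison": {"officials": ["<placeholder>"]},
    "ocd-division/country:us/state:wi": {"officials": ["<placeholder>"]},
}


def join_on_ocd_id(left: Dict[str, Record], right: Dict[str, Record]) -> Dict[str, Record]:
    """Inner join: keep OCD-IDs present in both sources and merge their fields."""
    shared = sorted(left.keys() & right.keys())
    return {ocd_id: {**left[ocd_id], **right[ocd_id]} for ocd_id in shared}


def missing_from(source: Dict[str, Record], other: Dict[str, Record]) -> List[str]:
    """OCD-IDs present in `source` but absent from `other`."""
    return sorted(source.keys() - other.keys())
```

Here `missing_from(CIVIC, WIKIDATA)` would list divisions that still need a P8651 statement in Wikidata, and swapping the arguments would flag OCD-IDs recorded in Wikidata that the other source does not know about.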

And some caveats:

Wikidata is not going to mint new OCD-IDs. Nothing changes for OCD-ID: if someone finds something in Wikidata that should have an OCD-ID, they should still come to the GitHub repository and propose a change with the new identifier. The value of OCD-ID remains the same: everyone who uses OCD-IDs agrees that they’ll use the OCD-IDs as identifiers against their local database, and the value is that the set of identifiers is governed so there is a shared join key.

For now, this property is only for Division identifiers, again because they’re governed. There are formats specified for other identifiers like People and Jurisdictions and others, but as near as I can tell anyone using these types is just minting them themselves for local data use, and no one is committing to share them between data sources. (Wikidata can support that use case, but I think it would treat each identifier as a property tied to a specific data source, like an OpenStates ID vs an OpenElections ID, etc.)

There may be some weirdness in the data models between OCD and Wikidata. For example, in Wikidata they’ve decided to separate out some constituencies that might represent an entire district: for Australia they’ve got Queensland the senate constituency ( https://www.wikidata.org/wiki/Q56649111 ) and Queensland the state ( https://www.wikidata.org/wiki/Q36074 ), and they think at some point they may do something similar for the US.

This list is pretty low-traffic so I’m not sure how many people are going to see this. (I’m actually hoping that maybe some folks on the list are interested in working together to put OCD-IDs into Wikidata and maybe that would lead to some more traffic on the list). Looking forward to hearing from folks with questions or ideas!

Labels: cooperation (relating to cooperation with other projects, like Wikidata)
