- GitHub: https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin
- PyPI: https://pypi.python.org/pypi/dtool-lookup-server-annotation-filter-plugin
- Free software: MIT License
This dtool-lookup-server plugin adds the ability to get an overview of the dataset a user has got access to based on how those datasets have been annotated with key/value pairs.
The purpose of this API is to give users an overview of all the datasets available to them and to allow them to drill down on those results by filtering based upon keys and key/value pairs.
This API could be used to build a webapp that allows users to get an "eagle-eye" view of their data.
This plugin depends on having installed and configured a dtool-lookup-server. This plugin can then be installed by running the commands below.
git clone https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin.git cd dtool-lookup-server-annotation-filter-plugin python setup.py install
See dtool-lookup-server for more information about the setup of the base system.
This plugin has five routes.
- POST /annotation_filter_plugin/annotation_keys
- POST /annotation_filter_plugin/annotation_values
- POST /annotation_filter_plugin/num_datasets
- POST /annotation_filter_plugin/datasets
- GET /annotation_filter_plugin/version
The first gives access to all annotations keys that have are present on at least one dataset with a basic value. The keys will only be extracted from datasets that pass any annotation filter in the post request. The response from this route includes information about the number of datasets associated with each key.
The second gives access to all values for the keys specified in the post request. The values will only be extracted from the datasets that pass the annotation filter in the post request. The response form this route includes information about the number of datasets associated with each key/value pair.
The third gives the number of datasets given a particular annotation filter.
The fourth gives the list of datasets given a particular annotation filter.
The fifth returns the version of the plugin.
Below are examples of JSON queries that can be posted to the routes.
No filters, i.e. get all (this only really makes sense for the /annotation_filter_plugin/annotation_keys route).
{}
Get only datasets that have the key "color":
{ "annotation_keys": ["color"] }
Get only datasets that have the "color" is set to "red":
{ "annotations": {"color": "red"} }
Get only datasets that have both the keys "color" and "pattern":
{ "annotation_keys": ["color", "pattern"] }
Get only datasets that have the "color" is set to "red" and "pattern" set to "stripey":
{ "annotations": {"color": "red", "pattern": "stripey"} }
Get only datasets that have the keys "color" and "pattern" and where the "color" is set to "red":
{ "annotation_keys": ["color", "pattern"], "annotations": {"color": "red"} }
- This plugin only recognises annotations where the value is a basic type, such as a string, a number or a boolean value. In other words a dataset's annotations where the value is a data structures such as lists and dictionaries will be ignored.
- Datasets that do not have any annotation with a basic type as a value will not be recognised up by this plugin.
The dtool lookup server makes use of the Authorization header to pass through the
JSON web token for authorization. Below we create environment variables for the
token and the header used in the curl
commands:
$ TOKEN=$(flask user token olssont) $ HEADER="Authorization: Bearer $TOKEN"
The command below finds all annotations keys available for further filtering:
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below means that the annotation key "color" has 120 datasets associated with it and the annotation key "pattern" has 50 datasets associated with it.
{"color": 120, "pattern": 50, "size": 10}
Suppose that one chooses to filter further based on the "pattern" annotation key. Using the command below one could find the annotation keys that are still relevant given that each dataset has to have the annotation key "pattern".
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotation_keys": ["pattern"]}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that no datasets that remain have the key "size" and 45 of the datasets with the key "pattern" also have the key "color".
{"color": 45, "pattern": 50}
It is possible to filter based on an annotation key/value pair. For example, to limit the datasets to the case where the "pattern" is "stripey" one could use the command below.
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotations": {"pattern": "stripey"}}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that this is more specific and that there are fewer results.
{"color": 5, "pattern": 10}
It is possible to make more complex queries. The command below also requires that the datasets have the key "color".
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
In the response below there are now fewer datasets with the "pattern" key. That is because some of the datasets that were picked up previously did not have the "color" key.
{"color": 5, "pattern": 3}
It is also possible to filter using base URIs. The command below limits the keys to those from the base URIs "s3://snow-white" and "s3://mr-men":
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"base_uris": ["s3://snow-white", "s3://mr-men"]}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that there are fewer hits than when all base URIs are included.
{"color": 77, "pattern": 35, "size": 4}
The pattern for finding annotation key/value pairs and the number of datasets assocated with them is similar to that of finding the keys (above).
The command below can be used to find all the values associated with the "color" key and the number of datasets that has been annotated with each particular value.
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotation_keys": ["color"]}' \ http://localhost:5000/annotation_filter_plugin/annotation_values
The response below shows that there are five colors available and that most datasets have the color "red".
{ "color": { "red": 50, "pink": 30, "blue": 20, "green": 15, "yellow": 5 } }
To get data for more keys they need to be included in the filter. The command below returns the datasets that have annotations for both "color" and "pattern".
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotation_keys": ["color", "pattern"]}' \ http://localhost:5000/annotation_filter_plugin/annotation_values
The response contains less colors because some of the datasets annotated with color did not have a pattern annotation.
{ "color": { "red": 15, "pink": 10, "blue": 10, "green": 10 } "pattern": { "stripey": 40, "wavy": 10 }
It is possible to make more specific queries. The command below also requires that the datasets have the stripey pattern.
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that fewer datasets have been used to collect the annotation information.
{ "color": { "red": 15, "pink": 10, "blue": 10, "green": 5 } "pattern": { "stripey": 40, }
It is also possible to filter using base URIs. The command below limits the keys to those from the base URIs "s3://snow-white" and "s3://mr-men":
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotation_keys": ["color"], "base_uris": ["s3://snow-white", "s3://mr-men"]}' \ http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that there are fewer hits than when all base URIs are included.
{ "color": { "red": 50, "pink": 20, "blue": 7, } }
The number of datasets selected, using a particular filter, can be determined using the
/annotation_filter_plugin/num_datasets
route. The command below selects all datasets
with at least one basic value (see the section below on limitations for an explanation
of what a basic value is).
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{}' \ http://localhost:5000/annotation_filter_plugin/num_datasets
The response below shows that there are 145 such datasets.
145
The command below uses a filter to select only datasets that have the key/value pair "pattern"/"stripey".
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotations": {"pattern": "stripey"}}' \ http://localhost:5000/annotation_filter_plugin/num_datasets
The response shows that there are 10 such datasets.
10
It is possible to get information about the datasets selected by a particular
filter using the /annotation_filter_plugin/datasets
route. The command
below uses a filter to select only datasets that have the key/value pair
"pattern"/"stripey".
$ curl -H "$HEADER" -H "Content-Type: application/json" \ -X POST -d '{"annotations": {"pattern": "stripey"}}' \ http://localhost:5000/annotation_filter_plugin/datasets
Below is a truncated version of the response.
[ { "annotations": { "pattern": "stripey }, "base_uri": "s3://dtool-demo", "created_at": "1530803916.74", "creator_username": "olssont", "dtoolcore_version": "3.3.0", "frozen_at": "1536749825.85", "name": "hypocotyl3", "type": "dataset", "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db", "uuid": "ba92a5fa-d3b4-4f10-bcb9-947f62e652db" } ... ]