vjrj edited this page Oct 29, 2019 · 24 revisions

Introduction

Do you find that some institutions and/or collections in your collectory don't have the correct occurrence records associated after an ingestion? Here we explain how to map the institution and collection codes in our DwC-A occurrence files to institutions and collections in a collectory service, so that the correct number of records is associated with them in that service.

To do this we have to map the institutionCode and collectionCode values in our DwC-A dataset to the equivalent institutions and collections via their acronyms.

STEP 1: Looking for the provider codes (institution/collection codes)

So first of all we need to know these code values.

There are several ways to obtain them:

Option 1: using a spreadsheet (solution used on the GBIF France portal)

  • Download the DwC-A locally
  • Open the resource file using Excel
  • Add filters to the file
  • Go to the institutionCode and collectionCode columns
  • Click on the column name to see the distinct codes used by this dataset

IMPORTANT: this only works for resources with fewer occurrences than the maximum number of rows in Excel (around 1 million)
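Before opening a large file in Excel, it can help to check that it fits. The following is a minimal sketch, assuming the occurrence file has one record per line; the function name `fits_in_excel` is hypothetical, and 1,048,576 is the row limit of a current Excel worksheet.

```shell
# Excel worksheets hold at most 1,048,576 rows, header row included.
EXCEL_MAX_ROWS=1048576

# fits_in_excel FILE: succeed if FILE has no more lines than Excel can hold.
# Assumes one record per line (no line breaks inside quoted fields).
fits_in_excel() {
  lines=$(wc -l < "$1" | tr -d ' ')
  [ "$lines" -le "$EXCEL_MAX_ROWS" ]
}
```

For example, `fits_in_excel occurrence.txt && echo "OK for Excel" || echo "Too big, use the command line"`.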

Option 2: using the command line

# Imagine that the institutionCode is in the 3rd column and collectionCode in the 4th
# of a tab-separated occurrence.txt whose first line is the header

export INST_IDX=3
export COLL_IDX=4
echo "Institution codes:"
tail -n +2 occurrence.txt | awk -F'\t' -v a="$INST_IDX" '{print $a}' | sort -u
echo "Collection codes:"
tail -n +2 occurrence.txt | awk -F'\t' -v a="$COLL_IDX" '{print $a}' | sort -u

The -F option sets the field separator, which can be a tab (typed in bash as Ctrl-v followed by TAB, or written as '\t' in awk), a comma, or whatever field separator your file uses.
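Instead of counting columns by hand, the index can be looked up from the header row. This is only a sketch, assuming a tab-separated file whose first line holds the DwC term names; the function names `col_index` and `distinct_codes` are hypothetical.

```shell
# col_index FILE NAME: print the 1-based position of column NAME
# in the tab-separated header line of FILE (empty output if not found).
col_index() {
  head -n 1 "$1" | tr -d '\r' | tr '\t' '\n' \
    | grep -n -x -F -- "$2" | head -n 1 | cut -d: -f1
}

# distinct_codes FILE NAME: print the distinct values of column NAME.
distinct_codes() {
  dc_idx=$(col_index "$1" "$2")
  if [ -z "$dc_idx" ]; then
    echo "column $2 not found in $1" >&2
    return 1
  fi
  tail -n +2 "$1" | awk -F'\t' -v a="$dc_idx" '{print $a}' | sort -u
}
```

Usage would be `distinct_codes occurrence.txt institutionCode` and `distinct_codes occurrence.txt collectionCode`.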

Option 3: looking at the logs during indexing

If you have logger warnings enabled, you can see the codes during a first reindex, in lines like:

INFO : [DataLoader] - The current institution codes for the data resource

INFO : [DataLoader] - The current collection codes for the data resource

You can then use these codes in the collectory (see below) and reindex afterwards.
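If the indexing output is written to a file, those lines can be filtered out with grep. A minimal sketch follows; the function name `codes_from_log` is hypothetical, and the location of the log file depends on your installation.

```shell
# codes_from_log FILE: print the log lines that report the provider codes
# of a data resource, matching the two messages shown above.
codes_from_log() {
  grep -E 'The current (institution|collection) codes for the data resource' "$1"
}
```

For example, `codes_from_log /path/to/your/indexing.log`, where the path is a placeholder.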

STEP 2: Creating the mapping (if it does not exist yet)

Now that we know the institutionCode and collectionCode values, we have to map them in our collectory.

If the specific providerCodes do not exist, enter them using the “Manage provider code” page:

https://URL_to_your_LA_portal/providerCode/list

To avoid paging through the list, you can use a URL like:

https://URL_to_your_LA_portal/providerCode/list?offset=0&max=100

Create the provider map between the data resource, the institution and/or the collection on the following page:

https://URL_to_your_LA_portal/providerMap/list

Same here, you can get a bigger list with:

https://URL_to_your_LA_portal/providerMap/list?sort=id&max=100&offset=0&order=asc

Create the institution and/or collection (if they do not exist), and set up the record consumers

Fill in the metadata for the newly created dataResource.

If the dataResource is linked to a collection:

  • if it doesn’t exist: create the collection page.
  • if it does exist: add the collection to the record consumers of this dataResource.

Link the dataResource to an institution:

  • if it doesn’t exist: create the institution page by filling in the metadata.
  • if it exists: add the institution to the record consumers of this dataResource.

If the collection and/or institution did not exist and you had to create it, don’t forget to go back to the dataResource page and fill in the record consumers.

Example

Let's use this dataResource as an example: https://colecciones.gbif.es/public/show/dr684

Getting the codes:

$ export INST_IDX=5
$ export COLL_IDX=6
$ cat occurrence.txt | tail -n +2 | awk -F'\t' -v a="$INST_IDX" '{print $a}' | sort | uniq
IEOCA
$ cat occurrence.txt | tail -n +2 | awk -F'\t' -v a="$COLL_IDX" '{print $a}' | sort | uniq
IEO

So in https://colecciones.gbif.es/providerCode/list?offset=0&max=100 we look for IEOCA and IEO; if they don't exist, we create them.

Now the provider map: https://colecciones.gbif.es/providerMap/list?sort=id&max=100&offset=0&order=desc

[Screenshot: provider map]

Then we go to the collectory data resource, in our example https://colecciones.gbif.es/dataResource/show/dr684, and we set the record consumers:

[Screenshot: record consumers]

Verifications

You can check whether an institution and collection are correctly mapped:

https://collections.ala.org.au/ws/lookup/inst/<institutionCode>/coll/<collectionCode>

For instance: https://colecciones.gbif.es/lookup/inst/IEOCA/coll/IEO
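The same check can be scripted. The sketch below assumes the `/ws/lookup/inst/<institutionCode>/coll/<collectionCode>` pattern shown above and codes that need no URL-encoding; the function names `lookup_url` and `check_mapping` are hypothetical, and your portal may expose the lookup under a slightly different path.

```shell
# lookup_url BASE INST COLL: build the lookup URL for a collectory at BASE.
lookup_url() {
  printf '%s/ws/lookup/inst/%s/coll/%s\n' "${1%/}" "$2" "$3"
}

# check_mapping BASE INST COLL: fetch the lookup result (needs network access).
# curl -f makes the call fail on HTTP errors; -sS hides progress but keeps errors.
check_mapping() {
  curl -fsS "$(lookup_url "$1" "$2" "$3")"
}
```

For example, `check_mapping https://collections.ala.org.au <institutionCode> <collectionCode>` with your own codes substituted.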

And via the web service (ws), in our example:

https://colecciones.gbif.es/ws/dataResource/dr684

{
  "provider": {
    "name": "GBIF ES IPT",
    "uri": "https://colecciones.gbif.es/ws/dataProvider/dp2",
    "uid": "dp2"
  },
  "institution": {
    "name": "Instituto Español de Oceanografía. Centro Oceanográfico de Canarias",
    "uri": "https://colecciones.gbif.es/ws/institution/in105",
    "uid": "in105"
  },
  (...)
  "linkedRecordConsumers": [
    {
      "name": "Colección de fauna marina del Centro Oceanográfico de Canarias",
      "uri": "https://colecciones.gbif.es/ws/collection/co245",
      "uid": "co245"
    },
    {
      "name": "Instituto Español de Oceanografía. Centro Oceanográfico de Canarias",
      "uri": "https://colecciones.gbif.es/ws/institution/in105",
      "uid": "in105"
    }
  ],
  (...)
}
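To see only the record consumers, the response can be filtered with jq. This is a sketch that assumes jq is installed and that the response contains a `linkedRecordConsumers` array as in the excerpt above; the function name `record_consumers` is hypothetical.

```shell
# record_consumers: read a dataResource JSON document on stdin and print
# one "uid<TAB>name" line per linked record consumer.
# The trailing "?" keeps jq quiet if the array is missing.
record_consumers() {
  jq -r '.linkedRecordConsumers[]? | [.uid, .name] | @tsv'
}

# Usage (needs network access):
#   curl -fsS "https://colecciones.gbif.es/ws/dataResource/dr684" | record_consumers
```

An empty output would suggest that no record consumers are set for that dataResource yet.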

This simple unofficial utility also helps to check data mappings in the collectory.
