Data mappings
Do some institutions and/or collections in your collectory lack the correct occurrence records after an ingestion? This page explains how to map the institution and collection codes in our DwC-A occurrence files to institutions and collections in a collectory service, so that the correct record counts are associated with them in that service.
For this we have to map the institutionCode and collectionCode values in our DwC-A dataset to the equivalent institutions and collections via their acronyms.
So first of all we need to know these code values.
There are several ways to obtain them:
- Download the DwC-A to your computer
- Open the resource file using Excel
- Add filters to the file
- Go to the institutionCode and collectionCode columns
- Click on the column name to see the different codes used by this dataset
IMPORTANT: this only works for resources with fewer occurrences than the maximum number of rows in Excel (around 1 million).
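Before opening a large resource in Excel, it may help to check its size first. The following is a minimal sketch, assuming the occurrence file has one record per line; the function name `fits_in_excel` is made up for illustration, and 1,048,576 is the row limit of a current Excel worksheet.

```shell
# Maximum number of rows in an Excel worksheet (header row included).
EXCEL_MAX_ROWS=1048576

# fits_in_excel FILE: succeed when FILE has no more lines than Excel can show.
# fits_in_excel is a hypothetical helper, not part of any ALA tool.
fits_in_excel() {
  [ "$(( $(wc -l < "$1") ))" -le "$EXCEL_MAX_ROWS" ]
}

# Usage: occurrence.txt is assumed to be the occurrence file of the unzipped DwC-A.
if [ -f occurrence.txt ]; then
  if fits_in_excel occurrence.txt; then
    echo "occurrence.txt fits in Excel"
  else
    echo "occurrence.txt is too large for Excel; use the command line instead"
  fi
fi
```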
# Suppose institutionCode is in the 3rd column and collectionCode in the 4th
export INST_IDX=3
export COLL_IDX=4
echo "Institution codes:"
# Skip the header line, print the chosen column and list its distinct values
tail -n +2 occurrence.txt | awk -F'\t' -v a="$INST_IDX" '{print $a}' | sort | uniq
echo "Collection codes:"
tail -n +2 occurrence.txt | awk -F'\t' -v a="$COLL_IDX" '{print $a}' | sort | uniq
The -F option sets the field separator: it can be a tab (written as '\t' for awk, or typed literally in bash with Ctrl-v + TAB), a comma, or whatever field separator you are using.
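Instead of counting columns by hand, the column positions can be read from the header line. This is a minimal sketch, assuming the first line of the file is a tab-separated header containing the Darwin Core term names; `col_index` is a hypothetical helper name.

```shell
# col_index NAME FILE: print the 1-based position of column NAME in the
# tab-separated header (first line) of FILE; fail if the column is missing.
# col_index is a hypothetical helper written for this example.
col_index() {
  head -n 1 "$2" | tr -d '\r' | awk -F'\t' -v name="$1" '
    { for (i = 1; i <= NF; i++) if ($i == name) { print i; found = 1; exit } }
    END { exit !found }'
}

# Usage: occurrence.txt is assumed to be the occurrence file of the unzipped DwC-A.
if [ -f occurrence.txt ]; then
  INST_IDX=$(col_index institutionCode occurrence.txt)
  COLL_IDX=$(col_index collectionCode occurrence.txt)
  echo "institutionCode: column $INST_IDX, collectionCode: column $COLL_IDX"
fi
```

If the file has no header line, the column order has to be taken from the archive's meta.xml instead.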
If you have logging enabled, you can see the codes during a first reindex, in lines like:
INFO : [DataLoader] - The current institution codes for the data resource
INFO : [DataLoader] - The current collection codes for the data resource
You can then use these codes in the collectory (see below) and reindex afterwards.
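To pull those lines out of a long log, something like the following could be used. It is only a sketch: the log file location depends on your installation and must be supplied by you, and `codes_from_log` is a hypothetical helper name.

```shell
# codes_from_log LOGFILE: print the log lines that report the institution and
# collection codes of a data resource. LOGFILE is whatever file your indexing
# process writes to; its location is installation specific.
codes_from_log() {
  grep -E 'The current (institution|collection) codes for the data resource' "$1"
}

# Usage (the path is a placeholder, replace it with your own log file):
# codes_from_log /path/to/your/indexing.log
```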
Now that you know the institutionCode and collectionCode values, we have to map them in our collectory.
If the specific provider codes do not exist, enter them using the “Manage provider code” page:
https://URL_to_your_LA_portal/providerCode/list
To avoid paging through the results, you can use something like:
https://URL_to_your_LA_portal/providerCode/list?offset=0&max=100
Create the provider map between the data resource, the institution and/or the collection on the following page:
https://URL_to_your_LA_portal/providerMap/list
Same here, you can get a bigger list with:
https://URL_to_your_LA_portal/providerMap/list?sort=id&max=100&offset=0&order=asc
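The two list URLs follow the same pattern, so they can be generated for any portal. A small sketch, where `list_url` is a hypothetical helper and the base URL is the same placeholder used above; your portal may enforce its own upper limit on `max`.

```shell
# list_url BASE PAGE [MAX]: print the URL of a collectory list page
# (providerCode or providerMap) showing up to MAX rows, 100 by default.
# A trailing slash on BASE is tolerated.
list_url() {
  printf '%s/%s/list?offset=0&max=%s\n' "${1%/}" "$2" "${3:-100}"
}

list_url "https://URL_to_your_LA_portal" providerCode
list_url "https://URL_to_your_LA_portal" providerMap 50
```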
Fill in the metadata for the newly created dataResource.
If the dataResource is linked to a collection:
- if the collection doesn’t exist: create the collection page.
- if it does exist: add the collection to the record consumers of this dataResource.
Link the dataResource to an institution:
- if the institution doesn’t exist: create the institution page by filling in the metadata.
- if it exists: add the institution to the record consumers of this dataResource.
Don’t forget to go back to the dataResource page to fill in the record consumers if the collection and/or institution didn’t exist beforehand.
Let's use this dataResource as an example: https://colecciones.gbif.es/public/show/dr684
Getting the codes:
$ export INST_IDX=5
$ export COLL_IDX=6
$ tail -n +2 occurrence.txt | awk -F'\t' -v a="$INST_IDX" '{print $a}' | sort | uniq
IEOCA
$ tail -n +2 occurrence.txt | awk -F'\t' -v a="$COLL_IDX" '{print $a}' | sort | uniq
IEO
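Since a provider map ties an institution code to a collection code, it can be useful to list the codes as pairs, with the number of records using each pair. A minimal sketch along the lines of the commands above, assuming a tab-separated file with a header line; `code_pairs` is a hypothetical helper name.

```shell
# code_pairs FILE INST_IDX COLL_IDX: print each distinct
# institutionCode/collectionCode pair found in the tab-separated FILE,
# preceded by the number of records that use it. The header line is skipped.
code_pairs() {
  tail -n +2 "$1" |
    awk -F'\t' -v i="$2" -v c="$3" '{ print $i "\t" $c }' |
    sort | uniq -c
}

# Usage, with the column positions of the example above:
if [ -f occurrence.txt ]; then
  code_pairs occurrence.txt 5 6
fi
```

A dataset with several pairs needs one provider map entry per combination you want to resolve.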
So in:
https://colecciones.gbif.es/providerCode/list?offset=0&max=100
we look for IEOCA and IEO; if they don't exist, we create them.
Next we create the provider map: https://colecciones.gbif.es/providerMap/list?sort=id&max=100&offset=0&order=desc
Then we go to the collectory data resource, in our example: https://colecciones.gbif.es/dataResource/show/dr684
and set the record consumers.
You can check whether an institution and collection are correctly mapped:
https://collections.ala.org.au/ws/lookup/inst/<institutionCode>/coll/<collectionCode>
For instance: https://colecciones.gbif.es/lookup/inst/IEOCA/coll/IEO
And via the web service (ws), in our example:
https://colecciones.gbif.es/ws/dataResource/dr684
{
"provider": {
"name": "GBIF ES IPT",
"uri": "https://colecciones.gbif.es/ws/dataProvider/dp2",
"uid": "dp2"
},
"institution": {
"name": "Instituto Español de Oceanografía. Centro Oceanográfico de Canarias",
"uri": "https://colecciones.gbif.es/ws/institution/in105",
"uid": "in105"
},
(...)
"linkedRecordConsumers": [
{
"name": "Colección de fauna marina del Centro Oceanográfico de Canarias",
"uri": "https://colecciones.gbif.es/ws/collection/co245",
"uid": "co245"
},
{
"name": "Instituto Español de Oceanografía. Centro Oceanográfico de Canarias",
"uri": "https://colecciones.gbif.es/ws/institution/in105",
"uid": "in105"
}
],
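To check the record consumers without reading the whole response, the uids can be extracted from a saved copy of it. This is a rough sketch that assumes the JSON is pretty-printed as shown above, with one key per line and a non-empty array; `consumer_uids` is a hypothetical helper name.

```shell
# consumer_uids FILE: print the uid of every entity listed in the
# "linkedRecordConsumers" array of a saved /ws/dataResource/<uid> response.
# Relies on the line layout shown above rather than on a real JSON parser.
consumer_uids() {
  sed -n '/"linkedRecordConsumers"/,/\]/p' "$1" |
    grep -o '"uid": *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (requires network access, so it is left commented out):
# curl -s https://colecciones.gbif.es/ws/dataResource/dr684 > dr684.json
# consumer_uids dr684.json
```

For the response above this would list co245 and in105. If `jq` is installed, `jq -r '.linkedRecordConsumers[].uid' dr684.json` is a more robust way to get the same list.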
This simple unofficial utility also helps to check data mappings in the collectory.