Data mappings
Do you find that some institutions and/or collections in your collectory don't have the correct occurrence records associated with them after an ingestion? Here we try to explain how to map the institution and collection codes in our DwC-A occurrence files to the institutions and collections in a collectory service, so that the correct number of records is associated with them in that service.
To do this, we have to map the `institutionCode` and `collectionCode` values in our DwC-A dataset to the equivalent institutions and collections via their acronyms.
So first of all we need to know these code values.
There are several ways to obtain them:
- Download the DwC-A locally
- Open the resource file with Excel
- Add filters to the file
- Go to the `institutionCode` and `collectionCode` columns
- Click the filter on each column name to see the different codes used by this dataset
IMPORTANT: this only works for resources with fewer occurrences than the maximum number of rows in Excel (around 1 million).
Or, from the command line (`tail -n +2` skips the header row):
```shell
# Imagine that institutionCode is in the 3rd column and collectionCode in the 4th
export INST_IDX=3
export COLL_IDX=4
echo "Institution codes:"
tail -n +2 occurrence.txt | awk -F'\t' -v a="$INST_IDX" '{print $a}' | sort -u
echo "Collection codes:"
tail -n +2 occurrence.txt | awk -F'\t' -v a="$COLL_IDX" '{print $a}' | sort -u
```
The `-F` option sets the field separator, so it can be a TAB (`'\t'`, or a literal TAB typed in bash with Ctrl-v + TAB), a comma, or whatever field separator you are using.
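If you are not sure which columns hold the codes, a sketch like the one below locates them by their header names instead of by position. It assumes a tab-separated core file whose first row contains the term names (as `occurrence.txt` usually does); `dwca_distinct_values` is a hypothetical helper, not part of any ALA tool.

```shell
# Hypothetical helper: print the distinct values of one column of a
# tab-separated DwC-A core file, finding the column by its header name.
# Assumes the first row holds term names such as "institutionCode".
dwca_distinct_values() {
  local file="$1" term="$2"
  awk -F'\t' -v term="$term" '
    { sub(/\r$/, "") }                   # tolerate Windows (CRLF) line endings
    NR == 1 {                            # header row: remember the column index
      for (i = 1; i <= NF; i++) if ($i == term) col = i
      next
    }
    col && $col != "" { print $col }     # skip rows where the code is empty
  ' "$file" | sort -u
}
```

Usage: `dwca_distinct_values occurrence.txt institutionCode` and `dwca_distinct_values occurrence.txt collectionCode`. If the header name is not found, nothing is printed.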
If you have logger warnings enabled, you can see the codes during a first re-index, in lines like:
```
INFO : [DataLoader] - The current institution codes for the data resource
INFO : [DataLoader] - The current collection codes for the data resource
```
You can then use these codes in the collectory (see below) and re-index later.
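In a long log, a plain `grep` is enough to pick out those lines. The log file location depends on your installation, so this minimal sketch takes it as an argument; `show_loader_codes` is a hypothetical name.

```shell
# Hypothetical helper: print only the DataLoader lines that report the
# institution and collection codes. The log path is passed as $1.
show_loader_codes() {
  grep -E '\[DataLoader\] - The current (institution|collection) codes' "$1"
}
```

Usage: `show_loader_codes /path/to/your/indexing.log` (the path is a placeholder).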
Now that we know the `institutionCode` and `collectionCode` values, we have to map them in our collectory.
STEP 1: If the specific provider codes do not exist yet, create them on the “Manage provider codes” page:
https://URL_to_your_LA_portal/providerCode/list
To avoid paging through the list, you can type something like:
https://URL_to_your_LA_portal/providerCode/list?offset=0&max=100
STEP 2: Create the provider map between the data resource and the institution and/or the collection on the following page:
https://URL_to_your_LA_portal/providerMap/list
Here too, you can get a longer list with:
https://URL_to_your_LA_portal/providerMap/list?sort=id&max=100&offset=0&order=asc
STEP 3: Create the institution and/or collection (if they do not exist), and set up the record consumers of the dataResource
Fill in the metadata for the newly created dataResource.
If the dataResource is linked to a collection:
- if it doesn’t exist: create the collection page.
- if it does exist: add the collection to the record consumers of this dataResource.
Link the dataResource to an institution:
- if it doesn’t exist: create the institution page by filling in the metadata.
- if it exists: add the institution to the record consumers of this dataResource.
Don’t forget to go back to the dataResource page to fill in the record consumers if you had to create the collection and/or institution.
Let's use this dataResource as an example:
https://colecciones.gbif.es/public/show/dr684
Getting the codes:
```console
$ export INST_IDX=5
$ export COLL_IDX=6
$ tail -n +2 occurrence.txt | awk -F'\t' -v a="$INST_IDX" '{print $a}' | sort -u
IEOCA
$ tail -n +2 occurrence.txt | awk -F'\t' -v a="$COLL_IDX" '{print $a}' | sort -u
IEO
```
So, in:
https://colecciones.gbif.es/providerCode/list?offset=0&max=100
we look for `IEOCA` and `IEO`, and create them if they don't exist.
Then we create the provider map in:
https://colecciones.gbif.es/providerMap/list?sort=id&max=100&offset=0&order=desc
Finally, we go to the data resource in the collectory, in our example:
https://colecciones.gbif.es/dataResource/show/dr684
and set its record consumers.
You can check whether an institution and collection are correctly mapped (*) with:
https://collections.ala.org.au/ws/lookup/inst/<institutionCode>/coll/<collectionCode>
For instance:
https://colecciones.gbif.es/lookup/inst/IEOCA/coll/IEO
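To check every code pair of a dataset in one go, a sketch like the one below reads the distinct `institutionCode`/`collectionCode` pairs from a tab-separated occurrence file (with a header row) and queries the lookup URL for each pair. Both function names are hypothetical. The base URL argument must include any `/ws` prefix your collectory uses (compare the two URLs above), and the codes are assumed to need no URL encoding.

```shell
# Hypothetical helper: print each distinct institutionCode/collectionCode pair
# (tab-separated) of a tab-separated DwC-A core file with a header row.
# Rows where either code is empty are skipped.
dwca_code_pairs() {
  awk -F'\t' '
    { sub(/\r$/, "") }                   # tolerate CRLF line endings
    NR == 1 {                            # header row: locate both columns
      for (i = 1; i <= NF; i++) {
        if ($i == "institutionCode") ic = i
        if ($i == "collectionCode")  cc = i
      }
      next
    }
    ic && cc && $ic != "" && $cc != "" { print $ic "\t" $cc }
  ' "$1" | sort -u
}

# Hypothetical helper: request <base>/lookup/inst/<inst>/coll/<coll> for every
# pair and print the raw response (requires curl and network access).
# <base> must include any /ws prefix, e.g. https://collections.ala.org.au/ws
check_mappings() {
  local base="${1%/}" file="$2" inst coll
  dwca_code_pairs "$file" | while IFS=$'\t' read -r inst coll; do
    echo "== ${inst} / ${coll}"
    curl -s "${base}/lookup/inst/${inst}/coll/${coll}"
    echo
  done
}
```

Usage, following the example URL above: `check_mappings https://colecciones.gbif.es occurrence.txt`.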
And, via the web services, check the `linkedRecordConsumers` of the data resource, in our example:
https://colecciones.gbif.es/ws/dataResource/dr684
which returns (excerpt):
```json
{
  "provider": {
    "name": "GBIF ES IPT",
    "uri": "https://colecciones.gbif.es/ws/dataProvider/dp2",
    "uid": "dp2"
  },
  "institution": {
    "name": "Instituto Español de Oceanografía. Centro Oceanográfico de Canarias",
    "uri": "https://colecciones.gbif.es/ws/institution/in105",
    "uid": "in105"
  },
  "linkedRecordConsumers": [
    {
      "name": "Colección de fauna marina del Centro Oceanográfico de Canarias",
      "uri": "https://colecciones.gbif.es/ws/collection/co245",
      "uid": "co245"
    },
    {
      "name": "Instituto Español de Oceanografía. Centro Oceanográfico de Canarias",
      "uri": "https://colecciones.gbif.es/ws/institution/in105",
      "uid": "in105"
    }
  ],
```
and the `recordsProviderMapping` of the collection in:
https://colecciones.gbif.es/ws/collection/co245
Also, this simple unofficial utility helps to check data mappings in the collectory.
(*) Note: If you use Chrome, check this extension (or a similar one) to view JSON pages nicely. Firefox does this out of the box.