This Python client is designed to interact with the Infrastructure Manager (IM) repository via the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interface. It provides various commands to harvest metadata, including repository information, available identifiers, metadata formats, sets, and records.
- Retrieve repository identity information (identify).
- List available metadata formats (list_metadata_formats).
- List available identifiers (list_identifiers).
- List available sets (list_sets).
- Retrieve specific records (get_record).
- List all available records (list_records).
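Each command above corresponds to one of the six request verbs defined by OAI-PMH 2.0. The sketch below shows that mapping and how a request URL is assembled from a verb and its arguments. It is an illustration only, not the client's actual code: the function name build_request_url and the endpoint https://im.example.org/oai are hypothetical, while the verb and argument names (metadataPrefix, from, until, set, identifier) come from the protocol itself.

```python
from urllib.parse import urlencode

# CLI command name -> OAI-PMH verb (verb names are defined by OAI-PMH 2.0).
COMMAND_TO_VERB = {
    "identify": "Identify",
    "list_metadata_formats": "ListMetadataFormats",
    "list_identifiers": "ListIdentifiers",
    "list_sets": "ListSets",
    "get_record": "GetRecord",
    "list_records": "ListRecords",
}


def build_request_url(endpoint, command, **params):
    """Return the GET URL for one OAI-PMH request.

    `params` uses the protocol's own argument names, for example
    metadataPrefix, identifier, set or until. Arguments set to None are skipped.
    """
    query = {"verb": COMMAND_TO_VERB[command]}
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{endpoint}?{urlencode(query)}"


# Hypothetical endpoint, for illustration only.
# "from" is a Python keyword, so it is passed through a dict.
url = build_request_url(
    "https://im.example.org/oai",
    "list_records",
    metadataPrefix="oai_dc",
    **{"from": "2020-01-21"},
)
print(url)
```

In practice the oaipmh_scythe library takes care of building and sending these requests; the sketch is only meant to show what each command asks the repository for.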
- Python 3.x
- lxml library (for XML parsing)
- oaipmh_scythe library (for interacting with the OAI-PMH interface)
You can install the required dependencies using pip:
pip install -r requirements.txt
Clone this repository and navigate into the project directory:
git clone <repository_url>
cd <project_directory>
The script supports various commands through the command line interface (CLI). Below is a list of the available commands and their arguments:
- identify
  Retrieve repository identity information.
  python3 oaim_client.py --im-endpoint <im_endpoint> identify
- list_metadata_formats
  Retrieve all available metadata formats in the repository.
  python3 oaim_client.py --im-endpoint <im_endpoint> list_metadata_formats [identifier]
  - identifier: Optional. Specify the record identifier to filter the metadata formats.
- list_identifiers
  Retrieve all available record identifiers in the repository.
  python3 oaim_client.py --im-endpoint <im_endpoint> list_identifiers <metadata_prefix> [--from <from_date>] [--until <until_date>] [--set <set_name>]
  - metadata_prefix: Required. The metadata prefix (e.g., oai_dc).
  - from_date: Optional. Lower bound of datestamps (YYYY-MM-DD).
  - until_date: Optional. Upper bound of datestamps (YYYY-MM-DD).
  - set_name: Optional. Set of records to retrieve.
- list_sets
  Retrieve the set structure of the repository.
  python3 oaim_client.py --im-endpoint <im_endpoint> list_sets
- get_record
  Retrieve a specific record from the repository.
  python3 oaim_client.py --im-endpoint <im_endpoint> get_record <identifier> <metadata_prefix>
  - identifier: Required. The identifier of the record to retrieve.
  - metadata_prefix: Required. The metadata prefix (e.g., oai_dc).
- list_records
  Retrieve all records available in the repository.
  python3 oaim_client.py --im-endpoint <im_endpoint> list_records <metadata_prefix> [--from <from_date>] [--until <until_date>] [--set <set_name>]
  - metadata_prefix: Required. The metadata prefix (e.g., oai_dc).
  - from_date: Optional. Lower bound of datestamps (YYYY-MM-DD).
  - until_date: Optional. Upper bound of datestamps (YYYY-MM-DD).
  - set_name: Optional. Set of records to retrieve.
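The command layout described above (one global --im-endpoint option followed by a subcommand with its own arguments) can be expressed with argparse. The following is a minimal sketch under that assumption and is not taken from oaim_client.py: the helper names iso_date and build_parser, the dest names, and the example endpoint are all hypothetical.

```python
import argparse
from datetime import datetime


def iso_date(value):
    """Check that the value parses as a year-month-day date and return it as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}")
    return parsed.date().isoformat()


def build_parser():
    parser = argparse.ArgumentParser(prog="oaim_client.py")
    parser.add_argument("--im-endpoint", required=True, help="URL of the IM OAI-PMH endpoint")
    commands = parser.add_subparsers(dest="command", required=True)

    # Commands without arguments.
    commands.add_parser("identify")
    commands.add_parser("list_sets")

    # The identifier is optional for list_metadata_formats.
    formats = commands.add_parser("list_metadata_formats")
    formats.add_argument("identifier", nargs="?")

    record = commands.add_parser("get_record")
    record.add_argument("identifier")
    record.add_argument("metadata_prefix")

    # list_identifiers and list_records share the same selective-harvesting options.
    for name in ("list_identifiers", "list_records"):
        sub = commands.add_parser(name)
        sub.add_argument("metadata_prefix")
        sub.add_argument("--from", dest="from_date", type=iso_date)
        sub.add_argument("--until", dest="until_date", type=iso_date)
        sub.add_argument("--set", dest="set_name")
    return parser


# Hypothetical endpoint, for illustration only.
args = build_parser().parse_args(
    ["--im-endpoint", "https://im.example.org/oai", "list_records", "oai_dc", "--from", "2020-01-21"]
)
print(args.command, args.metadata_prefix, args.from_date)
```

Validating the dates at parse time means a malformed --from or --until value is rejected before any request is sent to the repository.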
To retrieve repository identity information:
python3 oaim_client.py --im-endpoint <im_endpoint> identify
To list available metadata formats for a specific identifier:
python3 oaim_client.py --im-endpoint <im_endpoint> list_metadata_formats <identifier>
To retrieve all records with metadata prefix oai_dc dated 2020-01-21 or later:
python3 oaim_client.py --im-endpoint <im_endpoint> list_records oai_dc --from 2020-01-21
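OAI-PMH responses are XML documents in the http://www.openarchives.org/OAI/2.0/ namespace. As a rough illustration of what handling such a response involves, the sketch below extracts record headers from a ListIdentifiers response. The sample response and the function name parse_headers are made up for this example, and it uses the standard library's xml.etree.ElementTree so that it runs without extra installs; lxml.etree offers a compatible find/findtext interface. How oaim_client.py itself formats its output is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up ListIdentifiers response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2020-01-22T10:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">https://im.example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:im.example.org:record-1</identifier>
      <datestamp>2020-01-21</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:im.example.org:record-2</identifier>
      <datestamp>2020-01-22</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_headers(xml_text):
    """Return one dict per <header> element found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        headers.append(
            {
                "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
                "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
                # Repositories mark removed records with status="deleted".
                "deleted": header.get("status") == "deleted",
            }
        )
    return headers


for entry in parse_headers(SAMPLE_RESPONSE):
    print(entry["identifier"], entry["datestamp"], "deleted" if entry["deleted"] else "")
```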