A FastAPI application for harvesting metadata from external sources and storing it in an AWS S3 bucket.
- Clone the repository: git clone https://github.com/odissei-data/odissei-harvester.git
The application harvests metadata from external sources and stores it in an AWS S3 bucket. Its API endpoints let you start harvesting processes and monitor their status.
The following environment variables need to be set for proper functioning:
- AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
- AWS_ACCESS_KEY_ID: Your AWS access key ID.
- S3_STORAGE_ENDPOINT: The endpoint URL for your S3 storage.
- LISS_ENDPOINT_URL: The URL of the LISS API for metadata harvesting.
- LISS_ENDPOINT_USERNAME: Your LISS API username.
- LISS_ENDPOINT_KEY: Your LISS API key.
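Before starting the application it can help to confirm that all six variables are present. The following is a minimal sketch, not part of the project itself; the helper name `missing_env_vars` is hypothetical, and only the variable names come from the list above.

```python
import os
from typing import List, Mapping, Optional

# Variable names exactly as listed in this README.
REQUIRED_ENV_VARS = (
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "S3_STORAGE_ENDPOINT",
    "LISS_ENDPOINT_URL",
    "LISS_ENDPOINT_USERNAME",
    "LISS_ENDPOINT_KEY",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper: checks os.environ unless another mapping is given.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Running the script exits with an error message naming any variable that is unset or empty.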
Method: GET
Description: Get the version of the FastAPI Harvester.
Input: None
Output: JSON containing the version of the FastAPI Harvester.
Method: GET
Description: Get the status of a specific harvesting process.
Input: harvest_id - The unique ID of the harvesting process.
Output: JSON containing the details of the harvesting process, including its ID, status, start time, end time, and failed files.
Method: POST
Description: Initiate a background LISS metadata harvesting process.
Input: JSON body containing LISSRequest data, specifying the bucket name for storage.
Output: JSON containing the details of the new harvesting process, including its ID, status, start time, end time, and failed files.
Method: POST
Description: Initiate a LISS metadata harvesting process.
Input: JSON body containing LISSRequest data, specifying the bucket name for storage.
Output: JSON containing the details of the new harvesting process, including its ID, status, start time, and more.
Method: POST
Description: Initiate a background metadata harvesting process.
Input: JSON body containing HarvestRequest data, specifying metadata prefix, OAI endpoint, bucket name, and more.
Output: JSON containing the details of the new harvesting process, including its ID, status, start time, and more.
Method: POST
Description: Initiate a metadata harvesting process.
Input: JSON body containing HarvestRequest data, specifying metadata prefix, OAI endpoint, bucket name, and more.
Output: JSON containing the details of the new harvesting process, including its ID, status, start time, and more.
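As an illustration of calling one of the POST endpoints, the sketch below builds a JSON request with the Python standard library. The route `/harvest-example` and the field names `metadata_prefix`, `oai_endpoint`, and `bucket_name` are placeholders chosen for this example, as are all the values; check the application's HarvestRequest model and route definitions for the real ones. The base URL assumes the local Docker setup on port 7890.

```python
import json
import urllib.request

# Assumes the application is running locally on port 7890 (Docker setup).
BASE_URL = "http://localhost:7890"


def build_harvest_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a harvest endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field names and values; the real HarvestRequest model may differ.
example_payload = {
    "metadata_prefix": "oai_dc",
    "oai_endpoint": "https://example.org/oai",
    "bucket_name": "my-bucket",
}

# "/harvest-example" is a placeholder path, not a documented route.
request = build_harvest_request("/harvest-example", example_payload)

# To actually send the request once the real path and fields are filled in:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the server.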
Background tasks perform metadata harvesting asynchronously. This keeps the API responsive while large amounts of data are processed, and the progress of a harvest can be checked through the status endpoint.
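The general pattern can be sketched with the standard library alone. This is not the project's implementation: it uses a plain worker thread as a stand-in for FastAPI's background tasks, an in-memory dictionary instead of persistent storage, and hypothetical names (`start_harvest`, `get_status`, the `store` callback, and the status values `started` and `finished`). Only the record fields (ID, status, start time, end time, failed files) are taken from the endpoint descriptions.

```python
import threading
import uuid
from datetime import datetime, timezone

# In-memory registry of harvest records (hypothetical; a real service
# would likely persist these).
HARVESTS = {}
_LOCK = threading.Lock()


def _now():
    return datetime.now(timezone.utc).isoformat()


def _run_harvest(harvest_id, items, store):
    """Process every item, recording failures instead of aborting the run."""
    failed = []
    for item in items:
        try:
            store(item)  # e.g. upload one metadata file to the bucket
        except Exception:
            failed.append(item)
    with _LOCK:
        record = HARVESTS[harvest_id]
        record["failed_files"] = failed
        record["end_time"] = _now()
        record["status"] = "finished"


def start_harvest(items, store):
    """Register a harvest, start it on a worker thread, return immediately."""
    harvest_id = str(uuid.uuid4())
    with _LOCK:
        HARVESTS[harvest_id] = {
            "harvest_id": harvest_id,
            "status": "started",
            "start_time": _now(),
            "end_time": None,
            "failed_files": [],
        }
    worker = threading.Thread(
        target=_run_harvest, args=(harvest_id, list(items), store)
    )
    worker.start()
    return harvest_id, worker


def get_status(harvest_id):
    """Return a copy of the current record for one harvest."""
    with _LOCK:
        return dict(HARVESTS[harvest_id])
```

The caller receives the harvest ID right away and can poll `get_status` until the status changes, which mirrors how the background endpoints and the status endpoint are described as working together.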
To run the FastAPI Harvester using Docker, follow these steps:
- Make sure you have Docker installed on your system.
- Create a .env file in the root directory of the project with the environment variables mentioned above.
- Build and start the containers using Docker Compose: docker-compose up --build (this builds the Docker image and starts the FastAPI application along with a PostgreSQL database container).
- Access the FastAPI application at http://localhost:7890.
This project is licensed under the Apache license.