Commit fa8a413

Merge pull request #84 from odissei-data/development
Removed twente, trimbos and hsn from the deployment list.

2 parents fcdb58f + e3b6047, commit fa8a413

19 files changed: +775 −731 lines

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 13 additions & 0 deletions

```diff
@@ -0,0 +1,13 @@
+Fixes Jira issue ODSP-
+
+# Description of changes
+
+# How to test
+
+# Related PRs
+
+(Add links)
+
+*
+
+# Notify
```

Dockerfile.server

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-FROM python:3.11-slim
+FROM python:3.13.7-slim
 
 ENV PYTHONPATH="${PYTHONPATH}:/app/scripts/" \
     PYTHONDONTWRITEBYTECODE=1
```

Dockerfile.worker

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-FROM python:3.11-slim
+FROM python:3.13.7-slim
 
 ENV PYTHONPATH="${PYTHONPATH}:/app/scripts/" \
     PYTHONDONTWRITEBYTECODE=1
```

Makefile

Lines changed: 3 additions & 1 deletion

```diff
@@ -5,7 +5,9 @@ PROJECT_NAME = prefect_docker
 PROJECT_SRV = ${PROJECT_NAME}
 TARGET_URL ?=
 TARGET_KEY ?=
+TARGET_BUCKET ?=
 DO_HARVEST ?= True
+FULL_HARVEST ?= False
 
 .PHONY = help
 .DEFAULT:
@@ -61,6 +63,6 @@ submodules: ## Sets up the submodules and checks out their main branch.
 	git submodule foreach git checkout main
 	git submodule foreach git pull origin main
 ingest: ## Runs the ingest workflow for a specified data provider. The url and key of the target can be optionally added. eg: make ingest data_provider=CBS TARGET_URL=https://portal.example.odissei.nl TARGET_KEY=abcde123-11aa-22bb-3c4d-098765432abc
-	@docker exec -it ${PROJECT_CONTAINER_NAME} python run_ingestion.py --data_provider=$(data_provider) --target_url=$(TARGET_URL) --target_key=$(TARGET_KEY) --do_harvest=$(DO_HARVEST)
+	@docker exec -it ${PROJECT_CONTAINER_NAME} python run_ingestion.py --data_provider=$(data_provider) --target_url=$(TARGET_URL) --target_key=$(TARGET_KEY) --target_bucket=$(TARGET_BUCKET) --do_harvest=$(DO_HARVEST) --full_harvest=$(FULL_HARVEST)
 deploy: ## Deploys all ingestion workflows to the prefect server.
 	@docker exec -it prefect python deployment/deploy_ingestion_pipelines.py
```
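The new `--target_bucket` and `--full_harvest` flags passed by the `ingest` target presumably need matching argument definitions in `run_ingestion.py`, which is not shown in this diff. A minimal sketch of how that parsing might look (the argument names mirror the Makefile; the boolean handling is an assumption):

```python
import argparse


def parse_ingestion_args(argv=None):
    # Hypothetical parser mirroring the flags the Makefile passes;
    # the real run_ingestion.py is not part of this diff.
    parser = argparse.ArgumentParser(description="Run an ingestion workflow")
    parser.add_argument("--data_provider", required=True)
    parser.add_argument("--target_url", default="")
    parser.add_argument("--target_key", default="")
    parser.add_argument("--target_bucket", default="")
    # Make passes the literal strings "True"/"False", so compare text
    # rather than using type=bool (bool() of a non-empty string is always True).
    parser.add_argument("--do_harvest", default="True")
    parser.add_argument("--full_harvest", default="False")
    args = parser.parse_args(argv)
    args.do_harvest = args.do_harvest.lower() == "true"
    args.full_harvest = args.full_harvest.lower() == "true"
    return args


args = parse_ingestion_args(["--data_provider=CBS", "--full_harvest=True"])
print(args.data_provider, args.do_harvest, args.full_harvest)  # → CBS True True
```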
Lines changed: 46 additions & 0 deletions

```diff
@@ -0,0 +1,46 @@
+import argparse
+import boto3
+
+parser = argparse.ArgumentParser(description="Download an S3 bucket")
+parser.add_argument('--url', type=str, dest='boto_url', help='URL of the server')
+parser.add_argument('--username', type=str, dest='access_key', help='AWS access key')
+parser.add_argument('--password', type=str, dest='secret_key', help='AWS secret key')
+parser.add_argument('--bucket', type=str, dest='bucket_name', help='Name of the bucket')
+parser.add_argument('--directory', type=str, dest='download_directory', default=".", help='Directory to download files to')
+args = parser.parse_args()
+
+s3_client = boto3.client(
+    's3',
+    endpoint_url=args.boto_url,
+    aws_access_key_id=args.access_key,
+    aws_secret_access_key=args.secret_key
+)
+
+download_directory = args.download_directory
+
+try:
+    paginator = s3_client.get_paginator("list_objects_v2")
+    pages = paginator.paginate(
+        Bucket=args.bucket_name,
+    )
+    object_count = 0
+    page_count = 0
+    for page in pages:
+        page_count += 1
+        if 'Contents' in page:
+            object_count += len(page['Contents'])
+            print(f"Number of objects in page '{page_count}' of bucket '{args.bucket_name}': {len(page['Contents'])}")
+            print(f"Start downloading {len(page['Contents'])} objects from bucket '{args.bucket_name}' in page {page_count}:")
+            for obj in page['Contents']:
+                s3_client.download_file(Bucket=args.bucket_name, Key=obj['Key'], Filename=download_directory + '/' + obj['Key'])
+                print(".", end="", flush=True)  # most primitive progress indicator
+            print()  # newline after the progress dots
+            print(f"Downloading page {page_count} done.")
+            print(f"Total number of objects downloaded so far from '{args.bucket_name}': {object_count}")
+        else:
+            print(f"No objects found in '{args.bucket_name}'")
+    if object_count > 0:
+        print(f"Downloading done. Total number of objects downloaded from '{args.bucket_name}': {object_count}")
+    # Note that downloading objects is done per page,
+    # but the list of objects could be extracted first and then downloaded in one loop.
+except Exception as e:
+    print(f"Error downloading objects from bucket '{args.bucket_name}': {e}")
```
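The closing comment in the script notes that the object keys could be collected first and then downloaded in a single loop. A small sketch of that variant, operating on already-fetched pages so no boto3 call is needed (the helper name is illustrative):

```python
def collect_keys(pages):
    """Flatten paginated list_objects_v2-style responses into one list of keys."""
    keys = []
    for page in pages:
        # A page with no 'Contents' entry contributes nothing.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Simulated pages shaped like list_objects_v2 responses.
pages = [
    {"Contents": [{"Key": "a.csv"}, {"Key": "b.csv"}]},
    {"Contents": [{"Key": "c.csv"}]},
    {},  # an empty page yields no keys
]
print(collect_keys(pages))  # → ['a.csv', 'b.csv', 'c.csv']
```

Each collected key could then be passed to `s3_client.download_file` in one loop, at the cost of holding the full key list in memory.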

docker-compose.yml

Lines changed: 1 addition & 4 deletions

```diff
@@ -37,10 +37,7 @@ services:
     container_name: db
     volumes:
       - prefectdb:/var/lib/postgresql/data
-    environment:
-      - POSTGRES_USER=postgres
-      - POSTGRES_PASSWORD=lolgres
-      - POSTGRES_DB=prefect
+    env_file: .env
     networks:
       - prefect
```
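With the hardcoded `environment` entries gone, the Postgres credentials presumably move into the `.env` file that the service now reads via `env_file`. A sketch of what that file might contain (variable names taken from the removed lines; the values shown are placeholders, not the project's real secrets):

```
# .env — consumed via env_file in docker-compose.yml; keep out of version control
POSTGRES_USER=postgres
POSTGRES_PASSWORD=change-me
POSTGRES_DB=prefect
```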

notifications.md

Lines changed: 25 additions & 0 deletions

````diff
@@ -0,0 +1,25 @@
+Notifications
+=============
+
+When workflows fail, notifications should be sent (to a Slack channel) so that the person(s)
+responsible for the service can check the logs and fix the problems.
+
+
+# Configure Slack
+
+A Slack workspace named `odissei-ingest` was created for messages about the Odissei Ingest (workflows).
+In this workspace there is a `prefect-notifications` channel that is used for sending
+the notifications from the Prefect workflow on failures.
+Using the Slack API page (https://api.slack.com/apps/) a 'Webhook App' application was created.
+This application has a 'Webhooks Features' option that allows you to create new webhooks. The URL of the newly created webhook is then used for the Prefect configuration described in the next section.
+
+The sample URL looks like this:
+```
+curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' https://hooks.slack.com/services/<unique string here>
+```
+You can test this on the command line; the message should appear in that Slack channel.
+
+# Configure Prefect
+
+Create a 'Slack Webhook Block' in the Prefect UI. The webhook URL from Slack is pasted into the URL field of that block. The Prefect UI provides sample code to place inside your flow code in order to send notifications. This is used in the code at the point where the 'bucket' with the failure information (failed dataset PIDs) is created. Note that instead of using the hardcoded name of the block, you need to assign that name to the `PREFECT_SLACK_WEBHOOK_BLOCK` setting.
+The simplest way to test whether it works is to force an error by using a wrong API key, then check the logs, the Slack channel, and the 'bucket'.
````
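As the curl sample in the notifications document shows, the webhook call is just an HTTP POST with a small JSON body. A minimal Python equivalent of building that payload (the helper name is illustrative; the actual send is left out so no real webhook URL is required):

```python
import json


def build_slack_payload(text):
    # Shape matches the curl sample: {"text": "..."}
    return json.dumps({"text": text})


payload = build_slack_payload("Hello, World!")
print(payload)  # → {"text": "Hello, World!"}
# To send it for real, POST this body with Content-type: application/json
# to the webhook URL, e.g. with urllib.request or the requests library.
```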
