Commit 97cb74a

fhennig and sbernauer authored
Visualize ocean floor geo data in Superset (with Trino, from S3) (#88)
* WIP
* some fixes
* Added geometry column
* WIP
* WIP
* fix link
* rename some k8s objects and change reference to reference github raw files
* Update stacks/trino-superset-s3/superset.yaml
  Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de>
* changed location and name of dataset
* Update description
* Don't document for now
* change branch refs
* Update demos/trino-subsea-data/setup-superset.yaml
  Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de>
---------
Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de>
1 parent 3f39aaa commit 97cb74a

6 files changed: +223 -0 lines changed

demos/demos-v2.yaml

Lines changed: 19 additions & 0 deletions
@@ -147,6 +147,25 @@ demos:
       cpu: 6800m
       memory: 15822Mi
       pvc: 28Gi
+  trino-subsea-data:
+    description: Demo loading ca. 600m^2 of ocean floor in a surface plot to visualize the irregularities of the ocean floor.
+    # documentation: -- Currently not documented
+    stackableStack: trino-superset-s3
+    labels:
+      - trino
+      - superset
+      - minio
+      - s3
+      - parquet
+    manifests:
+      - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/trino-subsea-data/load-test-data.yaml
+      - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/trino-subsea-data/create-table-in-trino.yaml
+      - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/trino-subsea-data/setup-superset.yaml
+    supportedNamespaces: []
+    resourceRequests:
+      cpu: 6800m
+      memory: 15822Mi
+      pvc: 28Gi
   data-lakehouse-iceberg-trino-spark:
     description: Data lakehouse using Iceberg lakehouse on S3, Trino as query engine, Spark for streaming ingest and Superset for data visualization. Multiple datasources like taxi data, water levels in Germany, earthquakes, e-charging stations and more are loaded.
     documentation: https://docs.stackable.tech/stackablectl/stable/demos/data-lakehouse-iceberg-trino-spark.html
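
The `resourceRequests` values in the new demo entry use Kubernetes quantity notation (`6800m`, `15822Mi`, `28Gi`). As a rough aid for reading them, the sketch below converts such strings into plain numbers. The helper name `parse_quantity` is hypothetical and not part of this commit, and it covers only a subset of the suffixes Kubernetes accepts.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes: binary (Ki, Mi, ...), decimal (k, M, ...)
# and "m" for thousandths. Exponent notation and larger suffixes are not handled.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as "6800m" or "28Gi" into a Decimal.

    CPU values come back in cores, memory and storage values in bytes.
    """
    text = text.strip()
    # Try two-letter suffixes first so that "Mi" is never mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


# Values copied from the trino-subsea-data entry above.
resource_requests = {"cpu": "6800m", "memory": "15822Mi", "pvc": "28Gi"}
parsed = {name: parse_quantity(value) for name, value in resource_requests.items()}
for name, value in parsed.items():
    print(f"{name}: {value}")
```

Read this way, the demo requests 6.8 CPU cores, a little under 15.5 GiB of memory, and 28 GiB of persistent storage, the same figures as the entry directly above it in the file.
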
demos/trino-subsea-data/create-table-in-trino.yaml

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: create-subsea-multibeam-table-in-trino
+spec:
+  template:
+    spec:
+      containers:
+        - name: create-subsea-multibeam-table-in-trino
+          image: docker.stackable.tech/stackable/testing-tools:0.2.0-stackable24.7.0
+          command: ["bash", "-c", "python -u /tmp/script/script.py"]
+          volumeMounts:
+            - name: script
+              mountPath: /tmp/script
+            - name: trino-users
+              mountPath: /trino-users
+      volumes:
+        - name: script
+          configMap:
+            name: create-subsea-multibeam-table-in-trino-script
+        - name: trino-users
+          secret:
+            secretName: trino-users
+      restartPolicy: OnFailure
+  backoffLimit: 50
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: create-subsea-multibeam-table-in-trino-script
+data:
+  script.py: |
+    import sys
+    import trino
+
+    if not sys.warnoptions:
+        import warnings
+        warnings.simplefilter("ignore")
+
+    def get_connection():
+        connection = trino.dbapi.connect(
+            host="trino-coordinator",
+            port=8443,
+            user="admin",
+            http_scheme='https',
+            auth=trino.auth.BasicAuthentication("admin", open("/trino-users/admin").read()),
+        )
+        connection._http_session.verify = False
+        return connection
+
+    def run_query(connection, query):
+        print(f"[DEBUG] Executing query {query}")
+        cursor = connection.cursor()
+        cursor.execute(query)
+        return cursor.fetchall()
+
+    connection = get_connection()
+
+    run_query(connection, "CREATE SCHEMA IF NOT EXISTS hive.demo WITH (location = 's3a://demo/')")
+    run_query(connection, """
+        CREATE TABLE IF NOT EXISTS hive.demo.subsea (
+            footprint_x DOUBLE,
+            footprint_y DOUBLE,
+            water_depth DOUBLE,
+            data_point_density DOUBLE,
+            geometry VARBINARY
+        ) WITH (
+            external_location = 's3a://demo/subsea/',
+            format = 'parquet'
+        )
+    """)
+
+    loaded_rows = run_query(connection, "SELECT COUNT(*) FROM hive.demo.subsea")[0][0]
+    print(f"Loaded {loaded_rows} rows")
+    assert loaded_rows > 0
+
+    print("Analyzing table subsea")
+    analyze_rows = run_query(connection, """ANALYZE hive.demo.subsea""")[0][0]
+    assert analyze_rows == loaded_rows
+    stats = run_query(connection, """show stats for hive.demo.subsea""")
+    print("Produced the following stats:")
+    print(*stats, sep="\n")
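
The script above reads the Trino admin password with `open("/trino-users/admin").read()`, which returns the file contents verbatim and leaves closing the file to the garbage collector. If the mounted secret file ended in a newline, that newline would become part of the password. A possible alternative is sketched below; the helper `read_secret` is hypothetical and not part of this commit, and it assumes that real passwords never begin or end with whitespace.

```python
import tempfile
from pathlib import Path


def read_secret(path: str) -> str:
    """Return the contents of a mounted secret file without surrounding whitespace.

    Path.read_text() opens and closes the file itself, and strip() removes a
    trailing newline if the file has one.
    """
    return Path(path).read_text(encoding="utf-8").strip()


# Demonstration with a temporary file standing in for /trino-users/admin.
with tempfile.TemporaryDirectory() as directory:
    secret_file = Path(directory) / "admin"
    secret_file.write_text("example-password\n", encoding="utf-8")
    print(repr(read_secret(str(secret_file))))
```

In the job, the call would look like `read_secret("/trino-users/admin")` in place of the inline `open(...).read()`.
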
demos/trino-subsea-data/load-test-data.yaml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: load-subsea-multibeam-data
+spec:
+  template:
+    spec:
+      containers:
+        - name: load-subsea-multibeam-data
+          image: "bitnami/minio:2024-debian-12"
+          command: ["bash", "-c", "cd /tmp && curl -O https://repo.stackable.tech/repository/misc/marispace/multibeam_data_point_density_example.parquet && mc --insecure alias set minio http://minio:9000/ $(cat /minio-s3-credentials/accessKey) $(cat /minio-s3-credentials/secretKey) && mc cp multibeam_data_point_density_example.parquet minio/demo/subsea"]
+          volumeMounts:
+            - name: minio-s3-credentials
+              mountPath: /minio-s3-credentials
+      volumes:
+        - name: minio-s3-credentials
+          secret:
+            secretName: minio-s3-credentials
+      restartPolicy: OnFailure
+  backoffLimit: 50
demos/trino-subsea-data/setup-superset.yaml

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: setup-superset
+spec:
+  template:
+    spec:
+      containers:
+        - name: setup-superset
+          image: docker.stackable.tech/stackable/testing-tools:0.2.0-stackable24.7.0
+          command: ["bash", "-c", "curl -o superset-assets.zip https://raw.githubusercontent.com/stackabletech/demos/main/demos/trino-subsea-data/superset-assets.zip && python -u /tmp/script/script.py"]
+          volumeMounts:
+            - name: script
+              mountPath: /tmp/script
+            - name: trino-users
+              mountPath: /trino-users
+            - name: superset-credentials
+              mountPath: /superset-credentials
+      volumes:
+        - name: script
+          configMap:
+            name: setup-superset-script
+        - name: superset-credentials
+          secret:
+            secretName: superset-credentials
+        - name: trino-users
+          secret:
+            secretName: trino-users
+      restartPolicy: OnFailure
+  backoffLimit: 50
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: setup-superset-script
+data:
+  script.py: |
+    import logging
+    import requests
+
+    base_url = "http://superset-external:8088" # For local testing / developing replace it, afterwards change back to http://superset-external:8088
+    superset_username = open("/superset-credentials/adminUser.username").read()
+    superset_password = open("/superset-credentials/adminUser.password").read()
+    trino_username = "admin"
+    trino_password = open("/trino-users/admin").read()
+
+    logging.basicConfig(level=logging.INFO)
+    logging.info("Starting setup of Superset")
+
+    logging.info("Getting access token from /api/v1/security/login")
+    session = requests.session()
+    access_token = session.post(f"{base_url}/api/v1/security/login", json={"username": superset_username, "password": superset_password, "provider": "db", "refresh": True}).json()['access_token']
+    # print(f"access_token: {access_token}")
+
+    logging.info("Getting csrf token from /api/v1/security/csrf_token")
+    csrf_token = session.get(f"{base_url}/api/v1/security/csrf_token", headers={"Authorization": f"Bearer {access_token}"}).json()["result"]
+    # print(f"csrf_token: {csrf_token}")
+
+    headers = {
+        "accept": "application/json",
+        "Authorization": f"Bearer {access_token}",
+        "X-CSRFToken": csrf_token,
+    }
+
+    # To retrieve all of the assets (datasources, datasets, charts and dashboards) run the following commands
+    # logging.info("Exporting all assets")
+    # result = session.get(f"{base_url}/api/v1/assets/export", headers=headers)
+    # assert result.status_code == 200
+    # with open("superset-assets.zip", "wb") as f:
+    #     f.write(result.content)
+
+
+    #########################
+    # IMPORTANT
+    #########################
+    # The exported zip file had to be modified, otherwise we get:
+    # <Response [422]>
+    # {"errors": [{"message": "Error importing assets", "error_type": "GENERIC_COMMAND_ERROR", "level": "warning", "extra": {"databases/Trino.yaml": {"extra": {"disable_data_preview": ["Unknown field."]}}, "issue_codes": [{"code": 1010, "message": "Issue 1010 - Superset encountered an error while running a command."}]}}]}
+    #
+    # The file databases/Trino.yaml was modified and the attribute "extra.disable_data_preview" was removed
+    #########################
+    logging.info("Importing all assets")
+    files = {
+        "bundle": ("superset-assets.zip", open("superset-assets.zip", "rb")),
+    }
+    data = {
+        "passwords": '{"databases/Trino.yaml": "' + trino_password + '"}'
+    }
+    result = session.post(f"{base_url}/api/v1/assets/import", headers=headers, files=files, data=data)
+    print(result)
+    print(result.text)
+    assert result.status_code == 200
+
+    logging.info("Finished setup of Superset")
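
The import step above builds the `passwords` form field by concatenating the Trino password into a JSON string. That produces valid JSON only as long as the password contains no double quotes, backslashes, or control characters. A minimal sketch of an alternative using `json.dumps` follows; the function name `build_passwords_field` is hypothetical and not part of this commit.

```python
import json


def build_passwords_field(trino_password: str) -> str:
    """Serialize the password mapping expected in the "passwords" form field.

    json.dumps escapes quotes and backslashes that plain string
    concatenation would leave unescaped.
    """
    return json.dumps({"databases/Trino.yaml": trino_password})


# A password with a quote and a backslash still yields parseable JSON.
data = {"passwords": build_passwords_field('pa"ss\\word')}
print(data["passwords"])
print(json.loads(data["passwords"]))
```

For a password without special characters, the result is the same string the script produces today, so the request sent to `/api/v1/assets/import` would be unchanged in that case.
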
demos/trino-subsea-data/superset-assets.zip

6.39 KB
Binary file not shown.

stacks/trino-superset-s3/superset.yaml

Lines changed: 5 additions & 0 deletions
@@ -14,6 +14,11 @@ spec:
     roleGroups:
       default:
         replicas: 1
+        configOverrides:
+          superset_config.py:
+            # Needed by trino-subsea-data demo
+            ROW_LIMIT: "200000"
+            SQL_MAX_ROW: "200000"
 ---
 apiVersion: v1
 kind: Secret
