Commit c75b1ea

Merge branch 'issue_86' into issue_86_snapshot_silver
2 parents 06f5455 + db158c2 commit c75b1ea

37 files changed: +1935 −371 lines

.github/workflows/gh_pages.yml

Lines changed: 7 additions & 7 deletions

@@ -9,21 +9,21 @@ defaults:
     working-directory: ./docs
 jobs:
   deploy:
-    runs-on: ubuntu-20.04
-    concurrency:
-      group: ${{ github.workflow }}-${{ github.ref }}
+    runs-on:
+      group: databrickslabs-protected-runner-group
+      labels: linux-ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
         with:
           submodules: true  # Fetch Hugo themes (true OR recursive)
           fetch-depth: 0    # Fetch all history for .GitInfo and .Lastmod
 
       - name: Setup Hugo
-        uses: peaceiris/actions-hugo@v2
+        uses: peaceiris/actions-hugo@v3
         with:
-          hugo-version: '0.87.0'
+          hugo-version: '0.143.0'
           extended: true
 
       - name: Build
@@ -35,4 +35,4 @@ jobs:
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
           publish_dir: ./docs/public
-          publish_branch: public_docs_v1
+          publish_branch: public_docs_v1

.github/workflows/onpush.yml

Lines changed: 23 additions & 3 deletions

@@ -33,10 +33,30 @@ jobs:
 
       - name: Install coverage
         run: pip install coverage
-
+
+      - name: Install psutil
+        run: pip install psutil
+
       - name: Lint
-        run: flake8
+        run: flake8
+
+      - name: set spark local
+        run: export SPARK_LOCAL_IP=127.0.0.1
+
+      - name: set spark executor memory
+        run: export SPARK_EXECUTOR_MEMORY=8g
+
+      - name: set spark driver memory
+        run: export SPARK_DRIVER_MEMORY=8g
+
+      - name: set javaopts
+        run: export JAVA_OPTS="-Xmx10g -XX:+UseG1GC"
 
+      - name: Print System Information
+        run: |
+          python -c "import psutil; import os;
+          print(f'Physical Memory: {psutil.virtual_memory().total / 1e9:.2f} GB'); print(f'CPU Cores: {os.cpu_count()}')"
+
       - name: Run Unit Tests
         run: python -m coverage run
 
@@ -51,4 +71,4 @@ jobs:
         name: codecov-umbrella
         path_to_write_report: ./coverage/codecov_report.txt
         verbose: true
-
+
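One caveat worth noting when reading the workflow diff above: each `run:` step executes in its own shell, so a bare `export` in one step is generally not visible to later steps (GitHub Actions provides a job-level `env:` block or the `$GITHUB_ENV` file for that). A minimal Python sketch of the same scoping behavior, with a subprocess standing in for a workflow step:

```python
import os
import subprocess

# A child shell exports a variable, then exits -- analogous to one `run:` step.
subprocess.run(["sh", "-c", "export SPARK_LOCAL_IP=127.0.0.1"], check=True)

# The parent process (the "next step") never sees the exported variable.
print("SPARK_LOCAL_IP" in os.environ)  # -> False
```

If the Spark settings are meant to reach the test step, a job-level `env:` mapping would be the conventional fix.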

.gitignore

Lines changed: 1 addition & 2 deletions

@@ -155,8 +155,7 @@ deployment-merged.yaml
 demo/conf/onboarding.json
 integration_tests/conf/onboarding*.json
 demo/conf/onboarding*.json
-databricks.yaml
 integration_test_output*.csv
 databricks.yml
+oboarding_job_details.json
 
-.databricks

.gitmodules

Lines changed: 3 additions & 0 deletions

@@ -4,3 +4,6 @@
 [submodule "docs/themes/hugo-theme-learn"]
 	path = docs/themes/hugo-theme-learn
 	url = https://github.com/matcornic/hugo-theme-learn.git
+[submodule "docs/themes/hugo-theme-relearn"]
+	path = docs/themes/hugo-theme-relearn
+	url = https://github.com/McShelby/hugo-theme-relearn.git

CHANGELOG.md

Lines changed: 6 additions & 2 deletions

@@ -1,18 +1,22 @@
 # Changelog
 ## [v.0.0.9]
-- Added dlt append_flow api support: [PR](https://github.com/databrickslabs/dlt-meta/pull/58)
+- Added apply_changes_from_snapshot api support in bronze layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/124)
 - Added dlt append_flow api support for silver layer: [PR](https://github.com/databrickslabs/dlt-meta/pull/63)
 - Added support for file metadata columns for autoloader: [PR](https://github.com/databrickslabs/dlt-meta/pull/56)
 - Added support for Bring your own custom transformation: [Issue](https://github.com/databrickslabs/dlt-meta/issues/68)
 - Added support to Unify PyPI releases with GitHub OIDC: [PR](https://github.com/databrickslabs/dlt-meta/pull/62)
 - Added demo for append_flow and file_metadata options: [PR](https://github.com/databrickslabs/dlt-meta/issues/74)
 - Added Demo for silver fanout architecture: [PR](https://github.com/databrickslabs/dlt-meta/pull/83)
-- Added documentation in docs site for new features: [PR](https://github.com/databrickslabs/dlt-meta/pull/64)
+- Added hugo-theme-relearn theme: [PR](https://github.com/databrickslabs/dlt-meta/pull/132)
 - Added unit tests to showcase silver layer fanout examples: [PR](https://github.com/databrickslabs/dlt-meta/pull/67)
+- Added liquid cluster support: [PR](https://github.com/databrickslabs/dlt-meta/pull/136)
+- Added support for UC Volume + Serverless support for CLI, Integration tests and Demos: [PR](https://github.com/databrickslabs/dlt-meta/pull/105)
+- Added Chaining bronze/silver pipelines into single DLT: [PR](https://github.com/databrickslabs/dlt-meta/pull/130)
 - Fixed issue for No such file or directory: '/demo': [PR](https://github.com/databrickslabs/dlt-meta/issues/59)
 - Fixed issue DLT-META CLI onboard command issue for Azure: databricks.sdk.errors.platform.ResourceAlreadyExists: [PR](https://github.com/databrickslabs/dlt-meta/issues/51)
 - Fixed issue Changed dbfs.create to mkdirs for CLI: [PR](https://github.com/databrickslabs/dlt-meta/pull/53)
 - Fixed issue DLT-META CLI should use pypi lib instead of whl: [PR](https://github.com/databrickslabs/dlt-meta/pull/79)
+- Fixed issue Onboarding with multiple partition columns errors out: [PR](https://github.com/databrickslabs/dlt-meta/pull/134)
 
 ## [v.0.0.7]
 - Added dlt-meta cli documentation and readme with browser support: [PR](https://github.com/databrickslabs/dlt-meta/pull/45)
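The headline entry above, apply_changes_from_snapshot, ingests periodic full snapshots rather than a change feed. As documented for the DLT Python API, its `source` can be a callable that, given the latest processed snapshot version (or `None` on the first call), returns the next `(snapshot, version)` pair, or `None` when caught up. A minimal, hypothetical version-tracking function in that shape (pure Python; the actual `dlt` call is shown only as a comment):

```python
# Hypothetical snapshot bookkeeping in the shape expected by the `source`
# callable of dlt.apply_changes_from_snapshot. The version list and the
# string standing in for a snapshot DataFrame are illustrative only.
SNAPSHOT_VERSIONS = [1, 2, 3]  # e.g. one full table snapshot per day

def next_snapshot_and_version(latest_version):
    pending = [v for v in SNAPSHOT_VERSIONS
               if latest_version is None or v > latest_version]
    if not pending:
        return None  # no newer snapshot: processing is caught up
    version = pending[0]
    snapshot = f"snapshot {version}"  # in DLT this would be a DataFrame
    return snapshot, version

# Inside a pipeline this would drive the CDC-from-snapshot API, e.g.:
#   dlt.apply_changes_from_snapshot(target="customers",
#                                   source=next_snapshot_and_version,
#                                   keys=["customer_id"],
#                                   stored_as_scd_type=2)
print(next_snapshot_and_version(None))  # -> ('snapshot 1', 1)
```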

README.md

Lines changed: 15 additions & 0 deletions

@@ -68,6 +68,21 @@ In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to
 
 ![DLT-META Stages](./docs/static/images/dlt-meta_stages.png)
 
+## DLT-META DLT Features support
+| Features | DLT-META Support |
+| ------------- | ------------- |
+| Input data sources | Autoloader, Delta, Eventhub, Kafka, snapshot |
+| Medallion architecture layers | Bronze, Silver |
+| Custom transformations | Bronze, Silver layers accept custom functions |
+| Data Quality Expectations support | Bronze, Silver layers |
+| Quarantine table support | Bronze layer |
+| [apply_changes](https://docs.databricks.com/en/delta-live-tables/python-ref.html#cdc) API support | Bronze, Silver layers |
+| [apply_changes_from_snapshot](https://docs.databricks.com/en/delta-live-tables/python-ref.html#change-data-capture-from-database-snapshots-with-python-in-delta-live-tables) API support | Bronze layer |
+| [append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#use-append-flow-to-write-to-a-streaming-table-from-multiple-source-streams) API support | Bronze layer |
+| Liquid cluster support | Bronze, Bronze Quarantine, Silver tables |
+| [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) | ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` |
+| Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode |
+
 ## Getting Started
 
 Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)
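To make the feature table concrete: in a metadata-driven setup the CDC options live in the Dataflowspec and are unpacked into the `apply_changes` call rather than hard-coded per table. A hedged sketch of that unpacking (the field names in `cdc_spec` are illustrative, not the exact Dataflowspec schema; the `dlt` call itself is shown only as a comment):

```python
# Illustrative CDC section of an onboarding spec; field names are hypothetical.
cdc_spec = {
    "keys": ["customer_id"],
    "sequence_by": "dmsTimestamp",
    "scd_type": "2",
}

# Unpack the metadata into keyword arguments for the CDC API.
apply_changes_kwargs = {
    "target": "customers",
    "source": "customers_cdc_raw",
    "keys": cdc_spec["keys"],
    "sequence_by": cdc_spec["sequence_by"],
    "stored_as_scd_type": cdc_spec["scd_type"],
}

# Inside a DLT pipeline the generic engine would pass these straight through:
#   dlt.apply_changes(**apply_changes_kwargs)
print(apply_changes_kwargs["keys"])  # -> ['customer_id']
```

This is the core of the "single generic pipeline" idea: the call site stays fixed while the metadata varies per dataflow.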

demo/README.md

Lines changed: 1 addition & 1 deletion

@@ -198,7 +198,7 @@ This demo will perform following tasks:
 
 6. Run the command
    ```commandline
-   python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>>
+   python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --profile=<<DEFAULT>>
    ```
 
 - you can provide `--profile=databricks_profile name` in case you already have databricks cli otherwise command prompt will ask host and token.

demo/conf/onboarding.template

Lines changed: 2 additions & 0 deletions

@@ -18,6 +18,7 @@
         "cloudFiles.rescuedDataColumn": "_rescued_data",
         "header": "true"
       },
+      "bronze_cluster_by": ["customer_id"],
       "bronze_data_quality_expectations_json_prod": "{uc_volume_path}/demo/conf/dqe/customers.json",
       "bronze_database_quarantine_prod": "{uc_catalog_name}.{bronze_schema}",
       "bronze_quarantine_table": "customers_quarantine",
@@ -38,6 +39,7 @@
         "_rescued_data"
       ]
     },
+    "silver_cluster_by": ["customer_id"],
     "silver_transformation_json_prod": "{uc_volume_path}/demo/conf/silver_transformations.json",
     "silver_data_quality_expectations_json_prod": "{uc_volume_path}/demo/conf/dqe/customers_silver_dqe.json"
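The two `*_cluster_by` keys added above supply the liquid-clustering columns for the generated bronze and silver tables. A minimal sketch of how such a fragment parses (values as in the template; all surrounding fields omitted):

```python
import json

# Fragment of the onboarding spec above; only the clustering keys are shown.
fragment = """
{
  "bronze_cluster_by": ["customer_id"],
  "silver_cluster_by": ["customer_id"]
}
"""

spec = json.loads(fragment)
# Each value is a list of column names to cluster the corresponding table by.
print(spec["bronze_cluster_by"])  # -> ['customer_id']
```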

demo/launch_dais_demo.py

Lines changed: 1 addition & 2 deletions

@@ -2,7 +2,6 @@
 import traceback
 from databricks.sdk.service import jobs, compute
 from src.install import WorkspaceInstaller
-from src.__about__ import __version__
 from integration_tests.run_integration_tests import (
     DLTMETARunner,
     DLTMetaRunnerConf,
@@ -98,7 +97,7 @@ def create_daisdemo_workflow(self, runner_conf: DLTMetaRunnerConf):
             jobs.JobEnvironment(
                 environment_key="dl_meta_int_env",
                 spec=compute.Environment(
-                    client=f"dlt_meta_int_test_{__version__}",
+                    client="1",
                     dependencies=[runner_conf.remote_whl_path],
                 ),
             )

demo/launch_silver_fanout_demo.py

Lines changed: 1 addition & 2 deletions

@@ -3,7 +3,6 @@
 import traceback
 from databricks.sdk.service import jobs, compute
 from src.install import WorkspaceInstaller
-from src.__about__ import __version__
 from integration_tests.run_integration_tests import (
     DLTMETARunner,
     DLTMetaRunnerConf,
@@ -105,7 +104,7 @@ def create_sfo_workflow_spec(self, runner_conf: DLTMetaRunnerConf):
             jobs.JobEnvironment(
                 environment_key="dl_meta_int_env",
                 spec=compute.Environment(
-                    client=f"dlt_meta_int_test_{__version__}",
+                    client="1",
                     dependencies=[runner_conf.remote_whl_path],
                 ),
             )

demo/launch_techsummit_demo.py

Lines changed: 1 addition & 2 deletions

@@ -27,7 +27,6 @@
 from databricks.sdk.service import jobs, compute
 from dataclasses import dataclass
 from src.install import WorkspaceInstaller
-from src.__about__ import __version__
 from integration_tests.run_integration_tests import (
     DLTMETARunner,
     DLTMetaRunnerConf,
@@ -164,7 +163,7 @@ def create_techsummit_demo_workflow(self, runner_conf: TechsummitRunnerConf):
             jobs.JobEnvironment(
                 environment_key="dl_meta_int_env",
                 spec=compute.Environment(
-                    client=f"dlt_meta_int_test_{__version__}",
+                    client="1",
                     dependencies=[runner_conf.remote_whl_path],
                 ),
             )
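All three demo launchers make the same change: the serverless job environment's `client` field is pinned to environment version "1" instead of embedding the dlt-meta package version string, which also lets the `__version__` import be dropped. A condensed sketch of the resulting structure (plain dicts standing in for the `jobs.JobEnvironment` / `compute.Environment` SDK classes; the wheel path is a placeholder):

```python
# Plain-dict stand-in for the SDK objects built in the three launchers above.
remote_whl_path = "<remote_whl_path>"  # placeholder; set by the runner config

job_environment = {
    "environment_key": "dl_meta_int_env",
    "spec": {
        # "client" names the serverless environment version ("1"), so it no
        # longer has to change with each dlt-meta release.
        "client": "1",
        "dependencies": [remote_whl_path],
    },
}
print(job_environment["spec"]["client"])  # -> 1
```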

docs/config.toml

Lines changed: 7 additions & 4 deletions

@@ -1,12 +1,15 @@
 baseURL = 'https://databrickslabs.github.io/dlt-meta/'
 languageCode = 'en-us'
 title = 'DLT-META'
-theme= "hugo-theme-learn"
+theme= "hugo-theme-relearn"
 pluralizeListTitles = false
 canonifyURLs = true
+description = "DLT-META Documentation"
 
 [params]
-  description = "DLT-META Documentation"
-  author = "Ravi Gawai (Databricks)"
+  themeVariant = 'learn'
+  themeVariantAuto = 'learn'
   disableShortcutsTitle = true
-  disableLandingPageButton = false
+  disableLandingPageButton = false
+  author.name = "Ravi Gawai (Databricks)"
+  disableNextPrev = false

docs/content/_index.md

Lines changed: 15 additions & 2 deletions

@@ -2,9 +2,9 @@
 title: "DLT-META"
 date: 2021-08-04T14:50:11-04:00
 draft: false
+
 ---
 
-# DLT-META
 
 ## Project Overview
 DLT-META is a metadata-driven framework designed to work with Databricks Delta Live Tables (DLT). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.
@@ -40,7 +40,20 @@ In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to
 - Option#1: [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/#dataflow-dlt-pipeline)
 - Option#2: [DLT-META MANUAL](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_manual/#dataflow-dlt-pipeline)
 
-
+## DLT-META DLT Features support
+| Features | DLT-META Support |
+| ------------- | ------------- |
+| Input data sources | Autoloader, Delta, Eventhub, Kafka, snapshot |
+| Medallion architecture layers | Bronze, Silver |
+| Custom transformations | Bronze, Silver layers accept custom functions |
+| Data Quality Expectations support | Bronze, Silver layers |
+| Quarantine table support | Bronze layer |
+| [apply_changes](https://docs.databricks.com/en/delta-live-tables/python-ref.html#cdc) API support | Bronze, Silver layers |
+| [apply_changes_from_snapshot](https://docs.databricks.com/en/delta-live-tables/python-ref.html#change-data-capture-from-database-snapshots-with-python-in-delta-live-tables) API support | Bronze layer |
+| [append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#use-append-flow-to-write-to-a-streaming-table-from-multiple-source-streams) API support | Bronze layer |
+| Liquid cluster support | Bronze, Bronze Quarantine, Silver tables |
+| [DLT-META CLI](https://databrickslabs.github.io/dlt-meta/getting_started/dltmeta_cli/) | ```databricks labs dlt-meta onboard```, ```databricks labs dlt-meta deploy``` |
+| Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with ```layer=bronze_silver``` option using Direct publishing mode |
 ## How much does it cost ?
 DLT-META does not have any **direct cost** associated with it other than the cost to run the Databricks Delta Live Tables
 on your environment.The overall cost will be determined primarily by the [Databricks Delta Live Tables Pricing] (https://databricks.com/product/delta-live-tables-pricing-azure)

docs/content/contributing/onboarding/_index.md

Lines changed: 96 additions & 0 deletions
Large diffs are not rendered by default.

docs/content/demo/Append_FLOW_CF.md

Lines changed: 1 addition & 1 deletion

@@ -39,7 +39,7 @@ This demo will perform following tasks:
    ```
 
 6. ```commandline
-   python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc
+   python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=dlt_meta_uc
    ```
 
 - cloud_provider_name : aws or azure or gcp

docs/content/demo/Append_FLOW_EH.md

Lines changed: 2 additions & 2 deletions

@@ -51,7 +51,7 @@ draft: false
 - Following are the mandatory arguments for running EventHubs demo
   - cloud_provider_name: Cloud provider name e.g. aws or azure
   - dbr_version: Databricks Runtime Version e.g. 15.3.x-scala2.12
-  - uc_catalog_name : unity catalog name e.g. ravi_dlt_meta_uc
+  - uc_catalog_name : unity catalog name e.g. dlt_meta_uc
   - dbfs_path: Path on your Databricks workspace where demo will be copied for launching DLT-META Pipelines e.g. dbfs:/tmp/DLT-META/demo/
   - eventhub_namespace: Eventhub namespace e.g. dltmeta
   - eventhub_name : Primary Eventhubname e.g. dltmeta_demo
@@ -62,7 +62,7 @@ draft: false
   - eventhub_port: Eventhub port
 
 7. ```commandline
-   python demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=dltmeta_eventhub_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey --uc_catalog_name=ravi_dlt_meta_uc
+   python demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --uc_catalog_name=dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=dltmeta_eventhub_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
    ```
 
 ![af_eh_demo.png](/images/af_eh_demo.png)
