The dbt SAP HANA Cloud adapter integrates dbt with SAP HANA Cloud, enabling data transformation, modeling, and orchestration.
- python>=3.9
- dbt-core>=1.9.0
- dbt-adapters>=1.7.2
- dbt-common>=1.3.0
- hdbcli>=2.22.32
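pip normally resolves these prerequisites automatically when the adapter itself is installed in Step 1. If you prefer to install them explicitly (for example, inside the virtual environment created below), a pip command along these lines should work:
pip3 install "dbt-core>=1.9.0" "dbt-adapters>=1.7.2" "dbt-common>=1.3.0" "hdbcli>=2.22.32"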
- The Python virtual environment needs to be activated before beginning the installation process:
python3 -m venv <name>
Enter a name for the virtual environment in the placeholder.
- Activate the virtual environment:
source <name>/bin/activate
Use the same name you gave above.
Step 1. Install dbt-sap-hana-cloud adapter
- Clone the dbt-sap-hana-cloud repository
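For example, assuming the repository is hosted in the SAP organization on GitHub (adjust the URL if your copy of the repository lives elsewhere):
git clone https://github.com/SAP/dbt-sap-hana-cloud.git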
- Navigate to the cloned repository
cd /path/to/dbt-sap-hana-cloud
- Install the adapter using the command below
pip3 install .
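To verify that the installation succeeded, dbt --version should list the installed adapter plugins alongside dbt Core:
dbt --version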
Step 2. Create a dbt project
- Initialize a new dbt project in a different location:
dbt init
- Choose dbt-saphanacloud from the list and fill in the fields requested after the selection.
Step 3. Profile setup (in case the dbt init command fails)
- Edit the $HOME/.dbt/profiles.yml file within the .dbt folder (create the file if it does not exist).
- Add the following configuration, replacing placeholders with your SAP HANA credentials:
my-sap-hana-cloud-profile:
  target: dev
  outputs:
    dev:
      type: saphanacloud
      host: <host>          # SAP HANA Cloud host address
      port: <port>          # Port for SAP HANA Cloud
      user: <user>          # SAP HANA Cloud username
      password: <password>  # SAP HANA Cloud password
      database: <database>  # Database to connect to
      schema: <schema>      # Schema to use
      threads: <threads>    # Number of threads you want to use
Step 4. Link Profile to dbt Project (in case the dbt init command fails)
- In your dbt_project.yml file (located in your dbt project folder), reference the profile name:
profile: my-sap-hana-cloud-profile
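For orientation, a minimal sketch of where this line sits in dbt_project.yml; the project name and version are placeholders generated by dbt init:
name: 'my_dbt_project'
version: '1.0.0'
profile: 'my-sap-hana-cloud-profile'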
Step 5. Test Connection
- In the terminal, navigate to your dbt project folder
cd /path/to/dbt-project
- Run the following command to ensure dbt can connect to SAP HANA Cloud:
dbt debug
Step 1. Navigate to the dbt-sap-hana-cloud repository
cd /path/to/dbt-sap-hana-cloud
Step 2. Install Development Requirements
- In the dbt-sap-hana-cloud folder, install the packages listed in dev_requirements.txt:
pip3 install -r dev_requirements.txt
Step 3. Create a test.env File
- In the same folder as the adapter, create a test.env file and add the following:
DBT_HANA_HOST=<host>            # SAP HANA Cloud host address
DBT_HANA_PORT=<port>            # SAP HANA Cloud port
DBT_HANA_USER=<user>            # SAP HANA Cloud username
DBT_HANA_PASSWORD=<password>    # SAP HANA Cloud password
DBT_HANA_DATABASE=<database>    # Database to connect to
DBT_HANA_SCHEMA=<schema>        # Schema to use
DBT_HANA_THREADS=<threads>      # Number of threads you want to use
# Create 3 users in the HANA database with the same names as below to test grants
DBT_TEST_USER_1=DBT_TEST_USER_1
DBT_TEST_USER_2=DBT_TEST_USER_2
DBT_TEST_USER_3=DBT_TEST_USER_3
Step 4. Test adapter functionality
- Run the following command to execute functional tests:
python3 -m pytest tests/functional/
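To run only a subset of the suite, pytest's standard selection options apply; for example, filtering by keyword (the keyword shown is purely illustrative):
python3 -m pytest tests/functional/ -k "incremental"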
- If you want to define the type of table created for an incremental model or a table model, you can do so by adding the table_type option inside the model's config block. Two options are available for the table type, either 'row' or 'column', for example table_type='row'.
Note: The default table type is column if nothing is specified.
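A minimal sketch of a model that opts into a row-store table; the select statement is just a stand-in for an actual transformation:
{{
  config(
    materialized = "table",
    table_type = 'row'
  )
}}
select 1 as ID from sys.dummy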
There are five types of indexes in the dbt SAP HANA Cloud adapter.
- Row table
  - BTREE
  - CPBTREE
- Column table
  - INVERTED VALUE
  - INVERTED HASH
  - INVERTED INDIVIDUAL
Below are example configurations showing how to use them.
- Row table
{{
  config(
    materialized = "table",
    table_type = 'row',
    indexes = [
      {'columns': ['column_a'], 'type': 'BTREE'},
      {'columns': ['column_b'], 'type': 'BTREE', 'unique': True},
      {'columns': ['column_a', 'column_b'], 'type': 'CPBTREE'},
      {'columns': ['column_b', 'column_a'], 'type': 'CPBTREE', 'unique': True}
    ]
  )
}}
- Column table
{{
  config(
    materialized = "table",
    table_type = 'column',
    indexes = [
      {'columns': ['column_b'], 'type': 'INVERTED VALUE'},
      {'columns': ['column_a', 'column_b'], 'type': 'INVERTED VALUE'},
      {'columns': ['column_b', 'column_a'], 'type': 'INVERTED VALUE', 'unique': True},
      {'columns': ['column_b', 'column_c'], 'type': 'INVERTED HASH', 'unique': True},
      {'columns': ['column_a', 'column_c'], 'type': 'INVERTED INDIVIDUAL', 'unique': True}
    ]
  )
}}
You can now set the unique keys as the primary key in incremental and table models by simply enabling a flag. For example, you can configure it like this:
incremental model:
{{
config(
materialized = "incremental",
unique_key = ['id', 'name', 'county'],
unique_as_primary = true
)
}}
table model:
{{
config(
materialized = "table",
unique_key = ['id', 'name', 'county'],
unique_as_primary = true
)
}}
You can divide the transformation of an incremental model into multiple batches using the query_partitions
option. This wraps the SQL query in an outer query, which is then filtered based on the respective partition value.
Model definition
{{
config(
materialized="incremental",
unique_key=["ID"],
unique_as_primary=true,
query_partitions = [
{
'column':'CATEGORY',
'type':'list',
'partitions':['train','plane','car'],
'default_partition_required':False
}
],
)
}}
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
Executed query for batch 1 (CATEGORY = 'train')
select
*
from (
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
) t
where "CATEGORY" = 'train'
Executed query for batch 2 (CATEGORY = 'plane')
select
*
from (
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
) t
where "CATEGORY" = 'plane'
Executed query for batch 3 (CATEGORY = 'car')
select
*
from (
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
) t
where "CATEGORY" = 'car'
The query_partitions configuration option expects a list of query partitions represented as objects. Each object has the following properties:
- column: The column by which the partitions (batches) are created.
- partitions: The definition of the partition values.
- type: The type of the partitions, which determines how the filter is applied. Possible variants:
  - list: The value must match one of the partition values exactly (e.g., CATEGORY = 'train', CATEGORY = 'car').
  - range: The partition values are sorted in ascending order. A value must fall between two consecutive partition values (e.g., CREATE_DATE >= '2023-01-01' AND CREATE_DATE < '2024-01-01', CREATE_DATE >= '2024-01-01' AND CREATE_DATE < '2025-01-01').
- default_partition_required: Defines whether a default partition should be added for all rows that do not match any partition value. Possible values: true, false.
Note: Currently, a transformation can be partitioned by a maximum of two columns.
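For illustration, a date-based range partitioning could look like the sketch below. The CREATE_DATE column, the boundary dates, and the assumption that the partitions list holds the ascending range boundaries are derived from the description above rather than from a tested example:
{{
  config(
    materialized = "incremental",
    unique_key = ["ID"],
    query_partitions = [
      {
        'column': 'CREATE_DATE',
        'type': 'range',
        'partitions': ['2023-01-01', '2024-01-01', '2025-01-01'],
        'default_partition_required': True
      }
    ]
  )
}}
select 1 as ID, '2023-06-15' as CREATE_DATE from sys.dummy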
Some materializations are very complicated and cannot be implemented using a standard dbt materialization. Using the sqlscript materialization, it is possible to define custom logic with SQLScript.
Example:
{{
config(
materialized="sqlscript"
)
}}
DO BEGIN
  -- Transformation written in SQLScript
END
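As a slightly more concrete, purely illustrative sketch (the schema, table, and row-count logic are made up; any valid SQLScript block can go inside the DO BEGIN ... END body):
{{
  config(
    materialized = "sqlscript"
  )
}}
DO BEGIN
  -- hypothetical example: inspect a source table before running custom logic
  DECLARE row_count INTEGER;
  SELECT COUNT(*) INTO row_count FROM "RAW_DATA"."CUSTOMERS";
  -- the actual transformation (INSERT, MERGE, procedure calls, ...) would follow here
END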
dbt is intended for transforming data that already resides in a database (the "T" in an ELT process).
Since SAP HANA Cloud can access remote data using SQL (Smart Data Access (SDA) and Smart Data Integration (SDI)), dbt can also be used to extract and load data.
The saphanacloud dbt adapter includes a macro that automatically creates virtual tables. To use this feature, you need to add remote_database and remote_schema as source metadata. Additionally, include the metadata value virtual_table with the boolean value true. The name of the dbt source must match the name of the remote source in SAP HANA Cloud.
Example source definition:
version: 2
sources:
  - name: CRM
    schema: RAW_DATA
    meta: {
      virtual_table: true,
      remote_database: 'NULL',
      remote_schema: 'DEFAULT'
    }
    tables:
      - name: CUSTOMERS
      - name: SUPPLIERS
      - name: PRODUCTS
        identifier: VT_PRODUCTS
Then the following macro has to be called:
dbt run-operation create_sources
This command checks if all required virtual tables exist and creates them if they do not. In the example, it will execute the following SQL statements:
CREATE VIRTUAL TABLE RAW_DATA.CUSTOMERS AT "CRM"."NULL"."DEFAULT"."CUSTOMERS";
CREATE VIRTUAL TABLE RAW_DATA.SUPPLIERS AT "CRM"."NULL"."DEFAULT"."SUPPLIERS";
CREATE VIRTUAL TABLE RAW_DATA.VT_PRODUCTS AT "CRM"."NULL"."DEFAULT"."PRODUCTS";
Note: If the name of the virtual table should differ from the name of the table in the remote source, you can use the identifier property of the table in the source definition.
No known issues
Create an issue in this repository if you find a bug or have questions about the content.
For additional support, ask a question in SAP Community.
If you wish to contribute code or offer fixes or improvements, please send a pull request. Due to legal reasons, contributors will be asked to accept a DCO when they create the first pull request to this project. This happens in an automated fashion during the submission process. SAP uses the standard DCO text of the Linux Foundation.
Copyright (c) 2024 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.