The dbt SAP HANA Cloud adapter integrates dbt with SAP HANA Cloud, enabling data transformation, modeling, and orchestration.
- python>=3.9
- dbt-core>=1.9.0
- dbt-adapters>=1.7.2
- dbt-common>=1.3.0
- hdbcli>=2.22.32
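pip normally resolves these prerequisites automatically when the adapter itself is installed in Step 1. If you prefer to install them explicitly (for example, inside the virtual environment created below), a pip command along these lines should work:
pip3 install "dbt-core>=1.9.0" "dbt-adapters>=1.7.2" "dbt-common>=1.3.0" "hdbcli>=2.22.32"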
- The Python virtual environment needs to be activated before beginning the installation process:
python3 -m venv <name>
Enter a name for the virtual environment in the placeholder.
- Activate the virtual environment:
source <name>/bin/activate
Use the same name you gave above.
Step 1. Install dbt-sap-hana-cloud adapter
- Clone the dbt-sap-hana-cloud repository
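For example, assuming the repository is hosted in the SAP organization on GitHub (adjust the URL if your copy of the repository lives elsewhere):
git clone https://github.com/SAP/dbt-sap-hana-cloud.git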
- Navigate to the cloned repository
cd /path/to/dbt-sap-hana-cloud
- Install the adapter using the command below
pip3 install .
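To verify that the installation succeeded, dbt --version should list the installed adapter plugins alongside dbt Core:
dbt --version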
Step 2. Create a dbt project
- Initialize a new dbt project in a different location:
dbt init
- Choose dbt-saphanacloud from the list and fill in the fields requested after the selection.
Step 3. Profile setup (in case the dbt init command fails)
- Edit the $HOME/.dbt/profiles.yml file within the .dbt folder (create the file if it does not exist).
- Add the following configuration, replacing placeholders with your SAP HANA credentials:
my-sap-hana-cloud-profile:
  target: dev
  outputs:
    dev:
      type: saphanacloud
      host: <host>          # SAP HANA Cloud host address
      port: <port>          # Port for SAP HANA Cloud
      user: <user>          # SAP HANA Cloud username
      password: <password>  # SAP HANA Cloud password
      database: <database>  # Database to connect to
      schema: <schema>      # Schema to use
      threads: <threads>    # Number of threads you want to use
Step 4. Link Profile to dbt Project (in case the dbt init command fails)
- In your dbt_project.yml file (located in your dbt project folder), reference the profile name:
profile: my-sap-hana-cloud-profile
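For orientation, a minimal sketch of where this line sits in dbt_project.yml; the project name and version are placeholders generated by dbt init:
name: 'my_dbt_project'
version: '1.0.0'
profile: 'my-sap-hana-cloud-profile'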
Step 5. Test Connection
- In the terminal, navigate to your dbt project folder
cd /path/to/dbt-project
- Run the following command to ensure dbt can connect to SAP HANA Cloud:
dbt debug
Step 1. Navigate to the dbt-sap-hana-cloud repository
cd /path/to/dbt-sap-hana-cloud
Step 2. Install Development Requirements
- In the dbt-sap-hana-cloud folder, install the packages listed in dev_requirements.txt:
pip3 install -r dev_requirements.txt
Step 3. Create a test.env File
- In the same folder as the adapter, create a test.env file and add the following:
DBT_HANA_HOST=<host>            # SAP HANA Cloud host address
DBT_HANA_PORT=<port>            # SAP HANA Cloud port
DBT_HANA_USER=<user>            # SAP HANA Cloud username
DBT_HANA_PASSWORD=<password>    # SAP HANA Cloud password
DBT_HANA_DATABASE=<database>    # Database to connect to
DBT_HANA_SCHEMA=<schema>        # Schema to use
DBT_HANA_THREADS=<threads>      # Number of threads you want to use
# Create 3 users in the HANA database with the same names as below to test grants
DBT_TEST_USER_1=DBT_TEST_USER_1
DBT_TEST_USER_2=DBT_TEST_USER_2
DBT_TEST_USER_3=DBT_TEST_USER_3
Step 4. Test adapter functionality
- Run the following command to execute functional tests:
python3 -m pytest tests/functional/
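To run only a subset of the suite, pytest's standard selection options apply; for example, filtering by keyword (the keyword shown is purely illustrative):
python3 -m pytest tests/functional/ -k "incremental"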
- If you want to define the type of table created for an incremental model or a table model, you can do so by adding the table_type option inside the model's config block. Two options are available for the table type, either 'row' or 'column', for example table_type='row'.
Note: The default table type is column if nothing is specified.
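A minimal sketch of a model that opts into a row-store table; the select statement is just a stand-in for an actual transformation:
{{
  config(
    materialized = "table",
    table_type = 'row'
  )
}}
select 1 as ID from sys.dummy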
There are five types of indexes in the dbt SAP HANA Cloud adapter.
- Row table
  - BTREE
  - CPBTREE
- Column table
  - INVERTED VALUE
  - INVERTED HASH
  - INVERTED INDIVIDUAL
Below are example configurations showing how to use them.
- Row table
{{
  config(
    materialized = "table",
    table_type = 'row',
    indexes = [
      {'columns': ['column_a'], 'type': 'BTREE'},
      {'columns': ['column_b'], 'type': 'BTREE', 'unique': True},
      {'columns': ['column_a', 'column_b'], 'type': 'CPBTREE'},
      {'columns': ['column_b', 'column_a'], 'type': 'CPBTREE', 'unique': True}
    ]
  )
}}
- Column table
{{
  config(
    materialized = "table",
    table_type = 'column',
    indexes = [
      {'columns': ['column_b'], 'type': 'INVERTED VALUE'},
      {'columns': ['column_a', 'column_b'], 'type': 'INVERTED VALUE'},
      {'columns': ['column_b', 'column_a'], 'type': 'INVERTED VALUE', 'unique': True},
      {'columns': ['column_b', 'column_c'], 'type': 'INVERTED HASH', 'unique': True},
      {'columns': ['column_a', 'column_c'], 'type': 'INVERTED INDIVIDUAL', 'unique': True}
    ]
  )
}}
You can now set the unique keys as the primary key in incremental and table models by simply enabling a flag. For example, you can configure it like this:
incremental model:
{{
config(
materialized = "incremental",
unique_key = ['id', 'name', 'county'],
unique_as_primary = true
)
}}
table model:
{{
config(
materialized = "table",
unique_key = ['id', 'name', 'county'],
unique_as_primary = true
)
}}
You can divide the transformation of an incremental model into multiple batches using the query_partitions
option. This wraps the SQL query in an outer query, which is then filtered based on the respective partition value.
Model definition
{{
config(
materialized="incremental",
unique_key=["ID"],
unique_as_primary=true,
query_partitions = [
{
'column':'CATEGORY',
'type':'list',
'partitions':['train','plane','car'],
'default_partition_required':False
}
],
)
}}
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
Executed query for batch 1 (CATEGORY = 'train')
select
*
from (
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
) t
where "CATEGORY" = 'train'
Executed query for batch 2 (CATEGORY = 'plane')
select
*
from (
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
) t
where "CATEGORY" = 'plane'
Executed query for batch 3 (CATEGORY = 'car')
select
*
from (
select 1 as ID, 'car' as CATEGORY from sys.dummy
union all
select 2 as ID, 'train' as CATEGORY from sys.dummy
union all
select 3 as ID, 'plane' as CATEGORY from sys.dummy
) t
where "CATEGORY" = 'car'
The query_partitions configuration option expects a list of query partitions represented as objects. Each object has the following properties:
- column: The column by which the partitions (batches) are created.
- partitions: The definition of the partition values.
- type: The type of the partitions, which determines how the filter is applied. Possible variants:
  - list: The value must match one of the partition values exactly (e.g., CATEGORY = 'train', CATEGORY = 'car').
  - range: The partition values are sorted in ascending order. A value must fall between two consecutive partition values (e.g., CREATE_DATE >= '2023-01-01' AND CREATE_DATE < '2024-01-01', CREATE_DATE >= '2024-01-01' AND CREATE_DATE < '2025-01-01').
- default_partition_required: Defines whether a default partition should be added for all rows that do not match any partition value. Possible values: true, false.
Note: Currently, a transformation can be partitioned by a maximum of two columns.
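For illustration, a date-based range partitioning could look like the sketch below. The CREATE_DATE column, the boundary dates, and the assumption that the partitions list holds the ascending range boundaries are derived from the description above rather than from a tested example:
{{
  config(
    materialized = "incremental",
    unique_key = ["ID"],
    query_partitions = [
      {
        'column': 'CREATE_DATE',
        'type': 'range',
        'partitions': ['2023-01-01', '2024-01-01', '2025-01-01'],
        'default_partition_required': True
      }
    ]
  )
}}
select 1 as ID, '2023-06-15' as CREATE_DATE from sys.dummy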
Some materializations are very complicated and cannot be implemented using a standard dbt materialization. Using the sqlscript materialization, it is possible to define custom logic with SQLScript.
Example:
{{
config(
materialized="sqlscript"
)
}}
DO BEGIN
  -- Transformation written in SQLScript
END
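As a slightly more concrete, purely illustrative sketch (the schema, table, and row-count logic are made up; any valid SQLScript block can go inside the DO BEGIN ... END body):
{{
  config(
    materialized = "sqlscript"
  )
}}
DO BEGIN
  -- hypothetical example: inspect a source table before running custom logic
  DECLARE row_count INTEGER;
  SELECT COUNT(*) INTO row_count FROM "RAW_DATA"."CUSTOMERS";
  -- the actual transformation (INSERT, MERGE, procedure calls, ...) would follow here
END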
dbt is intended for transforming data that already resides in a database (the "T" in an ELT process).
Since SAP HANA Cloud can access remote data using SQL (Smart Data Access (SDA) and Smart Data Integration (SDI)), dbt can also be used to extract and load data.
The saphanacloud dbt adapter includes a macro that automatically creates virtual tables. To use this feature, you need to add remote_database and remote_schema as source metadata. Additionally, include the metadata value virtual_table with the boolean value true. The name of the dbt source must match the name of the remote source in SAP HANA Cloud.
Example source definition:
version: 2
sources:
  - name: CRM
    schema: RAW_DATA
    meta: {
      virtual_table: true,
      remote_database: 'NULL',
      remote_schema: 'DEFAULT'
    }
    tables:
      - name: CUSTOMERS
      - name: SUPPLIERS
      - name: PRODUCTS
        identifier: VT_PRODUCTS
Then the following macro has to be called:
dbt run-operation create_sources
This command checks if all required virtual tables exist and creates them if they do not. In the example, it will execute the following SQL statements:
CREATE VIRTUAL TABLE RAW_DATA.CUSTOMERS AT "CRM"."NULL"."DEFAULT"."CUSTOMERS";
CREATE VIRTUAL TABLE RAW_DATA.SUPPLIERS AT "CRM"."NULL"."DEFAULT"."SUPPLIERS";
CREATE VIRTUAL TABLE RAW_DATA.VT_PRODUCTS AT "CRM"."NULL"."DEFAULT"."PRODUCTS";
Note: If the name of the virtual table should differ from the name of the table in the remote source, you can use the identifier property of the table in the source definition.
No known issues
Create an issue in this repository if you find a bug or have questions about the content.
For additional support, ask a question in SAP Community.
If you wish to contribute code or offer fixes or improvements, please send a pull request. Due to legal reasons, contributors will be asked to accept a DCO when they create the first pull request to this project. This happens in an automated fashion during the submission process. SAP uses the standard DCO text of the Linux Foundation.
Copyright (c) 2024 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.